Data Science Jumpstart with 10 Projects Transcripts
Chapter: Project 6: Working with Time Series - Air Quality over Time
Lecture: Rename columns in Pandas to Remove Invalid Characters

0:00 I'm going to rename my columns as well. You can see these columns have parentheses in them.
0:05 This is just going to make them kind of a pain to work with. They also have periods.
0:09 So I'm going to use columns that are a little bit more descriptive. We'll just use rename here to rename the columns,
0:17 and then we'll check the columns to see if that works. We still have those unnamed columns in there. Let's look at the unnamed column value counts.
0:25 There's nothing in that one, and there's nothing in that one. So I'm going to update my chain here, and I'm
0:31 going to explicitly specify the columns that I want, excluding those unnamed columns. Why do I explicitly specify the columns? Why don't I say drop?
0:40 Because I want to have my code focus on the columns and the data that I want, not the data that I don't want.
0:47 I want someone who comes to this code to look at it and say, OK, these are the columns that are in there.
0:51 They shouldn't have to work out which columns are in there by knowing which ones aren't. You can see that it's easier to drop the columns,
0:59 but that's actually not as good for the person reading the code, who sees only the columns that were dropped.
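The rename-then-select step described above can be sketched roughly as follows. The sample rows are invented, and both the original column names and the new names are assumptions for illustration, not the exact ones used in the video.

```python
import pandas as pd

# Invented sample rows; the column names are assumptions modeled on the
# lecture's description (parentheses, periods, and empty "Unnamed" columns).
raw = pd.DataFrame({
    "CO(GT)": ["2,6", "2,0"],
    "PT08.S1(CO)": [1360, 1292],
    "Unnamed: 15": [None, None],
    "Unnamed: 16": [None, None],
})

# value_counts skips missing values, so an all-empty column yields no rows.
empty_counts = raw["Unnamed: 15"].value_counts()
print(len(empty_counts))

# Hypothetical mapping from the awkward names to descriptive ones.
new_names = {"CO(GT)": "co_gt", "PT08.S1(CO)": "co_sensor"}

clean = (
    raw
    .rename(columns=new_names)
    # List the columns to keep rather than dropping the unwanted ones,
    # so a reader sees exactly what the result contains.
    .loc[:, ["co_gt", "co_sensor"]]
)
print(list(clean.columns))
```

Listing the columns in `.loc` also means a new unexpected column in the source file will not silently flow through the rest of the chain.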
1:06 In this section, we're going to dive into some of our other columns and make sure that the types are correct. Let's look at the string columns here.
1:15 Here are the string columns, and if you look at these, these don't really look like strings. They look more like they are numbers.
1:22 So what are we going to do? It looks like there are commas in there, and those commas probably confused pandas into reading the columns as strings.
1:34 OK, so we've got carbon monoxide, benzene, temp, relative humidity, and absolute humidity. Let's go through and see if we can clean that up.
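One way to spot columns like these is sketched below. It is only an illustration on invented data with hypothetical column names, and it checks for non-numeric columns in general instead of reproducing whatever selection the video uses.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Invented sample data; column names are hypothetical stand-ins.
df = pd.DataFrame({
    "co_gt": ["2,6", "2,0"],
    "temp": ["13,6", "13,3"],
    "co_sensor": [1360, 1292],
})

# Columns that pandas did not parse as numbers.
text_cols = [col for col in df.columns if not is_numeric_dtype(df[col])]

# For each of those columns, check whether any value contains a comma.
has_comma = {
    col: bool(df[col].str.contains(",", regex=False).any())
    for col in text_cols
}
print(text_cols)
print(has_comma)
```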
1:42 You can see that I've added this pipe down here. Remember, pipe is a method. You can pass in any function you want into it,
1:48 and this is really flexible. So let's see how we're using pipe here. So the pipe is basically letting us use assign.
1:56 And why didn't I use assign directly? I didn't use assign directly because I have these weird column names,
2:02 and I don't want to refer to those original weird column names. I want to refer to these nice clean ones.
2:08 If I use pipe here, I get the current state of the DataFrame, with the cleaned-up column names, when I refer to it.
2:15 So this is a way to let me quickly refer to the updated column names. What are we going to do with those?
2:20 We're going to replace the commas with periods, and then we're going to cast those to floats. Let's run that and make sure that it works.
2:28 It looks like that did work. Let's inspect the types of that, and it looks like our types worked.
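A minimal sketch of that pipe step, under assumptions: the sample values are invented, the column names are hypothetical, and `to_float` is a helper introduced here for illustration, not something from the video.

```python
import pandas as pd

# Invented sample data with decimal commas; the names are assumptions.
raw = pd.DataFrame({"CO(GT)": ["2,6", "2,0"], "T": ["13,6", "13,3"]})


def to_float(series):
    """Swap decimal commas for periods, then cast to float."""
    return series.str.replace(",", ".", regex=False).astype(float)


clean = (
    raw
    .rename(columns={"CO(GT)": "co_gt", "T": "temp"})
    # pipe hands the function the DataFrame as it exists at this point
    # in the chain, so the renamed columns can be referenced inside it.
    .pipe(lambda df_: df_.assign(
        co_gt=to_float(df_["co_gt"]),
        temp=to_float(df_["temp"]),
    ))
)
print(clean.dtypes)
```

Checking `clean.dtypes` afterwards is the same kind of inspection done at the end of the lecture: both columns should now report a float type instead of a string type.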

