Data Science Jumpstart with 10 Projects Transcripts
Chapter: Project 6: Working with Time Series - Air Quality over Time
Lecture: Making Line Plots for Time Series Data in Pandas

0:00 In this next section, I'm going to show you how to do some plotting. So here's our cleaned up data. I'm going to make this date into Italian time.
0:08 So when I do that, you should see this offset here. I'm going to comment this out and just let you see the difference here.
0:15 You can see that this is our naive time. This is our Italian time. What I'm going to do next, I'm going to stick this into the index.
0:22 So here is the index. It's just the default index. Now you can see that our Italian time went in there. And let's pull off some columns here.
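A minimal sketch of the two steps just described: localizing a naive date column and moving it into the index. The frame, the column names (`date`, `CO_GT`, `T`), and the `"Europe/Rome"` zone name are assumptions for illustration; the transcript does not show the actual code.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned air-quality data.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2004-04-01 00:00", "2004-04-01 01:00",
        "2004-04-01 02:00", "2004-04-01 03:00",
    ]),
    "CO_GT": [2.6, 2.0, 2.2, 2.2],
    "T": [13.6, 13.3, 11.9, 11.0],
})

# Naive time: no timezone attached.
naive = df["date"]

# Italian time: the same wall-clock values, now carrying a UTC offset.
italian = df["date"].dt.tz_localize("Europe/Rome")

# Replace the default integer index with the timezone-aware dates.
indexed = df.assign(date=italian).set_index("date")
print(indexed.index)
```

Printing `italian` next to `naive` shows the offset appended to each timestamp, which is the difference pointed out in the lecture.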
0:32 So I'm using loc. Loc is interesting for a couple reasons. One is that we don't call it with parentheses, like set index.
0:38 We're calling that with parentheses. Loc, we have an index operation on it. The other thing that's interesting about this
0:43 is that you can pass in two things to index on, which is kind of weird. You don't index on two things in lists or dictionaries.
0:50 But in this, you can pass in a row selector. In this case, we're passing in just a colon, which is a slice, which means take all of the rows.
0:57 And then these are the columns. These are the column selectors. And we're going to take those two columns.
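As a rough illustration of the two-part `loc` selector, here is a small example. The column names are hypothetical placeholders, not necessarily the ones used in the course notebook.

```python
import pandas as pd

# Hypothetical frame with a timezone-aware date index and three columns.
df = pd.DataFrame(
    {
        "CO_GT": [2.6, 2.0, 2.2],
        "T": [13.6, 13.3, 11.9],
        "RH": [48.9, 47.7, 54.0],
    },
    index=pd.to_datetime(
        ["2004-04-01", "2004-04-02", "2004-04-03"]
    ).tz_localize("Europe/Rome"),
)

# loc uses square brackets (an index operation), not parentheses.
# First selector: rows. A bare colon is a slice meaning "all rows".
# Second selector: columns. Here, a list of two column names.
subset = df.loc[:, ["CO_GT", "T"]]
print(subset)
```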
1:02 I'm going to convert this to our PyArrow time zone, and we'll just double check that it works with that as well. It looks like that did work.
1:14 Okay, let's plot this. So again, let's go back to what we had before here. Here is our code. We've pulled out these two columns.
1:21 We're going to do a plot. Plotting is relatively easy in Pandas once you figure out what's going on. When you just call plot,
1:27 what it's going to do is it's going to stick the index in the x-axis, and it's going to take each column and plot that as a line. So here we go.
1:36 We've got two columns, so we should see two lines. Our index is a date, so it should stick the date in the x-axis. There we go. There is our plot.
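A sketch of the default `plot` behavior described here, assuming matplotlib is installed and using made-up values. The `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook

import pandas as pd

# Hypothetical two-column frame with dates in the index.
df = pd.DataFrame(
    {"CO_GT": [2.6, 2.0, -200.0, 2.2], "T": [13.6, 13.3, 11.9, 11.0]},
    index=pd.to_datetime(
        ["2004-04-01", "2004-04-02", "2004-04-03", "2004-04-04"]
    ),
)

# With no arguments, plot puts the index on the x-axis
# and draws one line per column.
ax = df.plot()
print(len(ax.get_lines()))  # one line for each column
```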
1:44 Now, is this the world's greatest plot? No, not necessarily. It looks like it's got a lot of negative 200s going on there.
1:51 And this is one of the reasons I like plotting, not because this is a great plot per se, but it shows me some of the issues in this data
1:59 that pop out really clearly when I plot it. If I were to just look at a table of this data, it would be a little bit harder for me
2:06 to see this information. Okay, I'm going to change my location here. So here's what my location was before. I was saying take all the rows.
2:14 Let's just zoom in a little bit. Let's zoom in through April and May and see what's going on there. So there's April and May.
2:22 You can see, again, it's dropping off to this negative 200 value. What's going on with that negative 200 value? Again, I would go to a subject matter expert
2:30 and ask them. I'm going to replace negative 200, because a negative value probably doesn't make sense there.
2:37 I'm going to replace it with a missing value. So that's what we get when we replace that with a missing value. That's looking a little bit better.
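One way to swap the negative 200 sentinel for a missing value is `DataFrame.replace`. This is a sketch on made-up data; the lecture does not say which method was actually used.

```python
import numpy as np
import pandas as pd

# Hypothetical data where -200 marks a missing sensor reading.
df = pd.DataFrame(
    {"CO_GT": [2.6, -200.0, 2.2], "T": [13.6, 13.3, -200.0]},
    index=pd.to_datetime(["2004-04-01", "2004-04-02", "2004-04-03"]),
)

# Replace every -200 with NaN so it no longer drags the plot down.
cleaned = df.replace(-200, np.nan)
print(cleaned)
```

When the cleaned frame is plotted, matplotlib leaves a gap at each missing value instead of drawing a spike down to negative 200.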
2:43 It's still a little bit hard to see, but it's a little bit clearer there. If I want to zoom in a little bit more,
2:52 let's say I want to go in to April 8th through April 13th, I can do this and get something that looks like this.
3:01 This illustrates one of the cool things about Pandas. When you stick a date into the index, you can use loc with partial date strings
3:10 and it will pull off portions of the data by date, which is really flexible and nice. In this section, I showed you how to do a plot.
3:17 Basically, if you want to do a plot of time series data, stick the date into the index and every column will be a line.
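The zooming steps from the lecture can be sketched with partial date strings on a date index. The hourly data below is synthetic and the column name is a placeholder.

```python
import numpy as np
import pandas as pd

# Synthetic hourly readings for April through June 2004.
idx = pd.date_range(
    "2004-04-01", "2004-06-30 23:00",
    freq=pd.Timedelta(hours=1), tz="Europe/Rome",
)
df = pd.DataFrame({"CO_GT": np.arange(len(idx), dtype=float)}, index=idx)

# Month-level strings: everything in April and May.
spring = df.loc["2004-04":"2004-05"]

# Day-level strings: April 8th through April 13th.
# The end of the slice is inclusive, so all of the 13th is kept.
week = df.loc["2004-04-08":"2004-04-13"]

print(len(spring), len(week))
```

Chaining `.plot()` onto either selection gives the zoomed-in view.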

