Data Science Jumpstart with 10 Projects Transcripts
Chapter: Project 6: Working with Time Series - Air Quality over Time
Lecture: Resampling Time Series Data in Pandas with resample
0:00
In this section we're going to explore resampling. Here's our interpolated chart from before.
0:06
Now if I take the plot call off, let's look at what's in the index here. It looks like this is hourly data. If I want to change the frequency of that,
0:14
it's really easy if I have dates in the index. I just call resample. So I'm going to say resample to the two-hour frequency. Now this is lazy.
0:24
Resample doesn't compute anything on its own; it just gives back a resampler object. To get a result, I need to do an aggregation here. So I'm going to do the mean as the aggregation.
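As a rough sketch of the lazy behavior described here: the frame below is made-up hourly data, and the column name co is an assumption rather than the course's actual column.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data standing in for the air-quality frame
# (the "co" column name and the values are assumptions).
idx = pd.date_range("2004-03-10", periods=8, freq="h")
df = pd.DataFrame({"co": np.arange(8, dtype=float)}, index=idx)

# resample only sets up the time bins; nothing is aggregated yet.
resampler = df.resample("2h")
print(type(resampler).__name__)

# Calling an aggregation is what produces a DataFrame.
two_hourly = resampler.mean()
print(two_hourly)
```

Each two-hour bin here holds two hourly rows, so the mean is the average of each pair.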
0:30
When you look at this now, we have a row for every two hours. So if we were to tack on a plot to this,
0:38
I'll just take this code up here and stick this on down here. You can see that that smooths that out.
0:49
I could come in here and say every three hours if I wanted to, or every five hours.
0:54
Really convenient that we can just stick those numbers in front of that. In fact, we can do something like this.
0:59
We can say every two hours and 37 minutes. And you can see that this is two hours and 37 minutes past that, and that pattern continues.
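A minimal sketch of putting numbers in front of the frequency, including a compound offset. The series is invented for illustration, and the strings "3h" and "2h37min" are one way to spell these frequencies, not necessarily what is typed in the video.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series; values are made up for illustration.
idx = pd.date_range("2004-03-10", periods=24, freq="h")
s = pd.Series(np.arange(24, dtype=float), index=idx)

# A number in front of the alias scales the bin width.
every_3h = s.resample("3h").mean()

# Aliases can be combined: bins of two hours and 37 minutes.
odd = s.resample("2h37min").mean()

print(every_3h.head())
print(odd.index[:3])
```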
1:07
So here I'll just stick a plot onto that. I can also resample to the day level. Here's every day. And we can do five days, six days, etc.
1:19
Instead of doing resample, another thing that I can use is what's called a grouper. The code here is a little bit different.
1:27
I'm commenting out our old code so you can see what I changed here. I'm not setting the index. We're keeping the same index.
1:33
So our loc is going to be a little bit different. I'm going to pull off these columns. In this case, I'm including the date column.
1:39
Then I'm going to cast my column types. I commented out that float cast because I can't say astype(float) on the whole frame when I have a date column in there.
1:48
I'm going to do my interpolation. Now instead of doing resample, I'm going to do groupby, but I'm going to pass it pd.Grouper.
1:54
The nice thing about using Grouper instead of resample is that I can actually use groupby with multiple groupings.
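To illustrate what a grouping with more than one key could look like, here is a small sketch. The station column is purely hypothetical (it is not described in the course data), as are the values.

```python
import pandas as pd

# Hypothetical frame: the "station" column is an assumption for illustration.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2004-03-10 01:00", "2004-03-10 13:00", "2004-03-11 02:00",
        "2004-03-10 05:00", "2004-03-11 07:00", "2004-03-11 09:00",
    ]),
    "station": ["A", "A", "A", "B", "B", "B"],
    "co": [1.0, 3.0, 5.0, 2.0, 4.0, 6.0],
})

# Group by station first, then by calendar day of the date column.
daily_by_station = (
    df.groupby(["station", pd.Grouper(key="date", freq="D")])["co"].mean()
)
print(daily_by_station)
```

The result has a two-level index (station, day), which a plain resample call on a date index would not give directly.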
2:02
In this case, I'm only using one grouping, but I could do a hierarchical grouping if I want to. I'm saying take the date column and do a day frequency
2:10
and then the aggregation there is a mean. At this point, I'm going to do my loc, which is similar to what I had up above there,
2:16
and then I'll do a plot. In this section, we explored resampling. Super powerful thing that you can do with pandas.
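The single-key Grouper version can be sketched next to the resample version to show they bin the same way. The frame and the co column are assumptions; the date column is kept as a regular column, as described above.

```python
import numpy as np
import pandas as pd

# Hypothetical two days of hourly data with the date kept as a column.
dates = pd.date_range("2004-03-10", periods=48, freq="h")
df = pd.DataFrame({"date": dates, "co": np.arange(48, dtype=float)})

# Grouper: no need to set the index; key names the date column.
daily_grouper = df.groupby(pd.Grouper(key="date", freq="D")).mean()

# resample: needs the dates in the index first.
daily_resample = df.set_index("date").resample("D").mean()

print(daily_grouper)
print(daily_resample)
```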
2:24
We have those offset aliases that allow us to very quickly change the resample frequency.