Data Science Jumpstart with 10 Projects Transcripts
Chapter: Project 4: Understanding Grouping and Aggregation Retail Data
Lecture: Using Grouper in Pandas to Groupby by Month Frequency

0:00 Your boss is liking all this stuff that you're able to give them now. They're asking for sales by month. Okay,
0:07 so here is our original data, and we're gonna say let's make that total column, and I'm gonna show you a new way to
0:13 get that value here, the month, without actually making a column.
0:18 What I'm gonna do is I'm gonna say pd.Grouper, and we'll say I want to group by this invoice date. But look at this: I say freq is equal to M.
0:27 So what this is going to do is it's going to complain. It's gonna say, nope, I can't do that.
0:33 So: this is only valid with DatetimeIndex, but got an instance of Index.
0:40 So this is actually a pandas 2 bug. Hopefully they'll be fixing this soon.
0:43 So I'm gonna look at the types here, and you can see that we have the date. It's this timestamp.
0:51 It's a PyArrow time. I'm gonna actually convert this to a pandas time, so let's do that here. I'm gonna say astype, and
1:00 if you look at this, we don't really see much difference here, but if we look at the dtypes now of this, you can see this is datetime64.
1:14 So hopefully in a soon-released version of pandas this will be fixed.
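A minimal sketch of the conversion step described above. The column name `InvoiceDate`, the sample dates, and the target dtype string `"datetime64[ns]"` are assumptions for illustration; the lecture only says the column is converted with astype. The guard around `pyarrow` is there so the sketch still runs where that package is missing.

```python
import pandas as pd

# Hypothetical stand-in for the course's retail data
dates = pd.to_datetime(["2011-10-05 09:00", "2011-11-03 11:15"])
sales = pd.DataFrame({"InvoiceDate": dates})

try:
    import pyarrow  # noqa: F401  (only needed for the Arrow-backed dtype)
    # Mimic the lecture's starting point: a PyArrow-backed timestamp column
    sales = sales.astype({"InvoiceDate": "timestamp[ns][pyarrow]"})
except ImportError:
    pass  # without pyarrow the column simply stays NumPy-backed

print(sales.dtypes)

# Convert the column to a NumPy-backed datetime64 dtype
converted = sales.astype({"InvoiceDate": "datetime64[ns]"})
print(converted.dtypes)
```

The displayed values look the same before and after; only the reported dtype changes.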
1:19 But I'm gonna add that total column there, and then we're gonna group by, and I'm gonna say
1:25 that column, which is the date column that we just changed to a
1:28 NumPy date now, and this freq here, M, is the month frequency. So let's do that. That's gonna be lazy.
1:36 It's gonna give us that groupby object, and then we're gonna say I want to summarize the numeric columns there. Now look at the index
1:41 here. Instead of having a month here, what we have is the end of each month. So this is really cool. With relatively little code, again,
1:52 I did have to change the type because PyArrow doesn't support that, I was able to summarize by month.
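The monthly aggregation above might look roughly like the following sketch. The column names (`InvoiceDate`, `Quantity`, `Price`, `Total`) and the tiny sample frame are assumptions, not the course's actual data. The lecture passes the string alias M; to my understanding newer pandas releases spell the month-end alias ME instead, so this sketch passes a `pd.offsets.MonthEnd()` object, which should behave the same in either case.

```python
import pandas as pd

# Hypothetical stand-in for the course's retail data
sales = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2011-10-05 09:00", "2011-10-20 14:30",
        "2011-11-03 11:15", "2011-11-28 16:45",
        "2011-12-01 10:00",
    ]),
    "Quantity": [2, 1, 5, 3, 4],
    "Price": [10.0, 20.0, 4.0, 10.0, 2.5],
})

monthly = (
    sales
    # Total per row, computed on the fly (assumed to be Quantity * Price)
    .assign(Total=lambda df: df["Quantity"] * df["Price"])
    # Group on the date column by month end; no separate month column needed
    .groupby(pd.Grouper(key="InvoiceDate", freq=pd.offsets.MonthEnd()))
    # Sum only the numeric columns
    .sum(numeric_only=True)
)
print(monthly)
```

The resulting index holds the last day of each month rather than a month number.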
2:00 I'm just going to look at the memory usage of our old data here. There's not a
2:07 difference in memory usage. It's just that one's using NumPy and the other one's using PyArrow.
2:13 Okay, so one of the things I like to do with this once I have that: here's what we had. Let's throw on a plot here to visualize that.
2:24 So look at this. This is going to do a line plot. We haven't seen line plots yet,
2:31 but here's our data. This is a series. When we just do a plot, by default it's going to do a line plot.
2:37 It's going to put the index in the x-axis. In this case the index is dates, and then it's going to draw a line plot for those values there.
2:45 So it's really easy to make a line plot in pandas.
2:48 Just call .plot there. This makes it really clear that November is where we have the most sales.
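As a rough sketch of that plotting step: calling `.plot()` on a date-indexed series draws a line with the dates on the x-axis. The series values here are made up for illustration, and the `matplotlib.use("Agg")` line is only an assumption for running outside a notebook; in Jupyter it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import pandas as pd

# Hypothetical monthly totals indexed by month end
monthly_total = pd.Series(
    [40.0, 50.0, 10.0],
    index=pd.to_datetime(["2011-10-31", "2011-11-30", "2011-12-31"]),
    name="Total",
)

# Default plot kind for a Series is a line plot; the index goes on the x-axis
ax = monthly_total.plot()
ax.set_ylabel("Total sales")
```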
2:55 But this is aggregated at the month level. Watch this: I can change this freq value here from M to W, and
3:03 now we are looking at an aggregation of the weekly values. In fact, I can change it to a D and we can aggregate at the day level, which is kind of cool.
3:14 I can do a 3D here and aggregate at every three-day value. So pandas makes it really flexible to aggregate at different date intervals.
3:24 We call this the offset alias, and we're going to use that with pd.Grouper.
3:28 So the pd.Grouper syntax is a little weird, but once you get used to it, it's able to do very powerful things.
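To make the offset aliases concrete, here is a small sketch that reuses one grouping expression with W, D, and 3D. The helper name `total_by`, the column names, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in data with a precomputed Total column
sales = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2011-11-01", "2011-11-02", "2011-11-02", "2011-11-05", "2011-11-09",
    ]),
    "Total": [10.0, 5.0, 15.0, 20.0, 30.0],
})

def total_by(freq):
    """Sum Total per date bin; freq is a pandas offset alias such as 'W'."""
    return sales.groupby(pd.Grouper(key="InvoiceDate", freq=freq))["Total"].sum()

weekly = total_by("W")      # weekly bins, labeled by the Sunday that ends each week
daily = total_by("D")       # one bin per calendar day, empty days sum to 0
three_day = total_by("3D")  # bins of three days starting at the first date

print(weekly, daily, three_day, sep="\n\n")
```

Only the alias string changes between the three calls; the rest of the groupby stays the same.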

