#
Move from Excel to Python with Pandas Transcripts

Chapter: Aggregating, grouping, and merging

Lecture: Examples of aggregation in pandas

Login or
purchase this course
to watch this video and the rest of the course contents.

0:00
Now let's go through some examples of how to aggregate in group data in Panis.

0:05
So Ivory started this notebook to read in my sample sales data and take a look

0:11
at the top five rows, the data like we have been doing in the past

0:15
So now let's try and do a simple aggregation of the Price column.

0:23
If we use the ag function and call that on the price calm,

0:28
we can pass in a list of one or more functions to apply to that column

0:33
and get the results. So in this example,

0:35
I did the mean function, which tells us that we have an average price of

0:39
22.8. I can also do something like dinner deviation,

0:44
or I can do men Max,

0:47
and there are many different aggregations you can apply.

0:50
But this basic feature that we use on a column we can also use when we

0:56
group data, which I will show in a second.

0:59
One interesting aspect about aggregations is that you can run it on an entire data frame

1:05
as well, so we can say and pendants will then run a mean in a

1:14
max on all of the columns that it can,

1:17
and so you'll notice that it will run the functions on day columns,

1:22
object columns, numeric columns and in those places where it can actually do a compass

1:27
Correct calculation. It returns this in a N,

1:31
which means not a number function.

1:33
We can define a dictionary that has the key as each column name and then a

1:38
list of all the aggregation functions we want to apply on those columns.

1:42
So let's go ahead. And to that,

1:43
just a show what I'm talking about.

1:47
So if we want to get the average quantity,

1:51
or maybe let's just let's do the total 20 and then for price,

2:00
we can pass our standard deviation in the mean we can all student invoice and maybe

2:06
for this one I want to count.

2:10
So this dictionary says for each of these columns,

2:13
we want to perform these operations on those columns,

2:18
and then if we want to actually apply it,

2:20
we two DF AG and add the AG columns to that so you can see here

2:26
that it went through each column and it did various functions for each of the calm

2:32
So I said, Let's get us some of the quantity,

2:36
the mean and standard deviation price,

2:38
the count of the invoice and the some of the extended amount.

2:42
And we have those in a in values there because pandas is constructing a single table

2:47
or data frame and doesn't know what fill it in.

2:50
So it will just place the in A in there.

2:53
If you want to fix that so that you get a zero,

2:56
we use fill in A. And that function will fill all of your in a

3:01
N values with zero so that we've talked about aggregating.

3:05
We're gonna talk about grouping because they really go together.

3:09
While you can aggregate on a data frame by itself most of the time you're going

3:13
to group first. So let's do a quick example.

3:16
Room in group. Buy everything in the product column,

3:19
and then we tell it the some.

3:20
So what this does is a groups,

3:23
all the products. So we have four products that has grouped together,

3:26
and it's summed all the numerical.

3:29
Um, so we know the total quantity,

3:31
the total price and the extended amount.

3:33
But you think about this. The some of the price really doesn't make a whole

3:38
lot of sense. So maybe what we really want to dio is just get the

3:42
quantity was copied. That and what we can do is tell it.

3:49
So now we've said group byproduct,

3:52
but just run the some operation on the quantity column,

3:56
and this functionality works. But it's not the preferred way that I like to do

4:01
aggregations in pandas. And there's the structure that makes the most sense for dealing with

4:08
more complex grouping and aggregations. So I define a prod calls.

4:13
So I say in quantity column what some that.

4:17
So if you followed what I did on the aggregation,

4:19
I'm going down that same path.

4:21
So now I say, group by product and now I ag broad calls.

4:28
So, combining what we've talked about earlier with aggregations and group buys,

4:33
we defined the columns that we want to perform mathematical operations on.

4:37
In this case, we want to some all the values in the quantity.

4:41
We want a group by product and therefore we have be counts or our books,

4:46
pins and posters, which, if you recall,

4:49
is very similar to what we did with be some ifs function in excel.

4:54
So the groupie and aggregation that you've been doing an Excel.

4:57
This is just a different way of doing it,

5:00
and it's really more powerful because you have a lot of flexibility and a very small

5:05
subset of code to do at a lot of summary information and grouping on your data

5:10
For instance, copy and paste this,

5:13
but let's say we don't want just do the the some.

5:18
Maybe we also want to see what the average size of each transaction is.

5:23
This starts toe give you a feel for how quickly you can iterated through analysing your

5:28
data. I just adding in the mean function,

5:31
you start to get mawr information about what the purchase patterns look like,

5:36
and we can go back in and out of state or deviation.

5:40
Um, maybe we want to see what's the max.

5:43
Someone is purchased, and this gives us a really quick way to reiterate through our

5:47
data and keep looping back and making adjustments and figuring out what insights we can get

5:54
from our data. And we can also do our group by on multiple columns.

6:00
So right now, up until now,

6:02
we just grew by product. But what if we want to see company in products

6:08
Now we're gonna get more results.

6:10
So we're gonna see for each company what products they purchased.

6:13
The total amount of the average Mount Sandy eight deviation,

6:17
the max in the men. And once again,

6:18
we have our Entei ends. And what that's telling us is there are some negative

6:23
values or some other values in there that are causing the standard deviation to not calculate

6:29
correctly. So if we want to,

6:32
if we don't like, be in ends,

6:33
we can fill in a was zero to turn those into zeros.

6:37
And one of the things you likely noticed is that once we do these aggregations,

6:45
our indices and our columns look a little bit different.

6:49
And what what Pandas has done is put together a multi index force.

6:55
I'm not gonna go through the details of what a multi indexes or why you would

6:59
want to use it. But I am going to introduce this concept of doing reset

7:04
index. And once you call that you can see now that instead of having company

7:11
and then product and book pin poster group together,

7:15
it's reset so that there's one row for each value.

7:20
And I highlight that because sometimes some of the future summaries and analyses that you want

7:25
to do are much easier when you've reset the index this way.

7:29
So I want to introduce that concept to you.

7:31
Now, for the final example,

7:34
we're gonna introduce a concept called named Aggregations.

7:37
So probably the best way is just Ah,

7:39
walk through this and then I'll explain what I'm doing.

7:57
Okay, so let me walk through this what we've done with our named aggregation here

8:01
So we want to count the number of invoices that each company company has.

8:07
And then what is there? Max Purchase?

8:10
And instead of just leaving the columns to the default naming convention based on the extended

8:18
mountain invoice actually defined the column name.

8:21
So this says that the invoice total should be the name of the column,

8:24
and Max purchase should be the name of the call.

8:27
Now, this is really helpful as you start developing these aggregations and want to present

8:33
the results. Other people you want to make sure that the column names are consistent