Move from Excel to Python with Pandas Transcripts
Chapter: Case study walk through: Sales commissions
Lecture: Cleaned, joined file
0:00 Now that we have the company associated with the sales rap,
0:03 we could start to model. The commission's in a little bit more detail,
0:08 so we're now adding a column that will show how much commission is calculated for each
0:14 customer. So let's take a look Scroll over here now and we can see a
0:18 commission, and what I want to do here is clean this up.
0:23 I'm going around it a little bit.
0:24 This is going to introduce some rounding errors,
0:27 but I think it's Ah, it makes sense in this case that the two decimal
0:31 points now that makes it a little more straightforward.
0:36 And we can always check to make sure that we're still in a million dollars in
0:40 commissions. They were right at a million dollars.
0:45 We contained to have some around the heirs.
0:47 But that's not an issue at this point.
0:49 Now that we have all this data,
0:51 we can start to aggregate it and get some insights into how the commissions are distributed
0:56 For instance, what if we want to know how much we're paying by region
1:04 starts to give us a little more insight into what our plan might be telling us
1:08 so We've got a pretty broad distribution of commissions just based on region,
1:14 and we can do some other quick analysis on this if we want to.
1:18 Maybe not just looking at the some,
1:23 but we also want to see the average distribution.
1:26 So once again there's there's a little bit of spread here between the sales and the
1:32 different regions that we may want to investigate some more.
1:34 One of the things that may be worth investigating is how we're commissioning by channel.
1:41 Let's add that into our group by I'm also going around it just to make it
1:46 a little easier toe interpret. So looking at this data,
1:50 there continues to be, you know,
1:51 some interesting differences with the reseller versus retail that we probably want to dive into a
1:58 little bit more. So let's try another one this example,
2:03 and continue to copy and paste.
2:06 Just take a look at the channel.
2:08 It's interesting that there's a lot more commissions for reseller in retail,
2:14 and you know what would probably be interesting is to see how Maney companies actually fall
2:20 in these various buckets. So let's have that as well,
2:28 and I used the count function to count how many company names this ties back into
2:33 what we observed earlier, that we have a lot of retailers,
2:37 number of resellers and only two partners,
2:39 and it just looks like the commission rates may not be equally distributed across those various
2:45 groups. And I think the way toe best see this data is to use a
2:49 pivot table. So let's build one.
3:16 Now, as we look at the pivot table broken out by region and the different
3:20 customer channels, we start to see that there is some unequal distribution in the way
3:25 partners air distributed across geography ease as well as the total amount in the different geography
3:33 ease. For instance, the South and the Midwest have 794,000 versus a little over
3:41 100,000 in the Northeast in the West.
3:43 So this distribution might not be what we would want to see with our commission's plan
3:48 We're probably going to need to do some more analysis on it,
3:53 but this customer rep data frame that we put together is really gonna be useful for
3:57 the next step in analysis. So I'm going to save this.
4:02 It looks like I forgot to define output file.
4:05 So let's go ahead and do that up at the top so that we keep everything
4:08 in the same place. So I'm going to store this file in the process directory
4:19 because I'm going to need to use this for some additional analysis.
4:23 So I copied that. You have it down here and now we save that file
4:28 and let me open it up and show you what it looks like.