Move from Excel to Python with Pandas Transcripts
Chapter: Case study walk through: Sales commissions
Lecture: Cleaning up region
0:00 The regions defined here are official regions: the U.S.
0:04 Census Bureau defines which states go into
0:07 which region. And if you do a search on Wikipedia,
0:10 you can find it. But it's not very easy to digest into a pandas DataFrame.
0:14 And instead of typing it all,
0:16 I did a Google search and found this repository that has each state,
0:21 the state code, the region and the division.
0:25 So this is going to save us a lot of time,
0:28 and one of the things that pandas can do is actually read in this file.
0:32 So let's go ahead and define what that URL
0:35 is. This is a little trick with reading
0:39 CSV files from GitHub: pass raw=True.
0:43 So that gives us the URL.
0:46 Call this states, just pass the URL,
0:50 like we would a file name, and then we only want to use two columns.
0:59 Now we have the state code and the region.
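A minimal sketch of the read step just described. The transcript does not show the repository URL or the exact column names, so the inline CSV text and the "State Code" and "Region" headers below are assumptions standing in for the real file; with the real file, the URL string would be passed in place of the in-memory buffer.

```python
import io

import pandas as pd

# Stand-in for the CSV in the repository (assumed layout: state, code, region, division).
csv_text = """State,State Code,Region,Division
Connecticut,CT,Northeast,New England
Illinois,IL,Midwest,East North Central
Texas,TX,South,West South Central
California,CA,West,Pacific
"""

# read_csv accepts a URL string the same way it accepts a file name or a buffer.
# usecols keeps only the two columns needed for the join.
states = pd.read_csv(io.StringIO(csv_text), usecols=["State Code", "Region"])
print(states.head())
```

Limiting the read with usecols keeps the lookup table small and avoids carrying the state name and division into the later join.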
1:02 So we've got region here. We've got region here and we've got the state.
1:06 So now let's see what we would need to do to join this together. Before you
1:11 do any joins like this, one of the things I always like to do is use
1:16 our value counts. So we look at the states and we can see how many
1:21 regions there are and how many states are in each region.
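A small sketch of that value counts check, using a made-up six-row states table (the column names and rows are assumptions, not the real file). The same call works on the region column of any other data frame.

```python
import pandas as pd

# Hypothetical sample of the states lookup table.
states = pd.DataFrame(
    {
        "State Code": ["CT", "NY", "IL", "TX", "FL", "CA"],
        "Region": ["Northeast", "Northeast", "Midwest", "South", "South", "West"],
    }
)

# value_counts lists each distinct region label and how many rows carry it,
# which makes stray spellings easy to spot before a join.
region_counts = states["Region"].value_counts()
print(region_counts)
```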
1:25 Let's do the same thing for our sales rep,
1:29 so that looks pretty good. But then you'll notice something.
1:34 So if you pay close attention,
1:36 Northeast here is not spelled the same way as Northeast in the States file.
1:42 So we're gonna need to clean that up.
1:44 There's a couple different ways we can do it.
1:46 But what I'm gonna do is I'm gonna convert all of the regions to upper case.
1:50 So remember we used our accessors.
1:56 So I've converted each one of those.
1:58 So let's rerun our value counts and see what it looks like.
2:03 All right, that looks good.
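A sketch of the upper case cleanup with the string accessor. The two small data frames, their column names, and the "NorthEast" spelling are assumptions for illustration; the transcript only says the spelling differs between the two files.

```python
import pandas as pd

# Hypothetical samples: the sales data spells one region with different casing.
states = pd.DataFrame(
    {"State Code": ["CT", "IL", "TX"], "Region": ["Northeast", "Midwest", "South"]}
)
sales = pd.DataFrame(
    {"Sales Rep": ["A", "B", "C"], "Region": ["NorthEast", "Midwest", "South"]}
)

# Region labels in the sales data with no exact match in the states data.
unmatched_before = set(sales["Region"]) - set(states["Region"])

# The .str accessor applies upper() to every value in the column.
states["Region"] = states["Region"].str.upper()
sales["Region"] = sales["Region"].str.upper()

unmatched_after = set(sales["Region"]) - set(states["Region"])
print(unmatched_before, unmatched_after)
print(sales["Region"].value_counts())
```

Upper-casing both sides only fixes differences in capitalization; a difference in spacing or spelling would need a separate replacement step.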
2:07 Now our value counts are showing that everything's capital,
2:10 so we can probably do a join on that. Before we do the join,
2:14 I realize I forgot to do something.
2:23 I forgot to actually take a look at the info just to double check
2:26 and make sure everything's coming through as expected.
2:28 So state code and region are objects.
2:31 Total sales manager. Everything else customers is an object.
2:35 And tenure is a float. So this looks good.
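A brief sketch of that type check before a join. The data frame and its "Region" and "Tenure" columns are hypothetical stand-ins; the real sales data has more columns than shown here.

```python
import pandas as pd

# Hypothetical slice of the sales data: one text column and one numeric column.
sales = pd.DataFrame({"Region": ["NORTHEAST", "SOUTH"], "Tenure": [2.5, 10.0]})

# info() prints each column's non-null count and dtype. Text columns usually
# appear as object (or a string dtype, depending on the pandas version).
sales.info()

# dtypes gives the same type information as a Series, which is easier to test.
tenure_is_float = pd.api.types.is_float_dtype(sales["Tenure"])
print(sales.dtypes)
```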
2:38 So now we should have our data where we can merge everything together and we'll walk
2:44 through that in just a second.