Move from Excel to Python with Pandas Transcripts
Chapter: Case study walk through: Sales commissions
Lecture: Cleaning up region

0:00 The regions defined here are official regions. The U.S. Census Bureau defines which states go into
0:08 which region. And if you do a search on Wikipedia, you can find it. But it's not very easy to digest into a pandas DataFrame.
0:15 And instead of typing it all, I did a Google search and found this repository that has each state, the state code, the region, and the division.
0:26 So this is going to save us a lot of time, and one of the things that pandas can do is actually read in this file.
0:33 So let's go ahead and define what that URL is. There's a little trick with reading CSV files from GitHub: put ?raw=True on the URL.
0:44 So that gives us the URL. Call this states, just pass the URL like we would a file name, and then we only want to use two columns.
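A minimal sketch of the read step described above. To keep it runnable offline, it reads a small inline sample instead of the repository file; the sample rows and the exact column names ("State Code", "Region") are assumptions for illustration, not taken from the lecture.

```python
import io

import pandas as pd

# Stand-in for the CSV in the repository. These rows and the exact
# column names are assumptions made for this sketch.
csv_text = """State,State Code,Region,Division
Connecticut,CT,Northeast,New England
Illinois,IL,Midwest,East North Central
Texas,TX,South,West South Central
California,CA,West,Pacific
"""

# read_csv accepts a URL string the same way it accepts a file name,
# so a GitHub link ending in ?raw=True could be passed here instead.
# usecols keeps only the two columns needed for the join.
states = pd.read_csv(io.StringIO(csv_text), usecols=["State Code", "Region"])
print(states.head())
```

Passing `usecols` at read time means the State and Division columns never enter the DataFrame, so there is nothing to drop afterwards.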
1:00 Now we have the state code and the region. So we've got region here. We've got region here, and we've got the state.
1:07 So now let's see what we would need to do to join this together. Before you do any joins like this, one of the things I always like to do is use
1:17 our value counts. So we look at the states, and we can see how many regions there are and how many states are in each region.
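The value counts check could look roughly like this. The small states table below is a hypothetical stand-in for the one read from the repository.

```python
import pandas as pd

# Hypothetical sample of the states table (assumed column names).
states = pd.DataFrame(
    {
        "State Code": ["CT", "NY", "IL", "TX", "FL", "CA"],
        "Region": ["Northeast", "Northeast", "Midwest", "South", "South", "West"],
    }
)

# One row per distinct region, with the number of states in each.
region_counts = states["Region"].value_counts()
print(region_counts)
```

Running the same call on the join column of the other table makes spelling or capitalization differences easy to spot before merging.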
1:26 Let's do the same thing for our sales rep. So that looks pretty good. But then you'll notice something. So if you pay close attention,
1:37 Northeast here is not spelled the same way as Northeast in the states file. So we're gonna need to clean that up.
1:45 There's a couple different ways we can do it. But what I'm gonna do is I'm gonna convert all of the regions to uppercase.
1:51 So remember, we used our accessors. So I've converted each one of those. So let's rerun our value counts and see what it looks like.
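A sketch of the uppercase cleanup using the `.str` accessor. Both tables are hypothetical samples, and the "NorthEast" spelling in the sales data is an assumed example of the mismatch, since the lecture does not spell out the exact difference.

```python
import pandas as pd

# Hypothetical samples; "NorthEast" is an assumed example of the mismatch.
states = pd.DataFrame(
    {"State Code": ["NY", "TX", "CA"], "Region": ["Northeast", "South", "West"]}
)
sales = pd.DataFrame(
    {"Sales Rep": ["Rep A", "Rep B", "Rep C"], "Region": ["NorthEast", "South", "West"]}
)

# Normalize the join column to uppercase in both tables.
states["Region"] = states["Region"].str.upper()
sales["Region"] = sales["Region"].str.upper()

# Rerun value counts, then confirm every sales region exists in states.
print(sales["Region"].value_counts())
unmatched = set(sales["Region"]) - set(states["Region"])
print("Regions without a match:", unmatched)
```

Uppercasing both sides is one of several options; mapping the odd spelling with `replace` would also work, but normalizing case handles any other capitalization differences at the same time.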
2:04 All right, that looks good. Now our value counts are showing that everything's capital, so we can probably do a join on that. Before we do the join,
2:15 I realize I forgot to do something. I forgot to actually take a look at the info, just to double check
2:27 and make sure everything's coming through as expected. So state code and region are objects. Total sales, manager, everything else, customers, is an object.
2:36 And tenure is a float. So this looks good. So now we should have our data where we can merge everything together, and we'll walk
2:45 through that in just a second.
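The info check before merging might be sketched like this. The sales table and its column names ("Sales Rep", "Total Sales", "Tenure") are hypothetical, chosen only to mirror the kinds of columns mentioned in the lecture.

```python
import pandas as pd

# Hypothetical sales table with assumed column names.
sales = pd.DataFrame(
    {
        "Sales Rep": ["Rep A", "Rep B"],
        "Region": ["NORTHEAST", "SOUTH"],
        "Total Sales": [1000.0, 2500.0],
        "Tenure": [1.5, 3.0],
    }
)

# info() prints each column's non-null count and dtype. Text columns
# usually show as object (newer pandas versions may report a string dtype).
sales.info()

# A programmatic version of the same check for the columns that matter.
tenure_is_float = pd.api.types.is_float_dtype(sales["Tenure"])
region_is_text = not pd.api.types.is_numeric_dtype(sales["Region"])
print(tenure_is_float, region_is_text)
```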
