Move from Excel to Python with Pandas Transcripts
Chapter: Aggregating, grouping, and merging
Lecture: Merging data in Pandas

Login or purchase this course to watch this video and the rest of the course contents.
0:00 Now we'll show how to do a similar function using pandas. First thing we need to do is define where each of our files are stored.
0:19 That's correct. That typo Now, we defined each file. We need to read the men. Now. We should have three data frames that contain those three files.
0:39 So let's do a quick check just to make sure all the data got red in as expected. Let's do the same thing for the levels.
0:55 Okay? Everything looks pretty good there. So all the day looks like he came in as expected. It's the size of the original file.
1:06 So let's figure out how to start to combine it. The first combination we want to do our add those extra sales tool.
1:12 Our original file. We're going to use the can cat function to do that. The contamination function takes a list of data frames and in this case,
1:25 because the data frames all have the same column names and knows that they should be combined and it gives us 1000 50 rose.
1:33 I want to highlight one other way that you'll see this type of activity happen. You can use upend as well, so we can take our original data frame,
1:43 do the same thing, so we get, um, we upended the extra sales to the end. I like to use Can cat because it's a little more robust.
1:53 You can pass more parameters to it. You can also combine multiple data frames together.
2:00 So I think it's, ah better habit to get into as you get started. So let's make sure we define what that total data frame looks like.
2:08 So here we say that all sales is gonna be the concatenation of sales and extra sales. Let's just do a quick check on this shape that looks good.
2:20 Okay, so we've got 1000 50 rose, and now we need to figure out how to get our customer levels in here.
2:26 The way we want to combine our customer levels with all sales is to use emerge So let's just give this a shot.
2:32 Will do PT merge and see if we get is lucky's. We did last time and I get air. So I tried to emerge all sales in levels and they get a merger.
2:43 No common columns. So the problem is, if we look at our all sales, see how we have company as the column name for the company.
2:53 But if we take a look at our levels, scroll up a little bit. It's called company name.
2:58 So Pandas doesn't know that company and company name are the same value. So we need to tell pandas that so we can do that a couple different ways
3:06 We're going to show how to use pants merge to do that. So we tell it all sales in levels. But then we need to tell it what to use for the left,
3:15 which in this case is all sales. So what this command is doing is it's saying,
3:29 merged the all sales in the levels data frame using a left join and the company and company name are equivalent columns and match on those two.
3:42 So now we have a data frame that has the company, the company name and the level So you concede that included both data frames in that
3:50 level, data frame shows up at the end. The other approach is to actually rename or columns.
3:57 So if you recall, let's take a look at what levels Looks like if we want to. We could rename company name the company.
4:10 So here we can pass a dictionary of column names and what we want. The new column names to be and run that. And now let's take a look.
4:21 So now it's company we do. Our PT emerge now doesn't give us nearer, and it looks good. So it says Company and level.
4:34 So the nice thing about this is because we renamed it and Pandas was able to understand those names. It did not duplicate the name.
4:43 So this is a handy thing to keep in mind as you merge your data together that sometimes it's easier to rename your columns for consistency sake,
4:52 and the final thing to remember is we still haven't actually stored this results. So let's build a variable called Final Data.
5:04 Let's take a look at it that looks good. Shape looks good, and it has all of our data.
5:18 Now we have successfully created a single data frame that has 1000 and 50 transactions as well as our level and weaken.
5:25 Do further analysis with this, and we've done it all in a very repeatable way
5:30 that we can use in the future when we need to perform similar sorts of functions


Talk Python's Mastodon Michael Kennedy's Mastodon