Move from Excel to Python with Pandas Transcripts
Chapter: Aggregating, grouping, and merging
Lecture: Merging data in Pandas

Login or purchase this course to watch this video and the rest of the course contents.
0:00 Now we'll show how to do a similar function using pandas.
0:04 First thing we need to do is define where each of our files are stored.
0:18 That's correct. That typo Now,
0:20 we defined each file. We need to read the men.
0:32 Now. We should have three data frames that contain those three files.
0:38 So let's do a quick check just to make sure all the data got red in
0:41 as expected. Let's do the same thing for the levels.
0:54 Okay? Everything looks pretty good there.
1:00 So all the day looks like he came in as expected.
1:03 It's the size of the original file.
1:05 So let's figure out how to start to combine it.
1:08 The first combination we want to do our add those extra sales tool.
1:11 Our original file. We're going to use the can cat function to do that.
1:18 The contamination function takes a list of data frames and in this case,
1:24 because the data frames all have the same column names and knows that they should be
1:28 combined and it gives us 1000 50 rose.
1:32 I want to highlight one other way that you'll see this type of activity happen.
1:36 You can use upend as well,
1:37 so we can take our original data frame,
1:42 do the same thing, so we get,
1:43 um, we upended the extra sales to the end.
1:47 I like to use Can cat because it's a little more robust.
1:52 You can pass more parameters to it.
1:54 You can also combine multiple data frames together.
1:59 So I think it's, ah better habit to get into as you get started.
2:02 So let's make sure we define what that total data frame looks like.
2:07 So here we say that all sales is gonna be the concatenation of sales and extra
2:11 sales. Let's just do a quick check on this shape that looks good.
2:19 Okay, so we've got 1000 50 rose,
2:22 and now we need to figure out how to get our customer levels in here.
2:25 The way we want to combine our customer levels with all sales is to use emerge
2:29 So let's just give this a shot.
2:31 Will do PT merge and see if we get is lucky's.
2:33 We did last time and I get air.
2:38 So I tried to emerge all sales in levels and they get a merger.
2:42 No common columns. So the problem is,
2:47 if we look at our all sales,
2:48 see how we have company as the column name for the company.
2:52 But if we take a look at our levels,
2:54 scroll up a little bit. It's called company name.
2:57 So Pandas doesn't know that company and company name are the same value.
3:01 So we need to tell pandas that so we can do that a couple different ways
3:05 We're going to show how to use pants merge to do that.
3:09 So we tell it all sales in levels.
3:11 But then we need to tell it what to use for the left,
3:14 which in this case is all sales.
3:25 So what this command is doing is it's saying,
3:28 merged the all sales in the levels data frame using a left join and the company
3:35 and company name are equivalent columns and match on those two.
3:41 So now we have a data frame that has the company,
3:44 the company name and the level So you concede that included both data frames in that
3:49 level, data frame shows up at the end.
3:52 The other approach is to actually rename or columns.
3:56 So if you recall, let's take a look at what levels Looks like if we
4:00 want to. We could rename company name the company.
4:09 So here we can pass a dictionary of column names and what we want.
4:13 The new column names to be and run that.
4:15 And now let's take a look.
4:20 So now it's company we do.
4:23 Our PT emerge now doesn't give us nearer,
4:29 and it looks good. So it says Company and level.
4:33 So the nice thing about this is because we renamed it and Pandas was able to
4:38 understand those names. It did not duplicate the name.
4:42 So this is a handy thing to keep in mind as you merge your data together
4:46 that sometimes it's easier to rename your columns for consistency sake,
4:51 and the final thing to remember is we still haven't actually stored this results.
4:55 So let's build a variable called Final Data.
5:03 Let's take a look at it that looks good.
5:11 Shape looks good, and it has all of our data.
5:17 Now we have successfully created a single data frame that has 1000 and 50 transactions as
5:23 well as our level and weaken.
5:24 Do further analysis with this, and we've done it all in a very repeatable way
5:29 that we can use in the future when we need to perform similar sorts of functions