Move from Excel to Python with Pandas Transcripts
Chapter: Intro to Pandas
Lecture: Working with column names
0:00 in the next section. We're going to work with our column names a little bit
0:04 and see how to access some data.
0:06 So remember when we open up our notebook,
0:08 we need to restart and run all the cells again,
0:11 which is pretty quick. The other thing I want to walk through is kind of
0:14 how I break up my notebook.
0:16 So now since we're gonna work on a new section, I am gonna put a break in here
0:19 call this column names I'm using markdown and I selected the markdown type and
0:26 you still press shift enter and it will render the markdown.
0:30 This is a good habit to get into so that you can organize your notebooks and
0:34 know where you are and what you're doing.
0:36 So we talked about how important column names were.
0:41 If you look at the column names,
0:42 property, you can see all the columns.
0:45 We can also convert that to a list if we want,
0:48 which is gonna be helpful sometimes when you need to clean them up because we care
0:52 about column names. You'll notice some potential errors with the column names here,
0:57 for instance. We already talked about how extended amount has a space.
1:01 And then also, when I added those new column names of country and fee,
1:05 I capitalized them. But none of the other ones were capitalized.
1:09 So let's see how to clean those up.
1:11 This is a very common task that you're going to do when you're working with your
1:15 data. So we're gonna use some list comprehensions to clean up the spaces.
1:21 So let me type it out and then I'll explain what's going on.
1:28 So what this does we talked about the DF columns list,
1:31 and I'm going to replace all the spaces with Underscores.
1:36 So we've converted our extended amount where it used to have a space to have an
1:41 underscore. So this is going make it a little easier for us to manipulate the data
1:45 and keep our columns consistent if we want to assign that back so that that
1:50 is the column name going forward.
1:52 Because if we look at DF head,
1:54 we can see we haven't actually changed anything,
1:57 so to change it, and I'm going to cut and paste to show you how
2:00 I would normally do this. Put that in there,
2:04 and now when we run it and take a look at head.
2:08 You can see that we now have extended amount, the same way.
2:11 You can also do a similar sort of cleaning activity for the
2:16 country and fee columns. So let's go ahead,
2:19 and I'm going to copy this instead of doing a replace,
2:23 I'm going to lower. So see now country and fee are lower case so we
2:29 can do ah similar sort of activity.
2:34 And I purposely do the copy and paste is every retyping everything,
2:39 because this is the way you work in a notebook.
2:41 It's an interactive process where you test out some code,
2:43 see what the values are, and then use that to make your changes.
2:48 Okay, so now countries lower case fee is lower case.
2:51 Our data frame has all the column names that we would expect.
2:55 We can start to look at how to access the data in the data frames.
3:00 So we talked about columns. We've talked about how to select an individual column,
3:04 and in Excel, you would go in and use your cursor and click on a
3:09 row or sell and make your changes.
3:12 You can't do that in pandas.
3:14 So how do you access a individual row or column.
3:18 Well, there's a command called loc for location.
3:21 So let's do df.loc[0,:] and what this does it says pick rows
3:29 zero and show me all of the columns.
3:32 You can see that I've have ZN-870-29 Realcube.
3:37 This is an object that contains all of the values in the first row.
3:42 If we want to select multiple rows,
3:45 we can just pass a list.
3:46 And then this colon here is kind of like,
3:50 ah, wild card. So that means Show me all of the columns.
3:54 Let's make sure to put a loc in there,
3:56 and now we can see the first three rows with all the columns and let's say
4:01 we wanted to maybe just look at a subset of columns.
4:06 So I am doing a copy and paste again.
4:09 And instead of selecting a certain number of rows,
4:13 I'm going to put a list of columns here.
4:16 So let's just say I want to look at company / purchased / extended amount
4:25 And we get error here?
4:31 Why is that so forgot a comma.
4:33 I want to go through the trouble shooting because you're going to get these types of
4:36 errors when you do the work on your own.
4:39 So I want to show how to do that,
4:41 that we all make mistakes and I'll make typos.
4:43 So now we have selected used that wild card yet again to select all the rows
4:50 and only three columns, and you can combine this as well.
4:55 So, for instance, if we wanted to say,
4:57 Well, let's just get maybe df.loc[1,2,3] So we've got the first three rows in just
5:05 those three columns. We can slice the data like we would on a typical pipe
5:10 on list. So instead of doing one through three,
5:13 let's maybe do all. And instead of having a explicit list,
5:18 we're going to company colon extended amount,
5:24 and it chooses everything between company and extended amount.
5:28 So if you forget, so you can see company their extended amount so it
5:36 doesn't include invoice and drops the country and fee.
5:40 So this is This is the one of the main ways that you can select different
5:45 rows and columns, and one of the things that I frequently do when I read
5:50 in an Excel file is maybe it has more columns than I need.
5:54 So I want to sub select a certain number of columns.
5:58 So let me show you how it would do that and actually create a new data
6:01 frame. So I'm going to create one that's called Company only and use df.loc
6:06 We want to copy all of the rows and we'll just do company through extended
6:12 amount. And one of the things you want to get in the habit of as
6:16 well is to make an actual copy of it.
6:19 So I don't want to reference the original.
6:21 I want to create a whole new copy and here we go.
6:25 So we've got company only, you can see that has 1000 rows it just has a
6:30 subset of columns that we care about.
6:33 So we talked about df.loc. I want to talk about using iloc,
6:38 which is for the index. So it let's try something here.
6:43 It could be times where you want to look at the columns by numbers by the
6:50 index. If we do that,
6:53 we get an error, so that's telling us that it can't do a slice.
6:57 So the way we want to do it,
6:59 there's nothing wrong with this idea.
7:01 You want to use the df.iloc[:,1:5].
7:02 So if you do, I look then you can choose by index number.
7:06 So this is using one through five versus company through quantity.
7:11 So the the important thing is you've got two different ways to select your rows and
7:16 columns. iloc and loc.
7:18 Let's do one more example. Show how iloc works.
7:22 This example in of itself is not that useful,
7:25 but it's really important to start to understand the row and column notation for loc and
7:31 iloc, and we're going to go through a more powerful way in future lessons
7:37 to sub select your data. But this is the intro way to start thinking about
7:41 it and get familiarity with this concept.