Move from Excel to Python with Pandas Transcripts
Chapter: Intro to Pandas
Lecture: Working with column names
Login or
purchase this course
to watch this video and the rest of the course contents.
0:00
in the next section. We're going to work with our column names a little bit
0:04
and see how to access some data.
0:06
So remember when we open up our notebook,
0:08
we need to restart and run all the cells again,
0:11
which is pretty quick. The other thing I want to walk through is kind of
0:14
how I break up my notebook.
0:16
So now since we're gonna work on a new section, I am gonna put a break in here
0:19
call this column names I'm using markdown and I selected the markdown type and
0:26
you still press shift enter and it will render the markdown.
0:30
This is a good habit to get into so that you can organize your notebooks and
0:34
know where you are and what you're doing.
0:36
So we talked about how important column names were.
0:41
If you look at the column names,
0:42
property, you can see all the columns.
0:45
We can also convert that to a list if we want,
0:48
which is gonna be helpful sometimes when you need to clean them up because we care
0:52
about column names. You'll notice some potential errors with the column names here,
0:57
for instance. We already talked about how extended amount has a space.
1:01
And then also, when I added those new column names of country and fee,
1:05
I capitalized them. But none of the other ones were capitalized.
1:09
So let's see how to clean those up.
1:11
This is a very common task that you're going to do when you're working with your
1:15
data. So we're gonna use some list comprehensions to clean up the spaces.
1:21
So let me type it out and then I'll explain what's going on.
1:28
So what this does we talked about the DF columns list,
1:31
and I'm going to replace all the spaces with Underscores.
1:36
So we've converted our extended amount where it used to have a space to have an
1:41
underscore. So this is going make it a little easier for us to manipulate the data
1:45
and keep our columns consistent if we want to assign that back so that that
1:50
is the column name going forward.
1:52
Because if we look at DF head,
1:54
we can see we haven't actually changed anything,
1:57
so to change it, and I'm going to cut and paste to show you how
2:00
I would normally do this. Put that in there,
2:04
and now when we run it and take a look at head.
2:08
You can see that we now have extended amount, the same way.
2:11
You can also do a similar sort of cleaning activity for the
2:16
country and fee columns. So let's go ahead,
2:19
and I'm going to copy this instead of doing a replace,
2:23
I'm going to lower. So see now country and fee are lower case so we
2:29
can do ah similar sort of activity.
2:34
And I purposely do the copy and paste is every retyping everything,
2:39
because this is the way you work in a notebook.
2:41
It's an interactive process where you test out some code,
2:43
see what the values are, and then use that to make your changes.
2:48
Okay, so now countries lower case fee is lower case.
2:51
Our data frame has all the column names that we would expect.
2:55
We can start to look at how to access the data in the data frames.
3:00
So we talked about columns. We've talked about how to select an individual column,
3:04
and in Excel, you would go in and use your cursor and click on a
3:09
row or sell and make your changes.
3:12
You can't do that in pandas.
3:14
So how do you access a individual row or column.
3:18
Well, there's a command called loc for location.
3:21
So let's do df.loc[0,:] and what this does it says pick rows
3:29
zero and show me all of the columns.
3:32
You can see that I've have ZN-870-29 Realcube.
3:37
This is an object that contains all of the values in the first row.
3:42
If we want to select multiple rows,
3:45
we can just pass a list.
3:46
And then this colon here is kind of like,
3:50
ah, wild card. So that means Show me all of the columns.
3:54
Let's make sure to put a loc in there,
3:56
and now we can see the first three rows with all the columns and let's say
4:01
we wanted to maybe just look at a subset of columns.
4:06
So I am doing a copy and paste again.
4:09
And instead of selecting a certain number of rows,
4:13
I'm going to put a list of columns here.
4:16
So let's just say I want to look at company / purchased / extended amount
4:25
And we get error here?
4:31
Why is that so forgot a comma.
4:33
I want to go through the trouble shooting because you're going to get these types of
4:36
errors when you do the work on your own.
4:39
So I want to show how to do that,
4:41
that we all make mistakes and I'll make typos.
4:43
So now we have selected used that wild card yet again to select all the rows
4:50
and only three columns, and you can combine this as well.
4:55
So, for instance, if we wanted to say,
4:57
Well, let's just get maybe df.loc[1,2,3] So we've got the first three rows in just
5:05
those three columns. We can slice the data like we would on a typical pipe
5:10
on list. So instead of doing one through three,
5:13
let's maybe do all. And instead of having a explicit list,
5:18
we're going to company colon extended amount,
5:24
and it chooses everything between company and extended amount.
5:28
So if you forget, so you can see company their extended amount so it
5:36
doesn't include invoice and drops the country and fee.
5:40
So this is This is the one of the main ways that you can select different
5:45
rows and columns, and one of the things that I frequently do when I read
5:50
in an Excel file is maybe it has more columns than I need.
5:54
So I want to sub select a certain number of columns.
5:58
So let me show you how it would do that and actually create a new data
6:01
frame. So I'm going to create one that's called Company only and use df.loc
6:06
We want to copy all of the rows and we'll just do company through extended
6:12
amount. And one of the things you want to get in the habit of as
6:16
well is to make an actual copy of it.
6:19
So I don't want to reference the original.
6:21
I want to create a whole new copy and here we go.
6:25
So we've got company only, you can see that has 1000 rows it just has a
6:30
subset of columns that we care about.
6:33
So we talked about df.loc. I want to talk about using iloc,
6:38
which is for the index. So it let's try something here.
6:43
It could be times where you want to look at the columns by numbers by the
6:50
index. If we do that,
6:53
we get an error, so that's telling us that it can't do a slice.
6:57
So the way we want to do it,
6:59
there's nothing wrong with this idea.
7:01
You want to use the df.iloc[:,1:5].
7:02
So if you do, I look then you can choose by index number.
7:06
So this is using one through five versus company through quantity.
7:11
So the the important thing is you've got two different ways to select your rows and
7:16
columns. iloc and loc.
7:18
Let's do one more example. Show how iloc works.
7:22
This example in of itself is not that useful,
7:25
but it's really important to start to understand the row and column notation for loc and
7:31
iloc, and we're going to go through a more powerful way in future lessons
7:37
to sub select your data. But this is the intro way to start thinking about
7:41
it and get familiarity with this concept.