Move from Excel to Python with Pandas Transcripts
Chapter: Intro to Pandas
Lecture: Working with column names
0:00 in the next section. We're going to work with our column names a little bit and see how to access some data. So remember when we open up our notebook,
0:09 we need to restart and run all the cells again, which is pretty quick. The other thing I want to walk through is kind of how I break up my notebook.
0:17 So now since we're gonna work on a new section, I am gonna put a break in here
0:20 call this column names I'm using markdown and I selected the markdown type and you still press shift enter and it will render the markdown.
0:31 This is a good habit to get into so that you can organize your notebooks and know where you are and what you're doing.
0:37 So we talked about how important column names were. If you look at the column names, property, you can see all the columns.
0:46 We can also convert that to a list if we want, which is gonna be helpful sometimes when you need to clean them up because we care
0:53 about column names. You'll notice some potential errors with the column names here,
0:58 for instance. We already talked about how extended amount has a space. And then also, when I added those new column names of country and fee,
1:06 I capitalized them. But none of the other ones were capitalized. So let's see how to clean those up.
1:12 This is a very common task that you're going to do when you're working with your
1:16 data. So we're gonna use some list comprehensions to clean up the spaces. So let me type it out and then I'll explain what's going on.
1:29 So what this does we talked about the DF columns list, and I'm going to replace all the spaces with Underscores.
1:37 So we've converted our extended amount where it used to have a space to have an
1:42 underscore. So this is going make it a little easier for us to manipulate the data
1:46 and keep our columns consistent if we want to assign that back so that that is the column name going forward. Because if we look at DF head,
1:55 we can see we haven't actually changed anything, so to change it, and I'm going to cut and paste to show you how
2:01 I would normally do this. Put that in there, and now when we run it and take a look at head.
2:09 You can see that we now have extended amount, the same way. You can also do a similar sort of cleaning activity for the
2:17 country and fee columns. So let's go ahead, and I'm going to copy this instead of doing a replace,
2:24 I'm going to lower. So see now country and fee are lower case so we can do ah similar sort of activity.
2:35 And I purposely do the copy and paste is every retyping everything, because this is the way you work in a notebook.
2:42 It's an interactive process where you test out some code, see what the values are, and then use that to make your changes.
2:49 Okay, so now countries lower case fee is lower case. Our data frame has all the column names that we would expect.
2:56 We can start to look at how to access the data in the data frames. So we talked about columns. We've talked about how to select an individual column,
3:05 and in Excel, you would go in and use your cursor and click on a row or sell and make your changes. You can't do that in pandas.
3:15 So how do you access a individual row or column. Well, there's a command called loc for location.
3:22 So let's do df.loc[0,:] and what this does it says pick rows zero and show me all of the columns. You can see that I've have ZN-870-29 Realcube.
3:38 This is an object that contains all of the values in the first row. If we want to select multiple rows, we can just pass a list.
3:47 And then this colon here is kind of like, ah, wild card. So that means Show me all of the columns. Let's make sure to put a loc in there,
3:57 and now we can see the first three rows with all the columns and let's say we wanted to maybe just look at a subset of columns.
4:07 So I am doing a copy and paste again. And instead of selecting a certain number of rows, I'm going to put a list of columns here.
4:17 So let's just say I want to look at company / purchased / extended amount And we get error here? Why is that so forgot a comma.
4:34 I want to go through the trouble shooting because you're going to get these types of errors when you do the work on your own.
4:40 So I want to show how to do that, that we all make mistakes and I'll make typos.
4:44 So now we have selected used that wild card yet again to select all the rows and only three columns, and you can combine this as well.
4:56 So, for instance, if we wanted to say, Well, let's just get maybe df.loc[1,2,3] So we've got the first three rows in just
5:06 those three columns. We can slice the data like we would on a typical pipe on list. So instead of doing one through three,
5:14 let's maybe do all. And instead of having a explicit list, we're going to company colon extended amount,
5:25 and it chooses everything between company and extended amount. So if you forget, so you can see company their extended amount so it
5:37 doesn't include invoice and drops the country and fee. So this is This is the one of the main ways that you can select different
5:46 rows and columns, and one of the things that I frequently do when I read in an Excel file is maybe it has more columns than I need.
5:55 So I want to sub select a certain number of columns. So let me show you how it would do that and actually create a new data
6:02 frame. So I'm going to create one that's called Company only and use df.loc We want to copy all of the rows and we'll just do company through extended
6:13 amount. And one of the things you want to get in the habit of as well is to make an actual copy of it. So I don't want to reference the original.
6:22 I want to create a whole new copy and here we go. So we've got company only, you can see that has 1000 rows it just has a
6:31 subset of columns that we care about. So we talked about df.loc. I want to talk about using iloc,
6:39 which is for the index. So it let's try something here. It could be times where you want to look at the columns by numbers by the index. If we do that,
6:54 we get an error, so that's telling us that it can't do a slice. So the way we want to do it, there's nothing wrong with this idea.
7:02 You want to use the df.iloc[:,1:5]. So if you do, I look then you can choose by index number.
7:07 So this is using one through five versus company through quantity. So the the important thing is you've got two different ways to select your rows and
7:17 columns. iloc and loc. Let's do one more example. Show how iloc works. This example in of itself is not that useful,
7:26 but it's really important to start to understand the row and column notation for loc and
7:32 iloc, and we're going to go through a more powerful way in future lessons to sub select your data. But this is the intro way to start thinking about
7:42 it and get familiarity with this concept.