Move from Excel to Python with Pandas Transcripts
Chapter: Data I/O (input and output)
Lecture: Reading non-tabular Excel files
Login or
purchase this course
to watch this video and the rest of the course contents.
0:00
in this example will go through reading in and processing a file that is more complicated than what we've done up until now.
0:09
So I have to find my notebook with my imports and the source file. So let's go ahead and read in that file.
0:16
So this looks like everything got red in OK, but if I look at the head,
0:21
I'm getting indie data. So something didn't come incorrectly when reading that file.
0:26
So let's take a look at the file and see if we can figure out what's going on. So here's the file that I'm trying to read in,
0:32
and what you'll notice is that there are multiple sheets. So this is not uncommon from what you're going to see in the real world,
0:38
where an Excel file comes in and it's maybe a little bit messy. So we have a tab that says she, too. We have our sales data,
0:45
we have levels and we have sheet one. And because there are multiple tabs here, multiple worksheets hand is will default to reading in the first one,
0:55
which in this case, is sheet to which is empty. So that's not what we want. So let's go through how we would actually read in the sales data tab.
1:04
So the way we need to do this was copy this. We need to specify the sheet name Now. If we look at it, we can tell that we've started to read in s'more
1:15
data, and it's looking a little bit better, but it doesn't quite look the way we would expect.
1:20
So notice how there's this average price and then this other column out here that looks
1:26
very odd. If we go and look at the original file again and see that we have our data and then those working columns that are over here,
1:34
that's what's causing the problem. So let's try another approach getting this date and cleanly
1:41
So in this example, a continued to specify the sheet name, but I also tell it which columns to use and therefore only went through columns.
1:50
A through G. So we have our data through the extended amount calm, which is what we want and also take a look at info and everything looks pretty
1:59
good. So we have her invoices an object. The company is an object purchase day just coming through the date,
2:04
time and any numeric fields air coming through his imagers, So it looks like this data is read in pretty well.
2:11
So let's take a look at our spreadsheet and see if there's another data set we want to look at. So let's take a look at this levels tab,
2:19
so I'm going to read in a different data frame for the levels. So this is another scenario that you'll see when you start reading files that you get
2:31
a lot of these unnamed columns. So that tells us that there's another problem with the way we read in the file
2:37
So let's look at the original file again, and what you'll notice on this Levels tab is that the data starts in Row three It doesn't start at the top,
2:47
so Pandas doesn't know that it tries to read in it. And for our column names based on starting in a one here in this example.
2:55
So we need to tell pandas where to start reading in the file, and to do this, we're gonna pass in the header parameter.
3:04
That looks a little better. So what header is done is told us to read in the file and just start at that second row,
3:12
but it zero index, So it's really gonna be the third row. So that's always something to keep in mind.
3:16
It can get a little confusing with the zero index e. One other thing that's a little interesting about this data set is I've got this Notes
3:24
column where there's a bunch of in a ends s so it's probably not that useful for us. So I'm going to get rid of that when I read it in
3:33
and show another example of how to use use calls when using use calls. We can specify the columns by the letters as cell would look at them.
3:43
So we tell it to use a and then all of the calm sea through H So that gets us our company name or level our zip code and update month
3:53
up a day, year percent to target.