Move from Excel to Python with Pandas Transcripts
Chapter: Data I/O (input and output)
Lecture: Reading non-tabular Excel files

Login or purchase this course to watch this video and the rest of the course contents.
0:00 in this example will go through reading in and processing a file that is more complicated
0:06 than what we've done up until now.
0:08 So I have to find my notebook with my imports and the source file.
0:12 So let's go ahead and read in that file.
0:15 So this looks like everything got red in OK,
0:17 but if I look at the head,
0:20 I'm getting indie data. So something didn't come incorrectly when reading that file.
0:25 So let's take a look at the file and see if we can figure out what's
0:27 going on. So here's the file that I'm trying to read in,
0:31 and what you'll notice is that there are multiple sheets.
0:34 So this is not uncommon from what you're going to see in the real world,
0:37 where an Excel file comes in and it's maybe a little bit messy.
0:41 So we have a tab that says she,
0:42 too. We have our sales data,
0:44 we have levels and we have sheet one.
0:46 And because there are multiple tabs here,
0:49 multiple worksheets hand is will default to reading in the first one,
0:54 which in this case, is sheet to which is empty.
0:57 So that's not what we want.
0:59 So let's go through how we would actually read in the sales data tab.
1:03 So the way we need to do this was copy this.
1:06 We need to specify the sheet name Now.
1:11 If we look at it, we can tell that we've started to read in s'more
1:14 data, and it's looking a little bit better,
1:16 but it doesn't quite look the way we would expect.
1:19 So notice how there's this average price and then this other column out here that looks
1:25 very odd. If we go and look at the original file again and see that
1:29 we have our data and then those working columns that are over here,
1:33 that's what's causing the problem. So let's try another approach getting this date and cleanly
1:40 So in this example, a continued to specify the sheet name,
1:44 but I also tell it which columns to use and therefore only went through columns.
1:49 A through G. So we have our data through the extended amount calm,
1:53 which is what we want and also take a look at info and everything looks pretty
1:58 good. So we have her invoices an object.
2:00 The company is an object purchase day just coming through the date,
2:03 time and any numeric fields air coming through his imagers,
2:07 So it looks like this data is read in pretty well.
2:10 So let's take a look at our spreadsheet and see if there's another data set we
2:14 want to look at. So let's take a look at this levels tab,
2:18 so I'm going to read in a different data frame for the levels.
2:25 So this is another scenario that you'll see when you start reading files that you get
2:30 a lot of these unnamed columns.
2:33 So that tells us that there's another problem with the way we read in the file
2:36 So let's look at the original file again,
2:39 and what you'll notice on this Levels tab is that the data starts in Row three
2:44 It doesn't start at the top,
2:46 so Pandas doesn't know that it tries to read in it.
2:48 And for our column names based on starting in a one here in this example.
2:54 So we need to tell pandas where to start reading in the file,
2:57 and to do this, we're gonna pass in the header parameter.
3:03 That looks a little better. So what header is done is told us to read
3:08 in the file and just start at that second row,
3:11 but it zero index, So it's really gonna be the third row.
3:14 So that's always something to keep in mind.
3:15 It can get a little confusing with the zero index e.
3:18 One other thing that's a little interesting about this data set is I've got this Notes
3:23 column where there's a bunch of in a ends s so it's probably not that useful
3:28 for us. So I'm going to get rid of that when I read it in
3:32 and show another example of how to use use calls when using use calls.
3:38 We can specify the columns by the letters as cell would look at them.
3:42 So we tell it to use a and then all of the calm sea through H
3:46 So that gets us our company name or level our zip code and update month
3:52 up a day, year percent to target.