Data Science Jumpstart with 10 Projects Transcripts
Chapter: Project 2: Excel Integration with Adult Income Data
Lecture: Read Excel file in Pandas with read_excel and Pyarrow
0:00
Let's read our Excel data now. One of the things you'll want to make sure you have installed is openpyxl.
0:08
Pandas is going to leverage that, and if you don't have it, your Excel import might fail. So I'm going to import pandas.
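As a quick sanity check before reading any Excel file, something like the following can confirm that openpyxl is available. This is a minimal sketch using only the standard library; the variable name `has_openpyxl` and the printed message are illustrative, not from the lecture.

```python
import importlib.util

# pandas relies on an Excel engine to read .xlsx files; openpyxl is the one
# named in this lecture. find_spec returns None when the package is absent.
has_openpyxl = importlib.util.find_spec("openpyxl") is not None

if not has_openpyxl:
    print("openpyxl is missing; install it with: pip install openpyxl")
```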
0:17
Let's look at what this data looks like. So this is our export. This is a relatively straightforward Excel file. Let's load this into pandas.
0:29
Instead of using read_csv, I'm going to use read_excel. I'm going to set dtype_backend to pyarrow because I want to have pyarrow
0:37
representing my data. Here is my data frame. Now one of the things that might stick out to you is that first column that says Unnamed: 0.
0:45
What's going on there? Well, if we look at our data, you can actually see we have that A column which is the index. So
0:57
I already have an index in there. When I read my Excel file, Pandas didn't see that as an index. It just saw it as a column.
1:05
So I'm going to tell it that column 0 is the index column by passing index_col=0.
1:09
Let's run that and see what happens when we do that. Okay, that's looking a little bit better.
1:13
Let's also just check our dtypes there. It looks like we do have pyarrow types. It's very basic to read an Excel file.
1:21
You can use read_excel. Again, there are various options for this. You can pull those up in Jupyter and check out various options to
1:29
tweak how you import Excel data.
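Outside of Jupyter's built-in help, one way to browse those options is to inspect the function signature. This is a small sketch; the handful of parameter names picked out below is just a sample of what read_excel accepts in pandas 2.0 or later.

```python
import inspect

import pandas as pd

# In Jupyter you could run `pd.read_excel?` instead; this does something
# similar in plain Python by listing parameters and their defaults.
params = inspect.signature(pd.read_excel).parameters

for name in ("sheet_name", "usecols", "skiprows", "index_col", "dtype_backend"):
    print(name, "default:", params[name].default)
```

Calling `help(pd.read_excel)` prints the full docstring with a description of each option.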