Data Science Jumpstart with 10 Projects Transcripts
Chapter: Project 2: Excel Integration with Adult Income Data
Lecture: Read Excel file in Pandas with read_excel and Pyarrow

0:00 Let's read our Excel data now. So one of the things you'll want to make sure you have installed is openpyxl.
0:08 Pandas is going to leverage that, but if you don't have that, your imports might fail. So I'm going to import Pandas.
0:17 Let's look at what this data looks like. So this is our export. This is a relatively straightforward Excel file. Let's load this into Pandas.
0:29 Instead of using read_csv, I'm going to use read_excel. I'm going to set dtype_backend to pyarrow because I want to have pyarrow
0:37 representing my data. Here is my data frame here. Now one of the things that might stick out to you is that first column that says "Unnamed: 0".
0:45 What's going on there? Well, if we look at our data, you can actually see we have that A column which is the index. So
0:57 I already have an index in there. When I read my Excel file, Pandas didn't see that as an index. It just saw it as a column.
1:05 So I'm going to tell it that the column 0 is the index column.
1:09 Let's run that and see what happens when we do that. Okay, that's looking a little bit better.
1:13 Let's also just check our dtypes there. It looks like we do have pyarrow types. It's very basic to read an Excel file.
1:21 You can use read_excel. Again, there are various options for this. You can pull those up in Jupyter and check out various options to
1:29 tweak how you import Excel data.
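
In Jupyter, typing pd.read_excel? in a cell brings up the documentation. Outside a notebook, one possible way to list the available options is to inspect the function signature, as in this small sketch (it assumes pandas 2.0 or later, where the dtype_backend parameter exists):

```python
import inspect

import pandas as pd

# Collect the names of the parameters that read_excel accepts.
options = list(inspect.signature(pd.read_excel).parameters)
print(options)

# The two options used in this lecture should be among them.
print("index_col" in options, "dtype_backend" in options)
```

Calling help(pd.read_excel) prints the full documentation for each of these options.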

