Move from Excel to Python with Pandas Transcripts
Chapter: Intro to Pandas
Lecture: Demo: Understanding initial data
Okay, now let's go ahead and read the Excel file into our Jupyter notebook.
I'm going to go through the process of launching the notebook one more time. So let's "conda activate" our work environment.
The next thing we need to do is go into the directory where the files are. I placed them in a sales analysis directory,
and now I'm going to run Jupyter notebook. And here's my notebook. The two files that are already there were created by
Cookiecutter, but I'm gonna go ahead and create a new one so we can walk through that process. Click on New Python 3 notebook, and remember,
one of the first things we need to do is change the title. It comes in as an untitled notebook,
so you can see that here as well as in the URL. So let's call this sales analysis exploration.
That's a really important thing to do so that you're in a good habit of organizing
your data. I am going to create a markdown cell and press Shift+Enter so that it
gets rendered. This is a good habit to get into so that you understand why
you created this notebook, what the data sources were, and how you wanted to use it to answer a business problem.
So now let's get into actually writing some Python code. We put our imports at the top,
and I'm just going to use pathlib to access the files and then pandas in a second to read in that file.
So what I've done here is referenced the sample sales file in relation to the current working directory. And it is in a subdirectory called raw.
So I define that input file, and then I'm going to read that file in using the "pd.read_excel()" function in pandas
and nothing happens. But you can see that the number incremented here. So something did happen behind the scenes.
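A minimal sketch of the cells described so far. The file name sample_sales.xlsx and the exact folder layout are assumptions; the lecture only says the file sits in a subdirectory called raw, relative to the current working directory.

```python
from pathlib import Path

import pandas as pd

# Hypothetical file name; only the "raw" subdirectory is mentioned in the lecture.
input_file = Path.cwd() / "raw" / "sample_sales.xlsx"

# Guarded so the cell also runs on a machine where the file is missing.
if input_file.exists():
    # Reading .xlsx files needs an Excel engine such as openpyxl installed.
    df = pd.read_excel(input_file)
else:
    print(f"File not found: {input_file}")
```

An assignment like this prints nothing when it succeeds, which is why the only visible change in the notebook is the cell's execution number.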
If we want to see what a variable looks like, we just type df (for data frame). And now we see the data frame representation that looks very
similar to the Excel file. So let me go through a couple things that you will typically do the first time you read a file into pandas.
You can use the head command, df.head(), to look at the top five rows. You can use df.tail() to see the bottom five.
This is really helpful. Almost every time you read in the data, you're gonna look at what comes in at the top and what comes in at the bottom.
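As a rough illustration of head and tail, here is a sketch on a small made-up data frame. The column names and values are hypothetical stand-ins, not the course's sales file.

```python
import pandas as pd

# Hypothetical stand-in data with 20 rows.
df = pd.DataFrame({"invoice": range(1, 21), "quantity": range(10, 210, 10)})

top = df.head()     # first five rows by default
bottom = df.tail()  # last five rows by default

print(top)
print(bottom)
```

Both methods take an optional row count, so df.head(10) would show the first ten rows instead.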
Remember, we talked about columns. So if you want to look at what the columns are, type df.columns and you can see that it has a list of all the
columns. Pandas calls it an Index, and that's gonna be important later for us to access our data.
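A short sketch of what df.columns returns and how a column label can be used to get at the data. The column names here are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame(
    {"invoice": [1, 2], "company": ["A", "B"], "quantity": [3, 4]}
)

cols = df.columns  # an Index object holding the column labels
print(cols)

# A label taken from the Index selects that column from the data frame.
first_col = cols[0]
print(df[first_col].tolist())
```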
The other thing that I like to do is the shape command, "df.shape". This tells us how many rows and columns we have.
So we have 1000 rows and 7 columns in the data. So this is a really compact way to understand your data and a really important thing to
do as you go through and manipulate data to make sure that you are keeping all the data together, not dropping things inadvertently.
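One way to use shape as a sanity check while manipulating data, sketched on made-up values. The filter step is only an illustrative assumption and is not part of the lecture.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({"invoice": [1, 2, 3, 4], "quantity": [5, 0, 7, 2]})

# shape is an attribute (no parentheses) holding (rows, columns).
rows_before, cols_before = df.shape

# Example manipulation: keep only rows with a positive quantity.
filtered = df[df["quantity"] > 0]
rows_after = filtered.shape[0]

print(f"Kept {rows_after} of {rows_before} rows")
```

Comparing the row count before and after a step makes it obvious when rows were dropped, whether intentionally or not.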
The other useful command is df.info(), which shows you all the columns,
how many non-null values are in each column, and what data type they are.
This is really important as we start to manipulate the data because some of the analysis can't be done if the data is not in the correct data type.
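To show why the data type matters, here is a sketch where a price column was read as text. The data is made up, and pd.to_numeric is just one possible way to convert it.

```python
import pandas as pd

# Hypothetical stand-in data: prices stored as text rather than numbers.
df = pd.DataFrame(
    {"company": ["A", "B", "C"], "price": ["10.5", "20", "7.25"]}
)

# Prints each column with its non-null count and data type.
df.info()

# Arithmetic on text would not give a numeric total, so convert first.
df["price"] = pd.to_numeric(df["price"])
total = df["price"].sum()
print(total)
```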
The final command I'm gonna show is df.describe(), which gives a quick summary of all
the numeric columns. So this is a really handy way to get a feel for the overall structure of your data.
It tells you how many instances of the data you have. It does some basic math on the mean, standard deviation, the min, max, and the various
percentiles. And this is all a very standard process that I go through almost every time I load in data, and it starts to give me a sense of what the shape of
the data is, what the structure is before I do further analysis.
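A small sketch of describe on made-up data, assuming one text column and two numeric columns. By default only the numeric columns appear in the summary.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame(
    {
        "company": ["A", "B", "C", "D"],
        "quantity": [10, 20, 30, 40],
        "price": [1.0, 2.0, 3.0, 4.0],
    }
)

# Rows of the result: count, mean, std, min, 25%, 50%, 75%, max.
summary = df.describe()
print(summary)
```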