Move from Excel to Python with Pandas Transcripts
Chapter: Intro to Pandas
Lecture: Demo: Understanding initial data
0:00
Okay, now let's go ahead and read the Excel file into our Jupyter notebook.
0:05
I'm going to go through the process of launching the Notebook one more time. So let's "conda activate" our work environment.
0:12
The next thing we need to do is go into the directory where the files are. I placed them in a sales analysis directory,
0:18
and now I'm going to run Jupyter notebook. And here's my notebook. The two files that are already there were created by Cookie
0:26
Cutter, but I'm gonna go ahead and create a new one so we can walk through that process. Click on New, then Python 3 notebook, and remember,
0:34
one of the first things you need to do is make sure to change the title. It comes in as an untitled notebook,
0:40
so you can see that here as well as in the URL. So let's call this sales analysis exploration.
0:48
That's a really important thing to do so that you're in a good habit of organizing
0:52
your data. I am going to create a markdown cell and press Shift+Enter so that it
1:04
gets rendered. This is a good habit to get into so that you understand why
1:08
you did this notebook and what the data sources were and how you wanted to use this to answer a business problem.
1:16
So now let's get into actually writing some Python code. We put our imports at the top,
1:23
and I'm just going to use pathlib to access the files and then pandas in a second to read in that file.
1:32
So what I've done here is referenced the sample sales file in relation to the current working directory. And it is in a subdirectory called raw.
1:43
So I define that input file, and then I'm going to read that file in using the "pd.read_excel()" function in pandas
1:49
and nothing happens. But you can see that the number incremented here, so something did happen behind the scenes.
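As a rough sketch of the cell described here, the imports and the read step might look like the following. The file name "sample_sales.xlsx" and the exact layout of the "raw" subdirectory are assumptions for illustration, and the existence check is an addition so the cell does not fail when the file is missing.

```python
from pathlib import Path

import pandas as pd

# Hypothetical location: a "raw" subdirectory under the current working
# directory. The file name is an assumption, not taken from the lecture.
input_file = Path.cwd() / "raw" / "sample_sales.xlsx"

# Guard the read so this cell does not raise if the file is not there.
if input_file.exists():
    # Reading .xlsx files requires an Excel engine such as openpyxl.
    df = pd.read_excel(input_file)
    print(df.shape)
else:
    print(f"File not found: {input_file}")
```

Building the path with pathlib keeps the notebook portable across operating systems, since the separators are handled for you.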
1:58
If we want to see what a variable looks like, we just type df (for data frame). And now we see the data frame representation that looks very
2:07
similar to the Excel file. So let me go through a couple things that you will typically do the first time you read a file into pandas.
2:16
You can use the head command, df.head(), to look at the top five rows. You can use df.tail() to see the bottom five.
2:23
This is really helpful. Almost every time you read in the data, you're gonna look at what comes in at the top and what comes in at the bottom.
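Here is a small sketch of head and tail. The data frame is made-up stand-in data with hypothetical column names, not the sales file from the lecture.

```python
import pandas as pd

# Made-up stand-in data with 20 rows; not the course's sales file.
df = pd.DataFrame({
    "invoice": range(1, 21),
    "quantity": range(10, 210, 10),
})

top = df.head()      # first five rows by default
bottom = df.tail()   # last five rows by default
print(top)
print(bottom)

# Both methods accept a row count if five is not what you want.
print(df.head(3))
```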
2:28
Remember, we talked about columns. If you want to look at what the columns are, type df.columns and you can see that it has a list of all the
2:40
columns. Pandas calls it an Index, and that's gonna be important later for us to access our data.
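To make the Index idea concrete, here is a minimal sketch using made-up data with hypothetical column names.

```python
import pandas as pd

# Made-up stand-in data; the column names are hypothetical.
df = pd.DataFrame({
    "company": ["A", "B"],
    "quantity": [3, 7],
    "price": [9.5, 20.0],
})

cols = df.columns
print(cols)        # a pandas Index object holding the column labels
print(list(cols))  # the same labels as a plain Python list

# The labels in the Index are what you use to pull out a column.
quantities = df["quantity"]
print(quantities)
```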
2:46
The other thing that I like to do is the shape command - "df.shape". This tells us how many rows and columns we have.
2:54
So we have 1000 rows and 7 columns in the data. So this is a really compact way to understand your data and a really important thing to
3:02
do as you go through and manipulate data to make sure that you are keeping all the data together, not dropping things inadvertently.
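One way to use shape as a sanity check is to compare it before and after a manipulation. This is a sketch on made-up data, and the filter is only an example of a step that changes the row count.

```python
import pandas as pd

# Made-up stand-in data; not the 1000-row file from the lecture.
df = pd.DataFrame({
    "company": ["A", "B", "C", "D"],
    "quantity": [3, 7, 2, 5],
})

# shape is an attribute (no parentheses) returning (rows, columns).
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# After a manipulation, check the shape again to see what was dropped.
filtered = df[df["quantity"] > 2]
print(filtered.shape)
```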
3:11
The other useful command is df.info(), which shows you all the columns,
3:16
how many non-null values are in each column and what data type they are.
3:21
This is really important as we start to manipulate the data because some of the analysis can't be done if the data is not in the correct data type.
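As a sketch of why the data type matters, the made-up frame below stores prices as text and has one missing quantity. Converting with pd.to_numeric is one possible fix, shown here as an illustration rather than the approach used later in the course.

```python
import pandas as pd

# Made-up stand-in data: "price" is stored as text, "quantity" has a gap.
df = pd.DataFrame({
    "quantity": [3, None, 7],
    "price": ["9.5", "20.0", "4.25"],
})

# Prints each column with its non-null count and data type.
df.info()

# Text prices cannot be summed as numbers until they are converted.
df["price"] = pd.to_numeric(df["price"])
total = df["price"].sum()
print(df.dtypes)
print(total)
```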
3:31
The final command I'm gonna show is df.describe(), which gives a quick summary of all
3:42
the numeric columns. So this is a really handy way to get a feel for the overall structure of your data.
3:49
It tells you how many instances of the data you have. It does some basic math on the mean, standard deviation, the min, max, and the various
3:58
percentiles. This is all a very standard process that I go through almost every time I load in data, and it starts to fix in my mind what the shape of
4:08
the data is, what the structure is before I do further analysis.
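A minimal sketch of describe on made-up data shows that, by default, only the numeric columns are summarized. The column names are hypothetical.

```python
import pandas as pd

# Made-up stand-in data with one text column and one numeric column.
df = pd.DataFrame({
    "company": ["A", "B", "C", "D"],
    "quantity": [2, 4, 6, 8],
})

# With default arguments, describe() summarizes numeric columns only.
summary = df.describe()
print(summary)

# Individual statistics can be looked up by row label and column name.
mean_quantity = summary.loc["mean", "quantity"]
print(mean_quantity)
```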