Move from Excel to Python with Pandas Transcripts
Chapter: Intro to Pandas
Lecture: Demo: Understanding initial data

0:00 Okay, now let's go ahead and read the Excel file into our Jupyter notebook.
0:05 I'm going to go through the process of launching the Notebook one more time. So let's "conda activate" our work environment.
0:12 The next thing we need to do is go into the directory where the files are. I placed them in a sales analysis directory,
0:18 and now I'm going to run Jupyter notebook. And here's my notebook. The two files that are already there were created by
0:26 Cookiecutter, but I'm gonna go ahead and create a new one so we can walk through that process. Click on New, then Python 3 notebook, and remember,
0:34 one of the first things we need to do is make sure to change the title. It comes in as an untitled notebook,
0:40 so you can see that here as well as in the URL. So let's call this sales analysis exploration.
0:48 That's a really important thing to do so that you're in a good habit of organizing
0:52 your data. I am going to create a Markdown cell and press Shift+Enter so that it
1:04 gets rendered. This is a good habit to get into so that you understand why
1:08 you created this notebook, what the data sources were, and how you wanted to use this to answer a business problem.
1:16 So now let's get into actually writing some Python code. We put our imports at the top,
1:23 and I'm just going to use pathlib to access the files and then pandas in a second to read in that file.
1:32 So what I've done here is referenced the sample sales file in relation to the current working directory. And it is in a subdirectory called raw.
1:43 So I define that input file, and then I'm going to read that file in using the "pd.read_excel()" function in pandas
1:49 and nothing happens. But you can see that the number incremented here. So there was something that happened behind the scenes.
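
A minimal sketch of the cell described above, using pathlib to build the path and pd.read_excel() to load it. The file name sample_sales.xlsx is an assumption (the lecture only says the file sits in a subdirectory called raw), and the existence check is added here only so the cell runs when the file is missing.

```python
from pathlib import Path

import pandas as pd

# Hypothetical file name; the lecture only says the file is in a "raw" subdirectory
input_file = Path.cwd() / "raw" / "sample_sales.xlsx"

# Guard added for illustration so this cell runs even when the file is absent
if input_file.exists():
    # Reading .xlsx files needs an Excel engine such as openpyxl to be installed
    df = pd.read_excel(input_file)
else:
    df = None
    print(f"File not found: {input_file}")
```

In a notebook, an assignment like this produces no output, which matches the "nothing happens" moment in the demo; only the cell's execution counter changes.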
1:58 If we want to see what a variable looks like, we just type df (for data frame). And now we see the data frame representation that looks very
2:07 similar to the Excel file. So let me go through a couple things that you will typically do the first time you read a file into pandas.
2:16 You can use the head command, df.head(), to look at the top five rows. You can use df.tail() to see the bottom five.
2:23 This is really helpful. Almost every time you read in the data, you're gonna look at what comes in at the top and what comes in at the bottom.
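
As a small illustration of head and tail, here is a sketch on a made-up 12-row DataFrame. The column names and values are invented for the example and are not the course's sales data.

```python
import pandas as pd

# Made-up stand-in for the sales data: 12 rows, hypothetical column names
df = pd.DataFrame({
    "invoice": range(1, 13),
    "quantity": [3, 5, 2, 8, 1, 4, 7, 6, 9, 2, 5, 3],
})

top = df.head()           # first five rows by default
bottom = df.tail()        # last five rows by default
first_three = df.head(3)  # pass a number to change how many rows come back

print(top)
print(bottom)
```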
2:28 Remember, we talked about columns. So if you want to look at what the columns are, type df.columns and you can see that it has a list of all the
2:40 columns. pandas calls it an Index, and that's gonna be important later for us to access our data.
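
A short sketch of what df.columns returns, again on invented data with hypothetical column names. It shows that the result is a pandas Index rather than a plain list, and that the names in it are what you use to pull out a column.

```python
import pandas as pd

# Invented example data; column names are hypothetical
df = pd.DataFrame({
    "invoice": [1001, 1002],
    "company": ["Acme", "Globex"],
    "quantity": [3, 5],
})

cols = df.columns            # a pandas Index object, not a plain Python list
print(list(cols))            # convert to a list when you just want the names

quantities = df["quantity"]  # column names are the keys for selecting data
print(quantities)
```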
2:46 The other thing that I like to do is the shape command, "df.shape". This tells us how many rows and columns we have.
2:54 So we have 1000 rows and 7 columns in the data. So this is a really compact way to understand your data and a really important thing to
3:02 do as you go through and manipulate data to make sure that you are keeping all the data together, not dropping things inadvertently.
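
One way to use shape as a sanity check is to compare it before and after a manipulation. This is a sketch on made-up data (the filter and column names are invented for illustration), not the step performed in the lecture.

```python
import pandas as pd

# Invented example data; column names are hypothetical
df = pd.DataFrame({
    "company": ["Acme", "Globex", "Acme", "Initech"],
    "quantity": [3, 5, 2, 8],
})

rows, cols = df.shape              # shape is a (rows, columns) tuple
filtered = df[df["quantity"] > 2]  # keep only rows with quantity above 2

print(df.shape)        # (4, 2)
print(filtered.shape)  # (3, 2): one row was dropped, columns unchanged
```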
3:11 The other useful command is df.info(), which shows you all the columns,
3:16 how many non-null values are in each column, and what data type they are.
3:21 This is really important as we start to manipulate the data because some of the analysis can't be done if the data is not in the correct data type.
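
To illustrate why the data type matters, here is a sketch in which a numeric column has been stored as text. The data is invented, and pd.to_numeric is just one way to convert it; the lecture does not say which conversion it uses.

```python
import pandas as pd

# Invented example data; "quantity" is deliberately stored as text
df = pd.DataFrame({
    "company": ["Acme", "Globex", "Initech"],
    "quantity": ["3", "5", "8"],
})

df.info()  # prints each column, its non-null count, and its data type

# Convert the text column to numbers so arithmetic works as expected
df["quantity"] = pd.to_numeric(df["quantity"])
total = df["quantity"].sum()
print(total)  # 16
```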
3:31 The final command I'm gonna show is df.describe(), which gives a quick summary of all
3:42 the numeric columns. So this is a really handy way to get a feel for the overall structure of your data.
3:49 It tells you how many instances of the data you have. It does some basic math on the mean, standard deviation, the min, max, and the various
3:58 percentiles. And this is all a very standard process that I go through almost every time I load in data, and it starts to fix in my mind what the shape of
4:08 the data is, what the structure is before I do further analysis.
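
A brief sketch of df.describe() on invented data. The column names and values are hypothetical; the point is that only the numeric columns are summarized by default and that individual statistics can be read back out of the result.

```python
import pandas as pd

# Invented example data; column names are hypothetical
df = pd.DataFrame({
    "company": ["Acme", "Globex", "Initech", "Acme"],
    "quantity": [2, 4, 6, 8],
    "price": [10.0, 20.0, 30.0, 40.0],
})

summary = df.describe()  # the text column "company" is left out by default
print(summary)

# The summary is itself a DataFrame, so single values can be looked up
mean_qty = summary.loc["mean", "quantity"]
print(mean_qty)  # 5.0
```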

