Move from Excel to Python with Pandas Transcripts
Chapter: Intro to Pandas
Lecture: Diving into the data
Okay, let's go through some more examples of how to work with Jupyter notebooks and pandas. I've opened up my notebook, and I want to walk through something that can be a little confusing to new users.
If you look at this notebook, which I've just opened, you can see that in this cell I'm showing the DataFrame by typing df. There may be a temptation to go in here and take a look at the head with df.head(). Remember, Shift + Enter runs the cell. I press that and I get a NameError: name 'df' is not defined. The reason is that I haven't actually run everything in the notebook yet, so nothing has created df in the current kernel.
That's why it's really useful to use the menu option Kernel > Restart & Run All. You'll get a prompt asking you to confirm that you want to restart and run all cells. When you confirm, Jupyter runs all of the code from top to bottom and makes everything live in the current kernel. Now if I make a change, everything works. You can also see that the execution number has incremented: the cells went from 1, 2, 4, 5, 6, 7, 8, 9 and then back up to 10, and 3 is gone because I reran that cell.
This shows some of the power of Jupyter notebooks, but also how confusing they can be if you run cells out of order.
So my recommendation is to use Kernel > Restart & Run All frequently. If you don't want to use the menu, this toolbar button here, which restarts the kernel and reruns everything, does the same thing. Once we've done that, we can take a look at our DataFrame.
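To make the out-of-order problem concrete, here is a toy model of a kernel. It is a sketch under a simplifying assumption, not how Jupyter is actually implemented: each cell's source runs with Python's built-in exec() against one shared dictionary that stands in for the kernel's memory. The cell contents and the run_cells helper are hypothetical.

```python
import pandas as pd  # imported here too so the example file is self-contained

# Hypothetical notebook cells, listed in top-to-bottom order
cells = [
    "import pandas as pd",
    "df = pd.DataFrame({'invoice': [1001, 1002, 1003]})",
    "preview = df.head()",
]


def run_cells(sources, namespace):
    """Run each cell's source in order against a shared namespace (a stand-in for the kernel)."""
    for source in sources:
        exec(source, namespace)
    return namespace


# Running only the last cell in a fresh "kernel" fails, just like in the notebook
fresh_kernel = {}
try:
    run_cells([cells[2]], fresh_kernel)
except NameError as err:
    print(f"NameError: {err}")  # NameError: name 'df' is not defined

# "Restart and run all": start from an empty namespace and run every cell top to bottom
restarted_kernel = run_cells(cells, {})
print(restarted_kernel["preview"])
```

Starting from an empty dictionary each time is what makes the second run reproducible: nothing survives from earlier, out-of-order runs.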
Now we want to actually look at some columns. If you ever forget which columns you have, type df.head(). Here we have columns called invoice, company, and purchased_date. The simplest way to pull out a column is to type df.invoice, and I see all of the values in the invoice column. You can see that pandas truncates the output in the middle because it doesn't want to show all 1,000 rows, which makes sense. That should be pretty intuitive to anyone who has worked with Python before.
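Here is a small sketch of the same steps, using a made-up stand-in for the course's sales data (the values are invented and only two columns are included). Besides df.head(), df.columns lists the column names directly, and the display.max_rows option controls when pandas truncates the middle of a long Series.

```python
import pandas as pd

# Hypothetical stand-in for the sales data; the values are invented for illustration
df = pd.DataFrame({
    "invoice": range(1000, 2000),
    "company": ["Company A", "Company B"] * 500,
})

# Two ways to remind yourself which columns exist
print(df.head())
print(list(df.columns))  # ['invoice', 'company']

# Attribute access works here because "invoice" is a valid Python identifier
invoices = df.invoice
print(invoices)  # long output: pandas shows the first and last rows with "..." in between

# The row count above which pandas truncates output is a display option
print(pd.get_option("display.max_rows"))
```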
But what happens if we want to look at extended amount, where there's a space in the column name? If you type df.extended amount, you get a SyntaxError, because Python doesn't understand what the space means. The syntax you need is to put the column name in quotes inside square brackets, df['extended amount'], and then you can reference the column. Here you go: 323, 420, 161, 203, 684. If I scroll up, the DataFrame shows the same values: 323, 420, 161, 203, 684.
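A minimal sketch of the difference, using a hypothetical two-column DataFrame with invented values. Because the dot version is rejected by Python's parser before pandas is ever involved, the example uses the built-in compile() to show the SyntaxError without crashing the script.

```python
import pandas as pd

# Hypothetical data with a space in one column name; the values are invented
df = pd.DataFrame({
    "invoice": [1001, 1002, 1003],
    "extended amount": [150.0, 275.5, 90.25],
})

# Dot notation cannot express a name with a space, so the parser rejects the code
try:
    compile("df.extended amount", "<cell>", "eval")
except SyntaxError as err:
    print("SyntaxError:", err.msg)

# Bracket notation with a quoted string works for any column name
amounts = df["extended amount"]
print(amounts)
```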
The reason I point this out is that you have two options for accessing columns, and you'll sometimes see code that uses the dot notation and other code that uses the bracket notation. I encourage you to always use the bracket notation. It will make your life easier when you run into situations like this, and it's consistent with the other operations you're going to want to do in pandas.
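Here are a few cases, sketched with hypothetical columns and invented values, where bracket notation keeps working and dot notation does not: selecting several columns at once, creating a new column, and a column whose name clashes with a DataFrame method such as count.

```python
import warnings

import pandas as pd

# Hypothetical data; "count" deliberately clashes with the DataFrame.count() method
df = pd.DataFrame({
    "invoice": [1001, 1002, 1003],
    "company": ["A", "B", "A"],
    "count": [5, 2, 7],
})

# Selecting several columns: pass a list inside the brackets
subset = df[["invoice", "company"]]

# df.count refers to the method, not the column; brackets always return the column
print(callable(df.count))  # True
counts = df["count"]

# Creating a new column works with brackets
df["doubled"] = df["count"] * 2

# Assigning to a new attribute only sets a Python attribute, not a column
# (pandas emits a UserWarning about this, silenced here to keep the output clean)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.tripled = df["count"] * 3
print("tripled" in df.columns)  # False
```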
The main reason I bring it up is so that you're aware of it and can keep it in mind when you're doing your analysis and problem solving.