#100DaysOfCode in Python Transcripts
Chapter: Days 49-51: Measuring performance
Lecture: Demo: Getting started
Let's take a practical example and see how profiling can help us understand its performance. We talked about exploring CSV data
earlier in the course, so we're going to take that exact same code and try to understand it, and in fact tweak it a little bit,
based on what we see in the performance. So, let's pull this up here. We actually have in our demo code over here,
under 'Days 49-51' we have two copies, and right now they're the same. Of course the final code will change, but
the starter code is exactly what we're starting with here. So you can play around with this data over here,
this is more or less just the end product from the CSV section. Over here we're going to work on this
and we're going to try to understand its performance. We're going to actually step outside of PyCharm here, let me just copy the path to this file,
there's a couple things we'll need to do. And the first thing we want to do is activate our virtual environment.
What we want to do is we want to use a built in module and we can understand the entire program's behavior from the outside,
we don't have to write any code to do this. So what we want to do is we want to run cProfile, and we want to tell it to sort,
we'll talk about sorting in a minute, we want to run that against program.py. How's it going to work? Poorly, because it's not a separate program,
it's just a module in Python, so what we need to do is say 'python -m' to run the module. That's cProfile, and capitalization matters here: capital 'P'.
Now we're going to give it this, and let's see if that works. Okay, great, we got a bunch of gibberish-looking stuff here,
a lot of things going on about frozen imports and all sorts of things, and it turns out this is not how we want to look at our code.
I don't know how it's sorting it but it's not the right way. We would like to sort by cumulative time. There's basically two things that
you probably care about here. One is per call, which is how much time is each one individually spending, and I think this is sort of the same thing,
like how much time is just in this function. Not functions it calls, or above, but like summed up across the number of calls.
But I find that by far the most useful one is this cumtime, cumulative time. So let's go over here, and you need to pass the sort parameter,
but it won't work if you put '-s' over here, after the script name. It needs to be before the script. So we'll say '-s cumtime'. Try it again.
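To make the argument order concrete, here is a minimal sketch that builds the same command line and runs it from Python. The tiny script it writes to a temporary folder is a hypothetical stand-in for program.py, not the course's actual code, and the helper name profile_script is made up for this example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for program.py; the real script comes from the CSV section.
SCRIPT = '''
def parse_row(n):
    return sum(i * i for i in range(n))

def main():
    return [parse_row(2000) for _ in range(50)]

main()
'''


def profile_script(sort_key: str = "cumtime") -> str:
    """Run `python -m cProfile -s <sort_key> <script>` and return its report."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "program.py"
        script.write_text(SCRIPT)
        # cProfile's own options go before the script name. Anything placed
        # after the script is handed to the script as its own arguments.
        cmd = [sys.executable, "-m", "cProfile", "-s", sort_key, str(script)]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


cli_report = profile_script()
print(cli_report[:1500])
```

In a terminal you would just type the command itself; the subprocess wrapper is only here so the example is self-contained.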
Okay, now let's see what we've got. A couple of built in imports, and notice we're working with research.py and program.py
so this is some module stuff, there's not a lot we can do about that. But this right here, research.py init, this is pretty interesting.
So this is actually the code that we call to read, basically parse the CSV. So if we look over here, this init is the thing
that actually does the loading, the parse row. Over here like this we can look for parse row, and there's that.
And we're spending about 3 milliseconds on this, not super, super long, but we're calling it a bunch of times.
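Rather than scanning the whole report for one function by eye, you can ask pstats to show only matching rows. This is a rough sketch, assuming a hypothetical parse_row and load that merely imitate the shape of the research.py code; the timings it prints will not match the ones in the video.

```python
import cProfile
import io
import pstats


def parse_row(row):
    # Hypothetical parser: split a comma-separated line into integers.
    return [int(value) for value in row.split(",")]


def load(rows):
    # Hypothetical loader that calls parse_row once per row.
    return [parse_row(row) for row in rows]


rows = ["1,2,3,4"] * 5000

profiler = cProfile.Profile()
# runcall profiles a single function call and returns its result.
parsed = profiler.runcall(load, rows)

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# A string passed to print_stats is treated as a regular expression and
# limits the report to entries whose name matches it.
stats.sort_stats("cumtime").print_stats("parse_row")
filtered_report = buffer.getvalue()
print(filtered_report)
```

The ncalls column in the filtered output is where you see how many times the function was called.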
Okay, so it turns out that this program's a little hard to understand because it's not doing that much. This is actually an easier job for complex,
involved applications I find a lot of times because it's pretty clear where it's spending time. This one, it's actually really quick.
But we're still going to analyze it, don't worry. I just want to sort of give you the sense that actually this, even though it's a simple example,
it's kind of hard to understand the performance. If you want to just run the whole thing and see how it works, here you go.
Just run cProfile, sort by something, give it the main script to run, off you go. This is one way, but notice when we did this
there's all sorts of stuff in here that's irrelevant to us. For example, initializing the typing module. I don't care, we can't control that,
that's just something we're doing to define some type definitions. You could say 'don't use typing' and that's an option, but can you really avoid that?
Same with importlib bootstrap, find and load. These things, loading the modules, we're spending significant time here. We can't control that.
So what we want to do is we want to measure the parts that we can really carefully work with and control.
So we're going to see how to do that using the API from within Python and we'll get better answers here.
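As a preview of that idea, here is a minimal sketch of profiling from inside Python so that only the code you care about is measured and the import machinery stays out of the report. The CSV text, load_data, and parse_row are all hypothetical stand-ins, not the course's research.py.

```python
import cProfile
import csv
import io
import pstats

# Hypothetical CSV content standing in for the course's data file.
CSV_TEXT = "day,temp\n" + "\n".join(f"{d},{20 + d % 10}" for d in range(2000))


def parse_row(row):
    # Convert one CSV row (a dict of strings) into typed values.
    return {"day": int(row["day"]), "temp": float(row["temp"])}


def load_data(text):
    reader = csv.DictReader(io.StringIO(text))
    return [parse_row(row) for row in reader]


profiler = cProfile.Profile()
# The imports above ran before the profiler was switched on, so they
# do not show up in the statistics.
profiler.enable()
try:
    records = load_data(CSV_TEXT)
finally:
    profiler.disable()

out = io.StringIO()
# Sort by cumulative time and print only the ten most expensive entries.
pstats.Stats(profiler, stream=out).sort_stats("cumtime").print_stats(10)
api_report = out.getvalue()
print(api_report)
```

Wrapping just the call you want to measure between enable() and disable() is a design choice: it trades the convenience of the command line for a report that contains only the parts you can actually change.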