#100DaysOfCode in Python Transcripts
Chapter: Days 49-51: Measuring performance
Lecture: Demo: Getting started
0:00
Let's take a practical example and see how profiling can help us understand a program's performance. We talked about exploring CSV data
0:11
earlier in the course, so we're going to take that exact same code and we're going to try to understand it, and in fact tweak it a little bit,
0:18
based on what we see in the performance. So, let's pull this up here. We actually have in our demo code over here,
0:26
under 'Days 49-51' we have two copies, and right now they're the same, of course the final code'll be changed,
0:33
the starter code is exactly what we start with. So you can play around with this data over here,
0:38
this is more or less just the end product from the CSV section. Over here we're going to work on this
0:44
and we're going to try to understand its performance. We're going to actually step outside of PyCharm here, let me just copy the path to this file,
0:53
there's a couple of things we'll need to do. And the first thing is we want to activate our virtual environment.
1:00
What we want to do is use a built-in module, and we can understand the entire program's behavior from the outside,
1:08
we don't have to write any code to do this. So what we want to do is we want to run cProfile, and we want to tell it to sort,
1:15
we'll talk about sorting in a minute, we want to run that against program.py. How's it going to work? Poorly, because it's not a separate program,
1:21
it's just a module in Python, so what we need to do is say python -m to run the module. cProfile, and capitalization matters here: lowercase 'c', capital 'P'.
1:30
Now we're going to give it this, and let's see if that works. Okay, great, we got a bunch of gibberish-looking stuff here,
1:36
a lot of things going on about frozen imports and all sorts of things, and it turns out this is not how we want to look at our code.
1:42
I don't know how it's sorting it but it's not the right way. We would like to sort by cumulative time. There's basically two things that
1:48
you probably care about here. One is per call, which is how much time is each one individually spending, and I think this is sort of the same thing,
1:57
like how much time is just in this function. Not functions it calls, or above, but like summed up across the number of calls.
2:05
But I find that by far the most useful one is this cumtime, cumulative time. So let's go over here, and you need to pass the sort parameter,
2:12
but it won't work if you put it over here, -s. It needs to be before the script. So we'll say '-s cumtime'. Try it again.
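To see the flag placement in isolation, here is a minimal sketch that drives the same command line from Python. The script it profiles is a throwaway stand-in written to a temporary folder, not the course's program.py, and the names parse_row and main inside it are assumptions made for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for program.py; the real course code is not reproduced here.
SCRIPT = '''
def parse_row(row):
    return [int(cell) for cell in row.split(",")]

def main():
    rows = ["1,2,3"] * 10000
    return [parse_row(row) for row in rows]

main()
'''

with tempfile.TemporaryDirectory() as folder:
    script_path = Path(folder) / "program.py"
    script_path.write_text(SCRIPT)
    # "-s cumtime" goes before the script name. Anything placed after the
    # script name is handed to the script itself, not to cProfile.
    result = subprocess.run(
        [sys.executable, "-m", "cProfile", "-s", "cumtime", str(script_path)],
        capture_output=True,
        text=True,
        check=True,
    )

report = result.stdout
# Show only the top of the report; the full listing is long.
print("\n".join(report.splitlines()[:15]))
```

In a terminal this is just python -m cProfile -s cumtime program.py; the subprocess wrapper is only there so the example is self-contained.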
2:21
Okay, now let's see what we've got. A couple of built-in imports, and notice we're working with research.py and program.py
2:28
so this is some module stuff, there's not a lot we can do about it. But this right here, research.py init, this is pretty interesting.
2:36
So this is actually the code that we call to read, basically parse the CSV. So if we look over here, this init is the thing
2:44
that actually does the loading, the parse row. Over here like this we can look for parse row, and there's that.
2:51
And we're spending about 300 - about 3 milliseconds on this, not super, super long, but we're calling it a bunch of times.
3:01
Okay, so it turns out that this program's a little hard to understand because it's not doing that much. This is actually an easier job for complex,
3:10
involved applications I find a lot of times because it's pretty clear where it's spending time. This one, it's actually really quick.
3:17
But we're still going to analyze it, don't worry. I just want to sort of give you the sense that actually this, even though it's a simple example,
3:22
it's kind of hard to understand the performance. If you want to just run the whole thing and see how it works, here you go.
3:29
Just run cProfile, sort by something, give it the main script to run, off you go. This is one way, but notice when we did this
3:37
there's all sorts of stuff in here that's irrelevant to us. For example, initializing the typing module. I don't care, we can't control that,
3:46
that's just something we're doing to define some definitions. You could say 'don't use typing' and that's an option, but can you hide that?
3:54
Or this importlib bootstrap find and load? These things, loading the module, we're spending significant time here. We can't control that.
4:03
So what we want to do is we want to measure the parts that we can really carefully work with and control.
4:08
So we're going to see how to do that using the API from within Python and we'll get better answers here.
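As a rough preview of that idea, here is a minimal sketch using the standard library's cProfile.Profile and pstats.Stats. It is not the course's code: parse_row and load are hypothetical stand-ins for the CSV loading in research.py. Only the work between enable() and disable() is recorded, so import and startup overhead stays out of the report.

```python
import cProfile
import io
import pstats


def parse_row(row):
    # Hypothetical stand-in for the row parsing done in research.py.
    return [int(cell) for cell in row.split(",")]


def load(rows):
    # Spends little time itself (low tottime) but a lot in the
    # parse_row calls it makes (high cumtime).
    return [parse_row(row) for row in rows]


# Build the input outside the profiled region so it is not measured.
rows = ["1,2,3"] * 10_000

profiler = cProfile.Profile()
profiler.enable()       # start recording
data = load(rows)       # the only code we want measured
profiler.disable()      # stop recording

# Write the report to a string buffer, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumtime").print_stats(10)
report = buffer.getvalue()
print(report)
```

The report should list load and parse_row near the top, with none of the module-loading entries that cluttered the whole-program run.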