#100DaysOfCode in Python Transcripts
Chapter: Days 49-51: Measuring performance
Lecture: Demo: Getting started
0:00 Let's take a practical example
0:02 and see how profiling can help us
0:03 understand its performance.
0:06 We talked about exploring CSV data
0:10 earlier in the course,
0:11 so we're going to take that exact same code
0:14 and try to understand it,
0:16 and in fact tweak it a little bit
0:17 based on what we learn about its performance.
0:20 So, let's pull this up here.
0:22 In our demo code over here,
0:25 under 'Days 49-51', we have two copies of the program.
0:29 Right now they're the same;
0:31 of course, the final code will be changed,
0:32 but the starter code is exactly
0:34 what we start with here.
0:35 So you can play around with this data over here;
0:37 this is more or less just the end product
0:39 from the CSV section.
0:41 Over here, we're going to work on this copy
0:43 and try to understand its performance.
0:46 We're actually going to step outside of PyCharm here.
0:50 Let me just copy the path to this file;
0:52 there are a couple of things we'll need to do.
0:56 The first thing is to activate our virtual environment.
0:59 Next, we want to use a built-in module
1:02 that lets us understand the entire program's behavior
1:06 from the outside;
1:07 we don't have to write any code to do this.
1:08 So we want to run cProfile,
1:12 tell it how to sort
1:14 (we'll talk about sorting in a minute),
1:15 and run it against program.py.
1:17 Is that going to work?
1:18 No, because cProfile is not a separate program;
1:20 it's just a module in Python, so what we need to do
1:23 is say python -m to run the module.
1:26 It's cProfile, and capitalization matters here: capital 'P'.
1:29 Now we give it the path to program.py, and let's see if that works.
1:32 Okay, great, we got a bunch of gibberish-looking output here:
1:35 a lot of entries about frozen imports
1:37 and all sorts of other things.
1:38 It turns out this is not how we want to look at our code.
1:41 I'm not sure how it's sorted, but it's not the order we want.
1:44 We would like to sort by cumulative time.
1:46 There are basically two things that
1:47 you probably care about here.
1:49 One is percall, which is how much time
1:51 each individual call spends.
1:54 Related to that is tottime,
1:56 which is how much time is spent in just this function,
1:58 not in the functions it calls,
2:00 summed up across all of its calls.
2:04 But I find that by far the most useful one
2:06 is cumtime, cumulative time, which also includes the time spent in everything the function calls.
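To make the difference between tottime and cumtime concrete, here is a minimal sketch that profiles two toy functions, `outer` and `inner` (both hypothetical, not part of the course code), and reads both numbers from the raw `stats` dictionary that `pstats.Stats` builds. Since `outer` does almost nothing except call `inner`, we'd expect its tottime to be tiny while its cumtime is at least as large as `inner`'s.

```python
import cProfile
import pstats


def inner(n: int) -> int:
    # Hypothetical worker: a simple CPU-bound loop where the real time goes.
    total = 0
    for i in range(n):
        total += i * i
    return total


def outer() -> int:
    # Hypothetical caller: does almost no work itself, it just calls inner().
    return inner(200_000)


profiler = cProfile.Profile()
profiler.runcall(outer)

stats = pstats.Stats(profiler)
# Raw entries map (filename, lineno, funcname) -> (primitive calls, ncalls, tottime, cumtime, callers).
by_name = {key[2]: value for key, value in stats.stats.items()}
_, _, outer_tt, outer_ct, _ = by_name["outer"]
_, _, inner_tt, inner_ct, _ = by_name["inner"]

print(f"outer: tottime={outer_tt:.6f}s cumtime={outer_ct:.6f}s")
print(f"inner: tottime={inner_tt:.6f}s cumtime={inner_ct:.6f}s")
```

The exact timings vary from run to run, but outer's cumtime always includes inner's, which is why sorting by cumtime surfaces the entry points that lead to the expensive work.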
2:08 So let's go over here.
2:09 You need to pass the sort parameter, -s,
2:11 but it won't work if you put it after the script name.
2:15 It needs to come before the script.
2:16 So we'll say '-s cumtime'.
2:18 Try it again.
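Here is a rough way to check the argument-order rule yourself, assuming a throwaway script stands in for program.py. It launches `python -m cProfile` twice with `subprocess`: once with `-s cumtime` before the script, where cProfile consumes the option, and once after it, where cProfile stops parsing at the script name and hands the remaining arguments to the script's own `sys.argv`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for program.py: it only echoes the arguments it receives.
SCRIPT = "import sys\nprint('script args:', sys.argv[1:])\n"


def run_cprofile(*args: str) -> str:
    """Run `python -m cProfile <args>` with the current interpreter; return its stdout."""
    result = subprocess.run(
        [sys.executable, "-m", "cProfile", *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


with tempfile.TemporaryDirectory() as tmp:
    script = str(Path(tmp) / "program.py")
    Path(script).write_text(SCRIPT)

    # Option BEFORE the script: cProfile parses it and sorts by cumulative time.
    before = run_cprofile("-s", "cumtime", script)

    # Option AFTER the script: cProfile passes it through to the script's sys.argv.
    after = run_cprofile(script, "-s", "cumtime")

print("Ordered by: cumulative time" in before)  # True
print("script args: ['-s', 'cumtime']" in after)  # True
```

In the second run the report is still printed, but with cProfile's default ordering, because the sort option never reached cProfile.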
2:20 Okay, now let's see what we've got.
2:21 A couple of built-in imports,
2:23 and notice the entries for research.py and program.py themselves;
2:27 that's just the modules loading, and there's not a lot we can do about that.
2:30 But this right here, research.py's init function,
2:33 is pretty interesting.
2:35 This is the code that we call to read,
2:38 basically to parse, the CSV.
2:40 If we look over here, init is the function
2:43 that actually does the loading by calling parse_row.
2:46 If we search for parse_row in the output,
2:49 there it is.
2:50 We're spending about 3 milliseconds
2:55 on this, which is not super long,
2:57 but we're calling it many times.
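The per-call columns in the report are just the totals divided by ncalls. A small sketch of that arithmetic, assuming a hypothetical `parse_row` that converts one column of an in-memory CSV (the real research.py code is different), might look like this:

```python
import cProfile
import csv
import io
import pstats

# Hypothetical stand-in for the CSV file research.py loads: 100 data rows.
CSV_TEXT = "id,actual_max_temp\n" + "\n".join(f"{i},{60 + i % 30}" for i in range(1, 101))


def parse_row(row: dict) -> dict:
    # Hypothetical per-row parser: converts one text field to an int.
    row["actual_max_temp"] = int(row["actual_max_temp"])
    return row


def load() -> list:
    # Read every row and parse it, like a simplified, hypothetical init().
    reader = csv.DictReader(io.StringIO(CSV_TEXT))
    return [parse_row(row) for row in reader]


profiler = cProfile.Profile()
rows = profiler.runcall(load)

stats = pstats.Stats(profiler)
# Raw entries map (filename, lineno, funcname) -> (primitive calls, ncalls, tottime, cumtime, callers).
_, ncalls, tottime, cumtime, _ = next(
    value for key, value in stats.stats.items() if key[2] == "parse_row"
)
print(f"parse_row: ncalls={ncalls}, tottime={tottime:.6f}s, "
      f"tottime per call={tottime / ncalls:.8f}s, cumtime per call={cumtime / ncalls:.8f}s")
```

A function that is cheap per call can still dominate the report when ncalls is large, which is the situation described above.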
3:00 Okay, so it turns out that this program is a little hard
3:03 to understand because it's not doing that much.
3:06 I find this is often actually an easier job for complex,
3:09 involved applications,
3:11 because it's pretty clear where they spend their time.
3:15 This one is really quick.
3:16 But don't worry, we're still going to analyze it.
3:17 I just want to give you a sense that,
3:20 even though this is a simple example,
3:21 its performance is kind of hard to understand.
3:25 If you just want to run the whole thing
3:26 and see how it works, here you go.
3:28 Just run cProfile, sort by something,
3:30 give it the main script to run, and off you go.
3:33 This is one way, but notice that when we did this,
3:36 there was all sorts of stuff in the output that's irrelevant to us.
3:40 For example, initializing the typing module.
3:43 I don't care about that, and we can't control it;
3:45 that's just something we import
3:46 for our type definitions.
3:48 You could say 'don't use typing', and that's an option,
3:50 but can we just hide it instead?
3:53 And what about importlib._bootstrap's _find_and_load?
3:56 That's module loading, and we're spending
3:58 significant time there.
4:01 We can't control that either.
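One way to hide rows like these after the fact is to pass a restriction to `pstats.Stats.print_stats`: a string argument is treated as a regular expression, and only rows whose `filename:lineno(function)` text matches it are printed. In this sketch, `parse_row` and `main` are hypothetical stand-ins; the filtered report keeps the `parse_row` row and drops the row for `str.split`.

```python
import cProfile
import io
import pstats


def parse_row(line: str) -> list:
    # Hypothetical row parser; str.split gets its own row in the report.
    return line.split(",")


def main() -> None:
    # Hypothetical driver standing in for the whole program.
    for i in range(1_000):
        parse_row(f"{i},a,b")


profiler = cProfile.Profile()
profiler.runcall(main)

# Full report: every function the profiler saw, including str.split.
full = io.StringIO()
pstats.Stats(profiler, stream=full).sort_stats("cumulative").print_stats()

# Restricted report: the string is a regex matched against "filename:lineno(function)".
filtered = io.StringIO()
pstats.Stats(profiler, stream=filtered).sort_stats("cumulative").print_stats("parse_row")

print(filtered.getvalue())
```

This only trims what is printed; the import work still happened and still counts toward the total time in the report's first line.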
4:02 So what we want is to measure only the parts
4:04 that we can actually work with and control.
4:07 Next, we're going to see how to do that using the profiling API from
4:10 within Python, and we'll get better answers that way.
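As a rough preview of that in-process approach, here is a minimal sketch using `cProfile.Profile` with its `enable()` and `disable()` methods, with a hypothetical `work()` function standing in for the code we control. Anything imported or set up before `enable()` is called stays out of the report.

```python
import cProfile
import io
import pstats

# Imports and setup happen up here, before profiling starts, so they stay out of the report.


def work() -> int:
    # Hypothetical stand-in for the code we actually control (e.g. loading the CSV).
    return sum(i * i for i in range(100_000))


profiler = cProfile.Profile()
profiler.enable()   # start collecting
result = work()
profiler.disable()  # stop collecting

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats()
report = out.getvalue()
print(report)
```

Because profiling is switched on only around `work()`, entries like `_find_and_load` from module loading should not appear at all.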