#100DaysOfCode in Python Transcripts
Chapter: Days 49-51: Measuring performance
Lecture: Demo: Faster with less data processed
0:00 We saw that this parse_row function is where we're spending most of the time in the code that we wrote, the code we're actually interacting with.
0:08 We're also, just for completeness' sake, taking every element of this file and converting it, and storing it and working with it.
0:15 But what we've learned is that our program actually only uses four fields, three of which we're converting.
0:21 Why do we need to convert all the others, right? If we're never going to look at the average min temperature, why do we need to convert it?
0:28 Now, this is not something you want to start with, 'cause this could cause all sorts of problems, but once you know the data you're working with
0:34 and how you're going to use it, you probably want to come along here and say, well, actual mean temp, we don't use that; actual min and max, those we are using;
0:42 these averages are out, these records are out, and we're keeping actual precipitation. So now we're down to just these three.
0:52 So this is going to be faster. However, we're still creating this record which stores 12 values, and we're sort of passing them all along here.
1:00 Let's do a little bit better. What do we want? We want a date, and actual min and max, but not actual mean, so we don't even have to put the unused fields into our data structure.
1:10 Take out actual mean, put our min, our max, a bunch of average stuff we're not using, actual precipitation, and that's it.
1:20 So we have one, two, three, four, those are our four values. Now this trick is not going to work anymore
1:25 because there's more data in the dictionary than it accepts. Okay, so we've got to go back and do this a little more manually now.
1:32 So we're going to say row.get('date'), and we'll just do this for each one. Okay. A little bit more verbose, but we're only storing the four values,
1:43 we're only getting them from the dictionary, initializing them, all that kind of stuff. Now let's just look really quickly here
1:50 at how long we spend on these two. I'll put these into a comment right up here. Alright let's run it and see if that makes any difference.
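Here's a rough sketch of what the slimmed-down record and the manual row.get extraction might look like (the exact field names are assumptions modeled on the weather CSV used in this chapter):

```python
import collections

# Only the four fields the program actually uses (names are assumptions).
Record = collections.namedtuple(
    'Record', 'date actual_min_temp actual_max_temp actual_precipitation')

def parse_row(row: dict) -> Record:
    # Read just the four values from the dictionary and convert only
    # the three numeric ones; everything else in the row is ignored.
    return Record(
        date=row.get('date'),
        actual_min_temp=int(row.get('actual_min_temp')),
        actual_max_temp=int(row.get('actual_max_temp')),
        actual_precipitation=float(row.get('actual_precipitation')),
    )

# Extra keys in the row, like actual_mean_temp, are simply never touched.
record = parse_row({'date': '2014-7-1', 'actual_mean_temp': '81',
                    'actual_min_temp': '72', 'actual_max_temp': '89',
                    'actual_precipitation': '0.00'})
```

Note that the earlier `Record(**row)` style breaks once the namedtuple has fewer fields than the row dict, which is exactly why the explicit version above is needed.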
2:01 It's so fast, really, that you probably wouldn't actually visually tell but like I said, if you can take it down from 300, what are we looking at here?
2:10 We're looking at quite a bit here, 750, this is the one that probably matters. 350 milliseconds, that is a lot of time.
2:18 What if it's something interactive in real time, like a webapp for example. Let's try again. Now look at this.
2:25 This part actually stepped up and got in the way of this. It used to be those were next to each other, and why? 'Cause that got way, way better.
2:34 So let's go down here and print this out. I'm going to delete this CSV row, there's not a lot we can do about that for the moment.
2:42 Look at this; 350 to 159. That's more than 50% reduction, just by looking at the way we're creating or reading our data, and working like this, right?
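Before-and-after numbers like these come out of the profiler. A minimal, self-contained sketch of that measuring workflow with the standard library's cProfile and pstats (the analysis function here is just a stand-in for the real program):

```python
import cProfile
import io
import pstats

def analysis():
    # Stand-in for the weather analysis work being profiled.
    return sum(i * i for i in range(10_000))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):          # the lecture runs the whole program 100 times
    analysis()
profiler.disable()

# Sort by cumulative time so the expensive call sites float to the top.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(10)
print(buffer.getvalue())
```

Re-running this after each change is what lets you compare numbers like 350 ms versus 159 ms with confidence instead of guessing.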
2:54 We don't need to load and parse that other data. We could actually go and simplify our original data source,
2:59 really, but that probably doesn't make a lot of sense. This is probably the way to do it. So we used profiling to say,
3:07 well, this function is where we're spending so much time. If we look at the other ones, look at the part where our code is running,
3:14 this next and sorted, like this stuff we don't control, these are the other important ones, but they're like 20 milliseconds for 100,
3:23 so that's about one fifth of one millisecond, 0.21 milliseconds? That's fast enough, alright? We probably just don't care to optimize that any faster,
3:37 and you know, we look at that code, we look at that code down here, like, you could try some stuff to try to make it faster, right,
3:47 we could maybe store our data re-sorted based on some condition, right, like we pre-sort this on the max, maybe it's less sorting for the min,
3:58 you know certainly this one would be crazy fast, how much better can we make it, right? If it's 0.2 milliseconds and we make it 0.18 milliseconds,
4:07 no one's going to know. Especially when you look at the fact that there's a total of 600 milliseconds in this whole experience,
4:16 so really, this is probably as good as it's going to get. The other thing we can do, the final thing we can do,
4:22 and just notice that we're still spending a very large portion, five sixths out of that, whatever that is,
4:30 a very large portion of our time in this init function, because we happen to be calling it over and over. So now that we've got its individual run
4:38 at about as good as we're going to get, let's just add one more thing. Super simple, like, hey, have you already initialized it?
4:48 We're just going to keep the data, it's unlikely to have changed since then. Now we run it, and we get different answers still.
4:55 It's now down to these three that are the actual slow ones. But like I said, I don't think we can optimize that any faster.
5:04 Here's the research.init, and that's six milliseconds. I don't think we can do better than that. We're loading a pretty large text file and parsing it;
5:13 six milliseconds, we're going to be happy with that. So you can see how we went through this process of iteration with profiling,
5:19 to make our code much, much faster. It used to take almost half a second, now it takes 55 milliseconds.
5:28 And that's actually to call it, how many times did we call it, 100 times, is that what I said?
5:34 Yeah, so in the end we're sort of running the whole program 100 times and we've got it down to 55 milliseconds.
5:40 Less than one millisecond to run that whole analysis; load that file, do that, and so on. That's not quite right because
5:47 we're only technically loading parts of the file once, and caching that, right, but you can see how we go through this process
5:53 to really look at where our code is slow, think about why it's slow, and whether or not we can change it. Sometimes we could, parse row,
6:02 other times, hot days, cold days, wet days, we're kind of there, like, there's not a whole lot more we can do. If we want that to be faster,
6:08 maybe we have to pre-compute those and store them, like, basically cache the result of that cold days list and so on.
6:16 But that's, that adds a bunch of complexity and it's just not worth it here.
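If that pre-computation ever did become worth it, one lightweight way to cache a result like the cold days list is functools.lru_cache. This is a hypothetical sketch, not what the lecture's code actually does, with a tiny hard-coded record list standing in for the real data:

```python
import functools

@functools.lru_cache(maxsize=1)
def cold_days():
    # Stand-in for re-sorting all the records by minimum temperature;
    # the sort only runs on the first call, then the result is cached.
    records = [('2014-7-3', 75), ('2014-7-1', 72), ('2014-7-2', 74)]
    return tuple(sorted(records, key=lambda r: r[1]))

first = cold_days()
second = cold_days()   # served from the cache, no re-sorting
```

Because lru_cache hands back the same object on repeat calls, the second call costs essentially nothing; the trade-off is exactly the extra complexity the lecture decides isn't worth it at 0.2 milliseconds per call.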