#100DaysOfCode in Python Transcripts
Chapter: Days 49-51: Measuring performance
Lecture: Demo: Faster with less data processed
0:00 We saw this parse row is where we're spending most
0:02 of the time in the code that we wrote,
0:04 that we're actually interacting with.
0:07 We're also, just for completeness' sake,
0:09 taking every element of this file and converting it,
0:13 and storing it and working with it.
0:14 But what we've learned is that our program actually
0:16 only uses four fields; three of which we're converting.
0:20 Why do we need to convert all the others, right?
0:22 If we're never going to look at the average min temperature,
0:25 why do we need to convert it?
0:27 Now, this is not something you want to start with,
0:29 'cause this could cause all sorts of problems,
0:31 but once you know the data you're working with
0:33 and how you're going to use it,
0:35 you probably want to come along here and say,
0:36 well, actual mean temp, don't use that,
0:39 actual min and max, those we are using,
0:41 these averages are out, these records are out,
0:45 we're down to actual precipitation, and those.
0:49 So now we're down to just these three.
0:51 So this is going to be faster.
0:54 However, we're still creating this record
0:55 which stores 12 values,
0:57 and we're sort of passing them all along here.
0:59 Let's do a little bit better.
1:03 What do we want? We want the date, but not actual mean,
1:06 so we can leave the unused fields out of our data structure entirely.
1:09 Take out actual mean, keep our min, our max,
1:12 a bunch of average stuff we're not using,
1:14 actual precipitation,
1:18 and that's it.
1:19 So we have one, two, three, four, those are our four values.
1:22 Now this trick is not going to work anymore
1:24 because there's more data in the dictionary than it accepts.
1:27 Okay, so we got to go back
1:28 and do this a little more manual now.
1:31 So we're going to say row.get date,
1:34 and we'll just do this for each one.
1:39 A little bit more verbose,
1:40 but we're only storing the four values,
1:42 we're only getting them from the dictionary,
1:44 initializing them, all that kind of stuff.
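
A minimal sketch of the trimmed-down record and parser described above. The transcript does not show the code, so the `Record` namedtuple, the field names, and the sample row are assumptions for illustration, and `Record(**row)` is one plausible reading of "this trick".

```python
import collections

# Hypothetical four-field record; the field names are assumed.
Record = collections.namedtuple(
    'Record', 'date, actual_min_temp, actual_max_temp, actual_precipitation')


def parse_row(row):
    # Convert only the three numeric fields the program actually uses.
    row['actual_min_temp'] = int(row['actual_min_temp'])
    row['actual_max_temp'] = int(row['actual_max_temp'])
    row['actual_precipitation'] = float(row['actual_precipitation'])

    # Record(**row) would raise TypeError here, because the row dict still
    # holds columns the record does not accept, so pick the fields by hand.
    return Record(
        date=row.get('date'),
        actual_min_temp=row.get('actual_min_temp'),
        actual_max_temp=row.get('actual_max_temp'),
        actual_precipitation=row.get('actual_precipitation'),
    )


# Made-up sample row; the extra columns are never converted or stored.
sample = {
    'date': '2014-7-1', 'actual_mean_temp': '77',
    'actual_min_temp': '72', 'actual_max_temp': '82',
    'average_min_temp': '68', 'actual_precipitation': '0.00',
}
record = parse_row(sample)
print(record)
```

The unused columns stay as raw strings in the row dictionary and are dropped as soon as the record is built, so no conversion or storage cost is paid for them.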
1:47 Now let's just look really quickly here
1:49 at how long we spend on these two.
1:50 I'll put these into a comment right up here.
1:57 Alright let's run it and see if that makes any difference.
2:00 It's so fast, really, that you probably
2:02 wouldn't actually be able to tell visually, but like I said,
2:05 if you can take it down from 300,
2:07 what are we looking at here?
2:09 We're looking at quite a bit here, 750,
2:13 this is the one that probably matters.
2:15 350 milliseconds, that is a lot of time.
2:17 What if it's something interactive in real time,
2:19 like a web app for example.
2:21 Let's try again.
2:23 Now look at this.
2:24 This part actually stepped up and got in the way of this.
2:27 It used to be those were next to each other, and why?
2:30 'Cause that got way, way better.
2:33 So let's go down here and print this out.
2:35 I'm going to delete this CSV row,
2:37 there's not a lot we can do about that for the moment.
2:41 Look at this; 350 to 159.
2:45 That's more than a 50% reduction,
2:48 just by looking at the way we're creating
2:50 or reading our data, and working like this, right?
2:53 We don't need to load and parse that other data.
2:56 We could actually go and simplify our original data source,
2:58 really, but that probably doesn't make a lot of sense.
3:02 This is probably the way to do it.
3:04 So we used profiling to say,
3:06 well, this function is where we're spending so much time.
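
For reference, here is one way to collect this kind of per-function timing with the standard library's `cProfile` and `pstats` modules. It is only a sketch: `parse_row` and `run` below are hypothetical stand-ins, not the code from the demo.

```python
import cProfile
import io
import pstats


def parse_row(value):
    # Stand-in for the per-row conversion work (hypothetical).
    return int(value) * 2


def run():
    # Simulate parsing many rows so the profiler has something to measure.
    return [parse_row(str(n)) for n in range(10_000)]


profiler = cProfile.Profile()
profiler.enable()
result = run()
profiler.disable()

# Sort by cumulative time so the most expensive call chains come first.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats('cumulative')
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists call counts and time per function, which is the information used here to single out the slow function.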
3:10 If we look at the other ones,
3:11 look at the part where our code is running,
3:13 this next and sorted, like this stuff we don't control,
3:16 these are the other important ones,
3:18 but they're like 20 milliseconds for 100,
3:22 so that's one fifth of one millisecond?
3:29 .21 milliseconds?
3:31 That's fast enough, alright?
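
The per-call figure can be checked with a couple of lines. The numbers are the approximate ones read aloud above, not exact profiler output.

```python
# Approximate figures quoted in the transcript.
total_ms = 20      # time spent across all calls
call_count = 100   # number of calls

per_call_ms = total_ms / call_count
print(f'{per_call_ms:.2f} ms per call')
```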
3:32 We probably just don't care to optimize that any faster,
3:36 and you know, we look at that code,
3:41 we look at that code down here, like,
3:44 you could try some stuff to try to make it faster, right,
3:46 we could maybe store our data pre-sorted
3:50 based on some condition, right,
3:52 like we pre-sort this on the max,
3:55 maybe it's less sorting for the min,
3:57 you know certainly this one would be crazy fast,
4:01 how much better can we make it, right?
4:03 If it's .2 milliseconds and we make it .18 milliseconds,
4:06 no one's going to know.
4:07 Especially when you look at the fact that there's
4:10 a total of 600 milliseconds in this whole experience,
4:15 so really, this is probably as good as it's going to get.
4:19 The other thing we can do, the final thing we can do,
4:21 and just notice that we're still spending
4:23 a very large portion,
4:26 five sixths out of that, whatever that is,
4:29 a very large portion of our time in this init function.
4:32 Because we happen to be calling it over and over.
4:35 So now that we've got its individual run
4:37 at about as good as we're going to get,
4:39 let's just add one more thing.
4:45 Super simple, like, hey, have you already initialized it?
4:47 We're just going to keep the data,
4:48 it's unlikely to have changed since then.
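
The "have you already initialized it?" check might look something like the sketch below. The module-level `data` list, the inline sample CSV, and the `load_count` counter are assumptions added for illustration; the demo reads a real file.

```python
import csv
import io

# Module-level cache; stays empty until init() has run once.
data = []

# Counts how many times the file is actually parsed (for illustration only).
load_count = 0

# Stand-in for the real CSV file (made-up rows).
SAMPLE_CSV = 'date,actual_max_temp\n2014-7-1,82\n2014-7-2,85\n'


def init():
    global load_count

    # Already initialized? Keep the data we have and skip the re-parse.
    if data:
        return

    load_count += 1
    reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
    for row in reader:
        data.append((row['date'], int(row['actual_max_temp'])))


init()
init()  # second call returns immediately without parsing again
print(len(data), load_count)
```

This only works if the underlying data really does not change between calls. If it can, the guard needs some way to invalidate the cached list.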
4:51 Now we run it, and we get different answers still.
4:54 It's now down to these three that are the actual slow ones.
4:57 But like I said,
4:58 I don't think we can optimize that any faster.
5:03 Here's the research.init, and that's six milliseconds.
5:08 I don't think we can do better than that.
5:09 We're loading a pretty large text file and parsing it;
5:12 six milliseconds, we're going to be happy with that.
5:14 So you can see how we went through this process
5:16 of iteration with profiling,
5:18 to actually come to make our code much, much faster.
5:22 It used to take almost half a second,
5:24 now it takes 55 milliseconds.
5:27 And that's actually to call it,
5:29 how many times did we call it, 100 times,
5:31 is that what I said?
5:33 Yeah, so in the end we're sort of running the whole program
5:35 100 times and we've got it down to 55 milliseconds.
5:39 Less than one millisecond to run that whole analysis;
5:43 load that file, do that, and so on.
5:45 That's not quite right because
5:46 we're only technically loading parts of the file once,
5:48 and caching that, right,
5:51 but you can see how we go through this process
5:52 to really look at where our code is slow,
5:55 think about why it's slow,
5:57 and whether or not we can change it.
5:59 Sometimes we could, parse row,
6:01 other times, hot days, cold days, wet days,
6:03 we're kind of there, like,
6:04 there's not a whole lot more we can do.
6:06 If we want that to be faster,
6:07 maybe we have to pre-compute those and store them, like,
6:11 basically cache the result of that cold days list and so on.
6:15 But that adds a bunch of complexity
6:17 and it's just not worth it here.
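
For completeness, caching the result of something like the cold days list could be sketched with `functools.lru_cache`. The `cold_days` function and the sample records here are hypothetical, and this is not something the demo actually does.

```python
import functools

# Made-up (date, min_temp) records standing in for the parsed data.
data = [('2014-1-3', 9), ('2014-7-1', 72), ('2014-1-7', 4), ('2014-4-2', 41)]


@functools.lru_cache(maxsize=None)
def cold_days():
    # Sort once; later calls return the cached tuple without re-sorting.
    return tuple(sorted(data, key=lambda r: r[1]))


coldest = cold_days()[:2]
print(coldest)
```

The added complexity shows up as soon as `data` changes: the cached result goes stale and has to be cleared explicitly with `cold_days.cache_clear()`.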