Effective PyCharm Transcripts
Chapter: Performance and profiling
Lecture: Optimizing the machine learning code
0:00 We're armed with the fact that compute_analytics is the slowest thing, and if we look
0:05 just a little further down the line,
0:07 we have learn, which is 3.9%, or 6.4% of the total time, and read_data,
0:13 which is 61% of the time.
0:16 Alright, so where should we focus?
0:18 Let's go over to the function, and we've got read_data, then we've got learn, and
0:25 read... yeah, this read_data we're doing twice, actually,
0:28 so we're really going to need to work on that.
0:31 Let's go over here and jump in.
0:33 Now notice again, this is a little bit contrived, but we're doing some in-Python processing,
0:39 let's say, of this code here;
0:42 we're simulating that with this sleep.
0:44 And it turns out that when you're doing lots of computational stuff in Python,
0:49 there's usually some library implemented in C or Cython or something like that
0:53 that's going to be way faster than pure Python.
0:56 We're working with lists of data here,
0:58 and what might make a lot more sense is to work with something like NumPy.
1:02 So let's imagine the switch: we've done some testing,
1:09 and we switch over to the NumPy library, which is written in C.
1:12 It has very thin wrappers exposed to Python, and we gain 20 times the performance on this
1:19 processing of these arrays of numbers and things.
1:22 We're going to simulate that by saying,
1:24 you know what? We're no longer spending that much time; we're spending 1/20th of it, so we
1:29 divide the time by 20 and we get this much.
1:33 So that's how much time we've gained with this theoretical upgrade to NumPy.
1:37 I don't really want to bring NumPy into the situation here.
1:40 We could come up with something in Python that actually gets 20x, but
1:45 it just adds complexity. So use your imagination here.
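A minimal sketch of what this simulation looks like, assuming a hypothetical `read_data` with an invented `base_cost` and `row_count`; the real course code simply divides its `time.sleep` argument by 20:

```python
import time

SIMULATED_SPEEDUP = 20  # assumed 20x gain from a C-backed library like NumPy


def read_data(row_count):
    # Pretend the pure-Python list processing used to cost base_cost seconds;
    # we simulate the NumPy version by sleeping 1/20th as long.
    base_cost = 0.5  # hypothetical cost of the original Python loop
    time.sleep(base_cost / SIMULATED_SPEEDUP)
    return list(range(row_count))
```

The sleep stands in for real work, so the profiler still attributes the (now much smaller) time to this function.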
1:48 Right. Let's run it and see if it's any faster. There's the search,
1:51 there's the DB... boom, wow,
1:53 that was a lot faster. Remember how much time we were spending over here in
1:56 compute_analytics and read_data? Basically 3.0 seconds.
2:01 Let's run the profiler again and see where it is now.
2:07 All right. We can do a quick flip over like this and check it
2:12 out. We've got to go down a little bit.
2:14 All the way down here is where our compute_analytics went.
2:16 It's down to 473 milliseconds, or 20%.
2:22 Let's look at it in the call graph,
2:23 which I really like; I like to see it that way.
2:25 Let's go over here. It switched from orange, spending that much time,
2:32 three seconds, in compute_analytics, to now
2:35 just 165 milliseconds in read_data.
2:38 Let's imagine we can't go any faster,
2:40 right? We switched to NumPy,
2:41 we're doing the load, boom,
2:43 that's it. That's as fast as we can go.
2:45 The other thing we could do over here is work on learn, and this is actually
2:49 pretty interesting. Let's jump in and check this out.
2:53 Imagine this is the machine learning math that we're doing.
2:57 Of course we'd really use something like TensorFlow, but here's some math that we're doing, and
3:01 imagine the math cannot change. We just have to do it.
3:05 Well, let's go back here and look at this in a little bit more detail.
3:08 So learn: it turns out the thing that we're spending a lot of time in
3:12 is actually this math.pow. We're calling it,
3:16 wow, something like 627,000 times. Even though each call only takes a little bit of time
3:21 right there, calling it that many times turns out to take a lot of time.
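The hot loop being described might look something like this sketch (the `row_id % 50` inputs and the exponent 7 are illustrative stand-ins, not the course's actual data):

```python
import math


def learn(row_ids):
    # Hypothetical stand-in for the "machine learning math": math.pow is
    # called once per row, and the argument pairs repeat a lot across the loop.
    total = 0.0
    for row_id in row_ids:
        total += math.pow(row_id % 50, 7)  # only 50 distinct bases ever occur
    return total
```

The key property, which the next steps exploit, is that the same `(base, exponent)` pairs come up over and over.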
3:24 I'm going to show you a really cool technique we can use to make that
3:28 faster. Let's do something that doesn't seem like it will be better: we're going to
3:33 create a function that will call math.pow.
3:36 So we'll say def compute_pow, and it's going to take an x and a y,
3:41 and it's going to return math.pow of x and y.
3:45 Okay, and instead of doing this right here,
3:50 I'm going to leave the commented-out one in here,
3:52 I'm going to say compute_pow of the ID
3:55 and seven. Now over here,
3:58 we're going to do the same thing;
4:00 this is going to be compute_pow like that.
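The wrapper as described, using the names `compute_pow`, `x`, and `y` from the narration:

```python
import math


def compute_pow(x, y):
    # A thin pass-through around math.pow. On its own this only adds the
    # overhead of one extra function call; the payoff comes in the next step,
    # when we have a single place to hang a cache.
    return math.pow(x, y)
```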
4:10 Okay, if I run it,
4:11 chances are it's going to be slower because in addition to calling this a bunch of
4:15 times, we're also adding the overhead of calling another function.
4:18 Let's make sure, though, that we still get the same number.
4:22 We do, we get this. And if we profile it over here and compare real
4:32 quick (it's important to compare as we go),
4:35 which one is this? This is the learn function.
4:38 So let's go look at the stats for learn: 308 before, and in the new one,
4:45 420. See, there was some overhead.
4:47 Can't make that better, can we?
4:48 We shouldn't do this. Ah, but we can. Check this out.
4:51 So it turns out that we have this ID
4:54 passed along as we loop over this.
4:57 The ID
4:58 is the same, so that's going to be repeated.
5:00 The seven is going to be repeated, and some of the time these numbers will also
5:04 turn out to be the same. If we have the same inputs, raising a number to
5:10 a power is always going to give the same outputs.
5:12 So what we can do is use this really cool library called functools, but we've
5:16 got to import functools. And in here there's a cache,
5:20 something called an lru_cache().
5:21 What does the LRU cache do?
5:22 It's going to take a function, and if you pass it
5:27 the same arguments more than once,
5:30 the first time it's going to compute the result,
5:32 but the second time, and the third and fourth, because it already saw those inputs,
5:36 and this is always going to give the same answer,
5:38 it just returns the precomputed, saved value.
5:41 So we're going to trade a little bit of memory consumption for time.
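Applying `functools.lru_cache` to the wrapper from the previous step; `maxsize=None` (an unbounded cache) is an assumption here, since the decorator defaults to keeping only the 128 most recent entries:

```python
import functools
import math


@functools.lru_cache(maxsize=None)
def compute_pow(x, y):
    # The first call with a given (x, y) computes math.pow and stores the
    # result; later calls with the same arguments return the saved value.
    return math.pow(x, y)
```

This works because the arguments are hashable and the function is pure: same inputs, same output, so a cached answer is always correct.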
5:45 Let's run this again and make sure that we get the same number.
5:49 We do, the same number. It's hard to tell at this point;
5:52 we're getting down to the edges of whether it's faster,
5:54 but let's run it one more time.
6:00 All right. Let's see the final result here.
6:03 Go down here to learn and look at that.
6:07 Now it's 7.1%, whereas before, learn was 19%.
6:12 So 420 down to 217,
6:17 so more than twice as fast.
6:19 How cool is that? And all we had to do was realize that we're kind of
6:23 doing this thing over and over,
6:24 and it's always going to give us the same answer, so we can put a cache
6:27 on it. Then if we happen to see the same values,
6:30 we don't have to recompute them over and over.
6:32 Fantastic. All right, let's go back here to our final result, look at
6:36 the call graph, and see where we are with regard to this machine learning.
6:40 We're now in a good place with compute_analytics.
6:43 It was by far the slowest part of the entire program,
6:46 taking almost five seconds. Now we've gotten read_data down nice and quick using
6:51 our simulated NumPy, and we've got our learn down a bunch too, more than twice as
6:57 fast, by using the lru_cache.
6:59 And notice over here,
7:01 remember this was 600,000 calls or something like that;
7:04 now we're calling it only half as many times,
7:06 and that's why it's twice as fast. Super cool, right?
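You can verify the "half as many calls" effect yourself with the cache's `cache_info()` method; the workload here is invented so that every input pair appears exactly twice:

```python
import functools
import math


@functools.lru_cache(maxsize=None)
def compute_pow(x, y):
    return math.pow(x, y)


# Hypothetical workload: 20 calls, but only 10 distinct (x, y) pairs.
inputs = [(n % 10, 7) for n in range(20)]
for x, y in inputs:
    compute_pow(x, y)

info = compute_pow.cache_info()
print(info.misses, info.hits)  # 10 real computations, 10 served from cache
```

Only the misses actually reach `math.pow`, which is exactly the halving of call counts the profiler showed.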