Effective PyCharm Transcripts
Chapter: Performance and profiling
Lecture: Optimizing the machine learning code

0:00 We're armed with the fact that compute analytics is the slowest thing, and if we look just a little further down the line,

0:08 we have learn, which is 3.9, or 6.4% of the total time, and read data, which is 61% of the time. Alright, so where should we focus?

0:19 Let's go over to the function. We've got read data, then we've got learn, and then read data again. Yeah, we're actually doing this read data twice,

0:29 so we're going to need to really work on that. Let's go over here and jump in.

0:34 Now notice, again, this is a little bit contrived, but we're doing some in-Python processing of, let's say, this code here;

0:43 we're simulating that with this sleep. And it turns out that when you're doing lots of computational stuff in Python,

0:50 there's usually some library implemented in C or Cython or something like that that's going to be way faster.

0:57 We're working with lists of data here, and what might make a lot more sense is to work with something like NumPy.
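As a rough, stdlib-only sketch of why C-implemented code tends to win (it does not use NumPy, and the function names are hypothetical), CPython's built-in sum(), whose loop runs in C, can be timed against the same loop written in Python:

```python
import timeit

# Hypothetical data set: a plain Python list of numbers
data = [float(i) for i in range(100_000)]


def python_loop_sum(values):
    """Add the values one at a time in interpreted Python code."""
    total = 0.0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    """Delegate the whole loop to CPython's C-implemented built-in sum()."""
    return sum(values)


if __name__ == "__main__":
    loop_time = timeit.timeit(lambda: python_loop_sum(data), number=20)
    builtin_time = timeit.timeit(lambda: builtin_sum(data), number=20)
    print(f"pure Python loop: {loop_time:.4f}s")
    print(f"built-in sum():   {builtin_time:.4f}s")
```

Both functions return the same total; only where the loop executes differs, which is the same idea behind moving array math into NumPy.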

1:03 So let's imagine the switch: we've done some testing, and we switch over to the NumPy library, which is written in C.

1:13 It has very thin wrappers exposed to Python, and we gain 20 times the performance on this processing of arrays of numbers and things.

1:23 We're going to simulate that by saying, you know what? We're no longer spending that much time; we're spending 1/20 of it,

1:30 so we divide the 2 by 20 and get this much. So that's how much time we've gained with this theoretical upgrade to NumPy.

1:38 I don't really want to bring NumPy into the situation here. We could come up with something in Python that actually gets 20x, but

1:46 it just adds complexity, so use your imagination here. Right, let's run it and see if it's any faster. Boom, wow,

1:54 that was a lot faster. Remember how much time we were spending over here in compute analytics and read data? Basically 3.0 seconds.
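A minimal sketch of the sleep-based simulation described here, assuming the pure-Python step costs about 2 seconds. The function names and the 2-second figure are illustrative assumptions, not the course's actual code:

```python
import time

ASSUMED_PURE_PYTHON_SECONDS = 2.0  # assumption: cost of the in-Python processing
SPEEDUP = 20                       # the imagined gain from switching to NumPy


def process_pure_python():
    # Stand-in for the slow, in-Python processing of the data
    time.sleep(ASSUMED_PURE_PYTHON_SECONDS)


def process_with_numpy_simulated():
    # Same work, imagined to be 20x faster: sleep only 1/20 of the time
    time.sleep(ASSUMED_PURE_PYTHON_SECONDS / SPEEDUP)


def elapsed(func):
    """Return how many seconds one call to func takes."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"simulated NumPy version: {elapsed(process_with_numpy_simulated):.2f}s")
```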

2:02 Let's run the profiler again and see how it looks now. All right, we could do a quick flip over like this and check it

2:13 out. We've got to go down a little bit. All the way down here is where our compute analytics went. So it's down to 473 milliseconds, or 20%.
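Outside PyCharm, a comparable cumulative-time view can be produced with the standard library's cProfile and pstats modules. This sketch profiles hypothetical read_data, learn, and compute_analytics stand-ins, not the lecture's real functions:

```python
import cProfile
import io
import pstats
import time


def read_data():
    # Hypothetical stand-in for the (now faster) data-loading step
    time.sleep(0.05)


def learn():
    # Hypothetical stand-in for the math-heavy step
    return sum(i * i for i in range(50_000))


def compute_analytics():
    read_data()
    learn()
    read_data()


def profile_report(func, limit=5):
    """Profile one call to func and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(compute_analytics))
```

The ncalls column in that report is also where a hot spot such as a function called hundreds of thousands of times shows up.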

2:23 Let's look at it in the call graph; I really like to see it that way. Let's go over here. It switched from orange, spending

2:33 three seconds in compute analytics, to now just 165 milliseconds in read data. Let's imagine we can't go any faster, right? We switched to NumPy.

2:42 We're doing the load. Boom, that's it; that's as fast as we can go. The other thing we could do over here is work on learn, and this is actually

2:50 pretty interesting. Let's jump in and check this out. Imagine this is the machine learning math that we're doing.

2:58 Of course, we'd really use something like TensorFlow, but here's some math that we're doing, and imagine the math cannot change. We just have to do it.
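To make the discussion concrete, here is a hypothetical version of the kind of loop learn might contain. The formula and the idd parameter (named after the IDD value used in the lecture) are made up for illustration; the point is only that math.pow is called inside a loop:

```python
import math


def learn(data, idd):
    """Hypothetical stand-in for the lecture's machine learning math."""
    total = 0.0
    for x in data:
        # The same base (idd) and exponent (7) are used on every pass,
        # plus a per-item power whose inputs may also repeat.
        total += math.pow(idd, 7) + math.pow(x, 2)
    return total


if __name__ == "__main__":
    print(learn([1.0, 2.0, 3.0], 2.0))  # 398.0
```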

3:06 Well, let's go back here and look at this in a little more detail. So in learn, it turns out the thing we're spending a lot of time in

3:13 is actually this 'math.pow'. We're calling it, wow, something like 627,000 times. Even though each call only takes a little bit of time

3:22 right there, calling it that often turns out to take a lot of time. I'm going to show you a really cool technique we can use to make that

3:29 faster. Let's do something that doesn't seem like it will be better: we're going to create a function that calls 'math.pow'.

3:37 So we'll say 'def compute_pow', and it's going to take an x and a y and return 'math.pow(x, y)'. Okay, and instead of doing this right here,

3:51 I'm going to leave the commented one in here, and I'm going to say 'compute_pow(IDD, 7)'. Now here, we're going to do the same thing;

4:01 this is going to be 'compute_pow' like that. Okay, if I run it, chances are it's going to be slower because, in addition to calling this a bunch of

4:16 times, we're also adding the overhead of calling another function. Let's see, though, that we still get the same number.
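The wrapper step looks roughly like this. compute_pow matches the name used in the lecture, while learn's body and the idd parameter are hypothetical; by itself the wrapper changes nothing except adding one extra function call per power:

```python
import math


def compute_pow(x, y):
    """Thin wrapper around math.pow; on its own it only adds call overhead."""
    return math.pow(x, y)


def learn(data, idd):
    # Hypothetical loop, now routed through compute_pow instead of math.pow
    total = 0.0
    for x in data:
        total += compute_pow(idd, 7) + compute_pow(x, 2)
    return total
```

Routing every call through one function is what makes the next step possible: a single place where a cache can be attached.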

4:23 We do. Now let's profile it over here and compare real quick; it's important to compare as we go.

4:36 Which one is this? This is the learn function. So let's go look at the stats for learn: 308, and in the new one, 420. See, there was some overhead.

4:48 That can't be better, can it? We shouldn't do this. Ah, but check this out. It turns out that we have this IDD

4:55 passed along as we loop over this, and the IDD is always the same, so that call is going to be repeated.

5:01 The 7 is going to be repeated too, and some of the time these numbers will also turn out to be the same. Given the same inputs, raising a number to

5:11 a power is always going to give the same output. So what we can do is use this really cool library called 'functools'; we

5:17 have to import 'functools'. And in here there's a cache, something called 'lru_cache()'. What does the LRU cache do?

5:23 It takes a function, and if you pass it the same arguments more than once, the first time it computes the result.

5:33 But the second, third, and fourth time, because it already saw those inputs and the function always gives the same answer,

5:39 it just returns the precomputed, saved value. So we're trading a little bit of memory for time.
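A sketch of the cached version, wrapped around a hypothetical learn loop (the learn body and idd are illustrative). functools.lru_cache() with no arguments keeps up to 128 entries by default (maxsize=128), every argument must be hashable, and cache_info() reports hits and misses:

```python
import functools
import math


# lru_cache() with no arguments keeps up to 128 results (maxsize=128),
# evicting the least recently used entry once it is full.
@functools.lru_cache()
def compute_pow(x, y):
    """math.pow, but repeated (x, y) pairs are answered from the cache."""
    return math.pow(x, y)


def learn(data, idd):
    # Hypothetical loop: (idd, 7) repeats on every pass, and (x, 2)
    # repeats whenever the same x shows up again in the data.
    total = 0.0
    for x in data:
        total += compute_pow(idd, 7) + compute_pow(x, 2)
    return total


if __name__ == "__main__":
    print(learn([1.0, 2.0, 1.0, 2.0], 2.0))  # 522.0
    print(compute_pow.cache_info())  # CacheInfo(hits=5, misses=3, maxsize=128, currsize=3)
```

Passing maxsize=None makes the cache unbounded, which removes eviction but lets memory grow with every distinct input; compute_pow.cache_clear() empties it.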

5:46 Let's run this again and make sure we get the same number. We do, the same number. It's hard to tell at this point;

5:53 we're getting down to the edges of whether it's faster, but let's run it one more time. All right, let's see the final result here.

6:04 Go down here to learn and look at that: now it's 7.1%, whereas before, learn was 19%. So it went from 420 down to 217, almost twice as fast.
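The speedup arithmetic, using only the learn timings quoted in this lecture (no new measurements): the cached version is about 1.94x faster than the wrapper-only version, and about 1.42x faster than the original code before the wrapper was added.

```python
# learn timings read off the profiler in this lecture (same units throughout)
original = 308      # before adding the compute_pow wrapper
with_wrapper = 420  # wrapper added, no cache
with_cache = 217    # wrapper decorated with lru_cache

print(f"cache vs. wrapper:  {with_wrapper / with_cache:.2f}x")  # 1.94x
print(f"cache vs. original: {original / with_cache:.2f}x")      # 1.42x
```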

6:20 How cool is that? And all we had to do was realize that we're doing this thing over and over,

6:25 and it's always going to give us the same answer, so we can put a cache on it. If we happen to see the same values,

6:31 we don't have to recompute them over and over. Fantastic. All right, let's go back here to our final result and look at

6:37 the call graph and see where we are with regard to this machine learning. Now we're in a good place with compute analytics.

6:44 It was by far the slowest part of the entire program, taking almost five seconds. And now we've gotten read data down nice and quick using

6:52 our simulated NumPy, and we've got learn running almost twice as fast by using 'lru_cache'. And notice over here:

7:02 remember, this was called 600,000 times or something like that; now we're calling it only about half as many times, and that's why it's about twice as fast. Super cool, right?
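To see the "called fewer times" effect directly, this sketch counts how often math.pow actually runs with and without lru_cache. The counter, helper names, and sample data are hypothetical; how large the reduction is depends entirely on how often the inputs repeat (in the lecture it was roughly half):

```python
import functools
import math

# Hypothetical counter recording how many times math.pow really runs
calls = {"math.pow": 0}


def counted_pow(x, y):
    calls["math.pow"] += 1
    return math.pow(x, y)


# Same function, memoized; only cache misses reach counted_pow
cached_pow = functools.lru_cache()(counted_pow)


def count_pow_calls(pow_func, data, idd):
    """Run a hypothetical learn-style loop and return how often math.pow ran."""
    calls["math.pow"] = 0
    for x in data:
        pow_func(idd, 7)
        pow_func(x, 2)
    return calls["math.pow"]


if __name__ == "__main__":
    sample = [1.0, 2.0] * 5  # hypothetical data with many repeated values
    print("without cache:", count_pow_calls(counted_pow, sample, 2.0))  # 20
    cached_pow.cache_clear()
    print("with cache:   ", count_pow_calls(cached_pow, sample, 2.0))   # 3
```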
