# Effective PyCharm Transcripts

Chapter: Performance and profiling

Lecture: Optimizing the machine learning code


0:00
We're armed with the fact that compute_analytics is the slowest thing, and if we look

0:05
just a little further down the line,

0:07
we have learn, which is 3.9%, or 6.4% of the total time, and read_data,

0:13
which is 61% of the time.

0:16
Alright, so where should we focus?

0:18
Let's go over to the function: we've got read_data, then we've got learn, and

0:25
read. Yeah, this read_data we're doing twice, actually,

0:28
so we're really going to need to work on that.

0:31
Let's go over here and jump in.

0:33
Now notice, again, this is a little bit contrived, but we're doing some in-Python processing,

0:39
let's say, of this code here;

0:42
we're simulating that with this sleep.

0:44
And it turns out that when you're doing lots of computational stuff in Python,

0:49
there's usually some library implemented in C or Cython or something like that

0:53
that's going to be way faster. We're

0:56
working with lists of data here,

0:58
and what might make a lot more sense is to work with something like NumPy.

1:02
So let's imagine the switch: we've done some testing,

1:09
and we switch over to the NumPy library, which is written in C.

1:12
It has very thin wrappers exposed to Python, and we gain 20 times the performance on this

1:19
processing of these arrays of numbers and things.

1:22
We're going to simulate that by saying,

1:24
you know what? We're no longer spending that much time; we're spending 1/20th of it. We

1:29
divide the time by 20, and we get this much.

1:33
So that's how much time we've gained with this theoretical upgrade to NumPy.

1:37
I don't really want to bring NumPy into the situation here.

1:40
We could come up with something in Python that actually gets 20x, but

1:45
it just adds complexity. So use your imagination here.
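The kind of switch being imagined here can be sketched roughly like this. This is a minimal, made-up example (the function names and the scale-and-sum transformation are illustrative, not the course's actual code): the same list processing done in pure Python versus handed off to NumPy's C-backed arrays.

```python
import numpy as np

# Hypothetical stand-in for the in-Python list processing the lecture
# simulates with a sleep: scale each value, then total them up.
def process_with_lists(data):
    return sum(x * 1.01 for x in data)

# The same computation pushed down into NumPy's C implementation;
# on large inputs this vectorized form is often an order of magnitude
# or more faster than looping in Python.
def process_with_numpy(data):
    arr = np.asarray(data, dtype=float)
    return float((arr * 1.01).sum())

data = list(range(10_000))
print(process_with_lists(data))  # same answer,
print(process_with_numpy(data))  # very different speed at scale
```

The two results can differ in the last few floating-point digits because NumPy accumulates the sum in a different order, but the computation is the same.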

1:48
Right, let's run it, see if it's any faster. There's the search,

1:51
there's the DB... boom, wow,

1:53
that was a lot faster. Remember how much time we were spending over here in

1:56
compute_analytics and read_data? Basically 3.0 seconds.

2:01
Let's run the profiler again and see where we are now.

2:07
All right. We could do a quick flip over like this and look, check it

2:12
out. We've got to go down a little bit.

2:14
All the way down here is where our compute_analytics went.

2:16
So it's down to 473 milliseconds, or 20%.

2:22
Let's look at it in the call graph,

2:23
which I really like; I like to see it that way.

2:25
Let's go over here. It switched from orange, spending that much time,

2:32
three seconds, in compute_analytics, to now

2:35
just 165 milliseconds in read_data.

2:38
Let's imagine we can't do it any faster.

2:40
Right? We switched to NumPy.

2:41
We're doing the load. Boom,

2:43
that's it. That's as fast as we can go.

2:45
The other thing we could do over here is work on learn, and this is actually

2:49
pretty interesting. Let's jump in and check this out.

2:53
Imagine this is the machine learning math that we're doing.

2:57
Of course, we'd really use something like TensorFlow, but here's some math that we're doing, and

3:01
imagine the math cannot change; we just have to do it.

3:05
Well let's go back here and look at this in a little bit more detail.

3:08
So learn: it turns out the thing that we're spending a lot of time in,

3:12
actually, is this 'math.pow'. We're doing that,

3:16
wow, something like 627,000 times. Even though each call only takes a little bit of time

3:21
right there, calling it that many times turns out to take a lot of time.

3:24
I'm going to show you a really cool technique we can use to make that

3:28
faster. Let's do something that doesn't seem like it will be better: we're going to

3:33
create a function that will call 'math.pow'.

3:36
So we'll say 'def compute_pow', and it's going to take an x and a y.

3:41
It's going to return math.pow of x and y.

3:45
Okay, and instead of doing this right here,

3:50
I'm going to leave the commented one in here,

3:52
I'm going to say compute_pow of the ID and

3:55
seven. Now here,

3:58
we're going to do the same thing;

4:00
this is going to be compute_pow, like that.
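In code, the change looks something like this. It's a sketch: the call-site variable isn't shown in the transcript, so a literal value stands in for the real input. On its own the wrapper only adds call overhead, but it gives us one place to hook an optimization, which is where the transcript goes next.

```python
import math

# Thin wrapper around math.pow; by itself this does nothing extra,
# but it concentrates all the pow calls in one function.
def compute_pow(x, y):
    return math.pow(x, y)

# At each call site, swap math.pow for the wrapper:
# result = math.pow(some_id, 7)   # before (left commented for reference)
result = compute_pow(2, 7)        # after; 2 stands in for the real input
print(result)  # 128.0
```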

4:10
Okay, if I run it,

4:11
chances are it's going to be slower because in addition to calling this a bunch of

4:15
times, we're also adding the overhead of calling another function.

4:18
Let's see though that we still get the same number.

4:22
We do, we get this. And if we profile it over here and compare real

4:32
quick (it's important to compare as we go),

4:35
which one is this? This is the learn function.

4:38
So let's go look at the stats for learn: 308 before, and in the new one,

4:45
420. See, there was some overhead.

4:47
Can't make that better, can we?

4:48
We shouldn't do this. Ah, but we can. Check this out.

4:51
So it turns out that we have this ID

4:54
passed along as we loop over this.

4:57
The ID

4:58
is the same, so that's going to be repeated.

5:00
The seven is going to be repeated, and some of the time these numbers will also

5:04
turn out to be the same. If we have the same inputs, raising a number to

5:10
a power is always going to give the same outputs.

5:12
So what we can do is use this really cool library called 'functools', but we've

5:16
got to import functools. And on here there's a cache,

5:20
something called an 'lru_cache()'.

5:21
What does the lru_cache do?

5:22
It's going to take a function, and if you pass it

5:27
the same arguments more than once,

5:30
the first time it's going to compute the result.

5:32
But the second time, and the third and fourth, because it already saw those inputs

5:36
and this is always going to give the same answer,

5:38
it'll just return the precomputed, saved value.

5:41
So we're going to trade a little bit of memory consumption for time.

5:45
Let's run this again. Make sure that we get the same number.

5:49
We do, the same number. Hard to tell at this point:

5:52
we're getting down to the edges of whether it is faster,

5:54
but let's run it one more time.

6:00
All right. Let's see the final result here.

6:03
Go down here to learn and look at that.

6:07
Now it's 7.1%, whereas before, learn was 19%.

6:12
So 420 down to 217.

6:17
So more than twice as fast.

6:19
How cool is that? And all we had to do is realize we keep

6:23
doing this thing over and over.

6:24
It is always going to give us the same answer, so we can put a cache

6:27
on that. So if we happen to see the same values,

6:30
we don't have to recompute it over and over.

6:32
Fantastic. All right, let's go back here to our final result and look at

6:36
the call graph and see where we are with regard to this machine learning.

6:40
But now we're in a good place with this compute_analytics.

6:43
It was by far the slowest part of the entire program,

6:46
taking almost five seconds. And now we've gotten read_data down nice and quick using

6:51
our simulated NumPy, and we've got our learn down a bunch, more than twice as

6:57
fast, by using the 'lru_cache'.

6:59
And notice over here,

7:01
remember, this was 600,000 times or something like that;

7:04
now we're calling it only half as many times,

7:06
and that's why it's twice as fast. Super cool, right?