Effective PyCharm Transcripts
Chapter: Performance and profiling
Lecture: Optimizing the machine learning
0:01
There are actually two operations here, and they're both kind of slow: read_data and learn are both slow.
0:06
This one is processing some data (this is all in our minds, of course)
0:13
and it turns out that it's accessing a bunch of complex arrays and stuff in memory and it's just a little bit too slow.
0:20
So what if we were, say, to switch to NumPy and make this much faster? We could speed that up by a factor of 20, so change that, okay.
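To make that concrete, here's a minimal sketch of the kind of rewrite being described, assuming NumPy is installed; the `data`, `slow`, and `fast` names are made up for illustration, not taken from the course code.

```python
import numpy as np

data = list(range(100_000))

# Plain Python: sum of squares via a generator loop over a list.
slow = sum(x * x for x in data)

# NumPy: the same computation vectorized over an array,
# typically an order of magnitude (or more) faster.
arr = np.array(data, dtype=np.int64)
fast = int((arr * arr).sum())

print(slow == fast)  # True: same answer, just computed faster
```

The speedup comes from moving the loop out of the interpreter and into NumPy's compiled array operations.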
0:31
If we do that, this read_data part will get a lot faster. The other problem is this learn here, so let's just do another profile real quick
0:41
to make sure our change was positive. Go to our call graph, zoom in, move around a bit, here we are: this is actually much, much faster.
0:53
It was spending a lot of time, I think about 4 seconds, over here; now it's 160 milliseconds, so yay, NumPy.
1:03
You've got to imagine we actually did something there. But now this learn is also slow. Now suppose this is the machine learning algorithm
1:09
and we cannot change this math; it must go like this. Now, there's always something you can do differently,
1:15
but let's suppose it has to be this way. So maybe we could do our little trick: we could say functools.lru_cache, like this,
1:24
import that at the top. This is going to make it so much faster, because it turns out we're actually giving it a lot of the same data.
1:31
Bam: unhashable type: 'list'. So maybe we can't use lru_cache. All right, so we could basically mirror what it's doing
1:42
and create a decorator that models it, but is specifically built for hashing lists and these types of data.
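Here's a small sketch of the failure just described, with a stand-in `learn` function (the real course code isn't shown here): `lru_cache` stores arguments as dictionary keys, and lists aren't hashable.

```python
import functools

@functools.lru_cache(maxsize=None)
def learn(data):
    # stand-in for the expensive machine learning step
    return sum(data)

# Calling it with a list blows up, because cache keys must be hashable.
try:
    learn([1, 2, 3])
    error = None
except TypeError as e:
    error = str(e)

print(error)  # unhashable type: 'list'
```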
1:53
Okay, so how do we do that? Well, we're going to define a decorator; let's call it list_momento,
2:05
and what it's going to take is a function, and it's going to later return a wrapper function.
2:15
So the idea is, we create a function, it will effectively be passed learn, and we're going to define inside here a wrapper function
2:28
that at some point is going to return the result of calling the function itself. And this function, let's suppose it takes a bunch of lists;
2:42
I guess we could do it like this: it only accepts lists, and it passes those lists on. We could do something like that, kind of, sort of, generally,
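The pass-through skeleton described above might look like this; the `learn` body here is a made-up placeholder, since at this point the decorator doesn't cache anything yet, it just forwards the lists.

```python
def list_momento(func):
    # Takes a function (e.g. learn) and returns a wrapper around it.
    def wrapper(*lists):
        # For now, no caching: just pass the lists straight through.
        return func(*lists)
    return wrapper

@list_momento
def learn(descriptions, records):
    # placeholder for the real, expensive computation
    return len(descriptions) + len(records)

print(learn(["age", "height"], [1, 2, 3]))  # 5, same answer as undecorated
```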
2:54
and let's just run this and see if we still get the same answer. Hey, look, we're getting numbers down at the bottom, great.
3:00
So it's working, that's pretty cool. But what's going on here? We're not getting any faster, I don't think, unless I miswrote this.
3:18
Yeah, look at this: compute_analytics is still in learn, doing its thing. It's not any faster just because we wrapped it in this.
3:25
However, we are very, very close. The last thing we need to do is define over here, actually we can define it in here, a cache,
3:35
and what we'd like to do is put into this dictionary, every time we see the same arguments coming in,
3:44
the results we'd like to capture, and send them back out. So we can come up and define a key that we're going to use for this,
3:51
and it's going to be maybe comma-separated. What we can do is just take the string representation of the lists,
4:01
and let's just print out the key really quick so we can see what's going on. Whoops, an error, so we've got to convert those to strings, all right,
4:12
so we'll say str(l) for l in that, all right. It's a little hard to see, so here we'll have key, like this,
4:28
and now if I run it you should see, there we go. Okay, so the key is just taking the two strings, which are the set of descriptions and some data records,
4:37
joined like this, but this is hashable, all right. So what we could do is either compute the hash or just store the key;
4:47
for now I'm just going to store the key. In reality, you'd probably want to hash it,
4:51
because this is tons of data and you really just need a hash, so take the hash.
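The key construction described here can be sketched like this; the sample lists are invented for illustration.

```python
descriptions = ["age", "height"]
records = [10, 20, 30]

# Build a hashable key: the str() of each list, comma-separated.
key = ",".join(str(l) for l in (descriptions, records))
print(key)  # ['age', 'height'],[10, 20, 30]

# With lots of data you'd store hash(key) rather than the full string,
# trading the big key string for a single int.
print(hash(key))
```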
4:55
So this is cool. We'll say: if key in cache, return cache[key], that's the super fast version. But if we've never seen this input,
5:08
we're going to go over here and say cache[key] equals that, and then we'll actually return this. So watch how fast this runs now:
5:25
bam, done, super, super fast. So over here, we're able to define this momento just like we had the lru cache,
5:36
but we built it around lists, which are not hashable; ours knows exactly what to do with lists:
5:42
it converts them to strings and then hashes that, theoretically. So let's run this one more time and see how we've done; I think we might be finished.
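Putting the pieces together, the finished decorator might look like the sketch below. The `learn` body and its inputs are stand-ins (the course's real function isn't shown), but the cache logic follows the steps described: build a string key from the lists, return the cached result on a hit, compute and store on a miss.

```python
def list_momento(func):
    # Memoizes a function whose arguments are lists,
    # which functools.lru_cache cannot handle.
    cache = {}

    def wrapper(*lists):
        key = ",".join(str(l) for l in lists)  # hashable stand-in for the lists
        if key in cache:
            return cache[key]                  # the super fast path
        result = func(*lists)
        cache[key] = result                    # in reality, store hash(key) instead
        return result

    return wrapper

calls = 0

@list_momento
def learn(data):
    global calls
    calls += 1                    # count how often the slow body really runs
    return sum(x ** 2 for x in data)

# Same data every time, so the expensive body runs only once.
for _ in range(9):
    total = learn([1, 2, 3])

print(total, calls)  # 14 1
```

On repeated inputs, eight of the nine calls never reach the slow body, which is exactly the behavior the profiler confirms next.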
5:50
All right, so, much, much better. Come down here, do a quick zoom, and look at this: we've got our time down. We were at like 12 seconds;
6:02
oh, we still have one more to go. Actually, I'm not sure we can get much faster; this learn here, we'll have to see.
6:10
This learn is actually quite a complicated algorithm; you can see 600 thousand calls raising numbers to a power and such, so that's pretty insane.
6:19
But notice, because of our lru-style thing, the list_momento decorator we built, this is actually only called one time ever,
6:26
because it turns out we're sending in the same data most of the time, all the time. So the wrapper function that we wrote,
6:33
that was called 9 times, but it spent almost no time there, right? One time it has to call this; the rest, it just uses the dictionary, and those fly.
6:43
So we've gotten this down significantly faster: it's 1 second. You might remember it was like one and a half seconds before,
6:50
but now we're calling it 10 times, 9 times as many calls, so we got it roughly, let's say, 10 or 11 times faster.
6:58
And we did that by looking specifically at the output from the profiler, finding the worst-case scenario, fixing that,
7:05
going to the new worst case (better than before), and just working our way through the program.