Effective PyCharm Transcripts
Chapter: Performance and profiling
Lecture: Optimizing the machine learning

0:01 There are actually two operations, and they're both kind of slow: read_data and learn are both slow.
0:06 This one is processing some data, this is all in our minds,
0:13 and it turns out that it's accessing a bunch of complex arrays and stuff in memory, and it's just a little bit too slow.
0:20 So what if we were, say, to switch to NumPy and make this much faster? We could speed that up by a factor of 20, so change that, okay.
0:31 If we do that, this read_data part will get a lot faster. The other problem is this learn here, so let's just do another profile real quick,
0:41 make sure our change was positive, go to our call graph, zoom in, move around a bit, here we are. This is actually much, much faster:
0:53 it was spending a lot of time, I think it was about 4 seconds over here, and now it's 160 milliseconds, so yay, NumPy.
1:03 You've got to imagine we actually did something there, but now this learn is also slow. Now suppose this is the machine learning algorithm
1:09 and we cannot change this math, it must go like this. Now, there's always something that you can do differently,
1:15 but let's suppose this has to be this way. So maybe we could do our little trick, we could say functools.lru_cache like this,
1:24 import that at the top. This is going to make it so much faster because it turns out we're actually giving it a lot of the same data.
1:31 Bam: unhashable type: 'list'. So maybe we can't use lru_cache. All right, so we could basically mirror what this is doing,
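The failure he hits here is easy to reproduce. A minimal sketch, with a stand-in `learn` function (the real one in the course is more involved), shows why `lru_cache` can't help when the arguments are lists:

```python
from functools import lru_cache

# Hypothetical stand-in for the course's slow learn() function.
@lru_cache(maxsize=None)
def learn(data):
    return sum(data)

try:
    learn([1, 2, 3])   # lru_cache keys calls in a dict, and
except TypeError as err:  # lists are not hashable...
    print(err)            # → unhashable type: 'list'
```

Passing a tuple instead, e.g. `learn((1, 2, 3))`, works fine, which is exactly the hint behind building a list-friendly cache by hand.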
1:42 and we could create a decorator that actually tries to model this, but it is specifically built for hashing lists and these types of data.
1:53 Okay, so how do we do that? Well, we're going to define a decorator, let's call it list_momento,
2:05 and what it's going to take is a function, and it's going to later return a wrapper function.
2:15 So the idea is, we create a function, it will effectively be passed learn, and we're going to define inside here a wrapper function
2:28 that at some point is going to return the result of calling the function itself. And this function, let's suppose it takes a bunch of lists,
2:42 I guess we could do it like this: it only accepts lists and it passes those lists on. We could do something like that, kind of, sort of, generally,
2:54 and let's just run this and see if we still get the same answer. Hey, look, we're getting numbers down at the bottom, great.
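At this stage the decorator is a pure pass-through, which might look roughly like this sketch (the names follow the transcript; the `*lists` signature and the toy `learn` body are assumptions):

```python
def list_momento(func):
    # Pass-through wrapper: accepts any number of lists and simply
    # forwards them to the wrapped function. No caching yet, which
    # is why nothing gets faster at this point in the video.
    def wrapper(*lists):
        return func(*lists)
    return wrapper

@list_momento
def learn(descriptions, data):
    # Hypothetical stand-in for the slow learning step.
    return len(descriptions) + len(data)

print(learn(["a"], [1, 2]))  # → 3, same answer as before wrapping
```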
3:00 So it's working, that's pretty cool, but what's going on here? We're not getting any faster, I don't think, unless I miswrote this.
3:18 Yeah, look at this: compute_analytics, it's still in learn, it's doing its thing. It's not any faster just because we wrapped it in this.
3:25 However, we are very, very close. So the last thing we need to do is define over here, actually we can define it in here, a cache,
3:35 and what we'd like to do is put into this dictionary, say, every time we see the arguments coming in the same,
3:44 we'd like to capture the results and send them out. So we can come up and define a key that we're going to use for this,
3:51 and it's going to be maybe comma separated. What we can do is we can just take the string representation of the lists,
4:01 and let's just print out the key really quick so we can see what's going on. Yay, update, yeah, so we've got to convert that. All right,
4:12 so we'll say str(l) for l in that. All right, it's a little hard to see, so here we'll have key like this,
4:28 and now if I run it, you should see, there we go. Okay, so the key is just taking the two strings, which is the set of descriptions and some data records,
4:37 and it's going like this, but this is hashable, all right. So what we could do is we could actually either compute the hash or just store the key;
4:47 for now I'm just going to store the key, but in reality you probably want to hash it,
4:51 because this is tons of data, you really just need a hash, so take the hash.
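The key-building step he describes, string representations joined with commas and then hashed, might be sketched like this (`make_key` is a hypothetical helper name, not something from the course):

```python
def make_key(*lists):
    # Comma-join the str() form of each list. Lists aren't hashable,
    # but their string representations are.
    joined = ",".join(str(l) for l in lists)
    # Hashing keeps the cache key small even when the lists hold
    # tons of data; storing the raw string also works, it's just big.
    return hash(joined)
```

Two calls with equal lists produce the same key, which is all the cache needs; note that `hash()` of a string varies between interpreter runs, so keys are only meaningful within one process.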
4:55 So this is cool. We'll say: if key in cache, return cache[key], that's the super fast version. But if we've never seen this input,
5:08 we're going to go over here and say cache[key] equals that, and then we'll actually return this. So watch how fast this runs now,
5:25 bam, done, super, super fast, I'll update you later, markdown. So over here, we're able to define this momento just like we had the lru_cache,
5:36 but we built it around lists, which are not hashable; ours knows exactly what to do with lists:
5:42 it converts them to strings and then hashes that, theoretically. So let's run this one more time and see how we've done; I think we might be finished.
5:50 All right, so much, much better. Come down here and do a quick zoom and look at this: we've got our time down, we were at like 12 seconds.
6:02 Oh, we still have one more to go. Actually, I'm not sure we can get much faster; this learn here, we'll have to see.
6:10 This learn actually is quite a complicated algorithm, like you can see, 600 thousand raising-numbers-to-a-power operations and stuff, so that's pretty insane,
6:19 but notice, because of our lru thing, our list_momento decorator we built, this is actually only called one time ever,
6:26 because it turns out we're sending in the same data most of the time, all the time. So the wrapper function that we wrote,
6:33 that was called 9 times, but it spent almost no time there, right? One time it has to call this; the rest, it just uses dictionaries, and those fly.
6:43 So we've gotten this down significantly faster, it's 1 second. You might remember it was like one and a half seconds before,
6:50 but we're calling it 10 times, 9 times as many times, so we got it roughly, let's say, 10 or 11 times faster.
6:58 And we did that by looking specifically at the output from the profiler, finding the worst-case scenario, fixing that,
7:05 going to the new worst-case (but better than before) scenario, and just working our way through this program.

