Mastering PyCharm Transcripts
Chapter: Performance and profiling
Lecture: Optimizing the machine learning

0:01 There are actually two operations and they're both kind of slow:
0:03 read_data and learn are both slow.
0:05 this one, and this is all in our minds,
0:09 is processing some data
0:12 and it turns out that it's accessing a bunch of complex arrays and stuff in memory
0:17 and it's just a little bit too slow.
0:19 So what if we were, say, to switch to NumPy and make this much faster?
0:24 We could speed that up by a factor of 20, so change that, okay.
0:30 If we do that, this read_data part will get a lot faster.
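As a rough illustration of the kind of change being described, here is a minimal sketch of vectorizing a slow pure-Python loop with NumPy. The name read_data comes from the lecture; the body is an assumption, not the actual course code.

```python
# Hypothetical sketch only: the real read_data from the course is not shown here.
import numpy as np

def read_data(n: int = 1_000_000) -> float:
    # Pure-Python version (slow): build a list element by element, then sum it.
    # data = [i ** 0.5 for i in range(n)]
    # return sum(data)

    # NumPy version (fast): one vectorized operation over a contiguous array.
    data = np.arange(n, dtype=np.float64) ** 0.5
    return float(data.sum())
```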
0:34 The other problem is this learn here
0:38 so let's just do another profile real quick,
0:40 make sure our change was positive,
0:45 go to our call graph, zoom in, move around a bit,
0:49 here we are, this is actually much, much faster
0:52 it was spending a lot of time, I think it was about 4 seconds over here
0:58 now it's 160 milliseconds, so yay, NumPy,
1:02 you've got to imagine we actually did something there,
1:04 but now this learn is also slow.
1:06 Now suppose this is the machine learning algorithm
1:08 and we cannot change this math, it must go like this
1:11 now there's always something that you can do differently,
1:14 but let's suppose this has to be this way,
1:17 so maybe we could do our little trick, we could say functools.lru_cache like this
1:23 import that at the top, this is going to make it so much faster
1:27 because it turns out we're actually giving it a lot of the same data,
1:30 bam, TypeError: unhashable type: 'list',
1:35 so maybe we can't use lru_cache,
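For context, here is a tiny reproduction of that error, assuming learn takes a list: functools.lru_cache keys its cache on the arguments, and a list can't be used as a dictionary key.

```python
import functools

@functools.lru_cache(maxsize=None)
def learn(data):
    # lru_cache tries to hash the arguments to use them as a cache key...
    return sum(data)

learn([1, 2, 3])  # TypeError: unhashable type: 'list'
```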
1:38 all right, so we could basically mirror what this is doing
1:41 and we could create a decorator that actually tries to model this
1:47 but it is specifically built for hashing lists and these types of data.
1:52 Okay so how do we do that?
1:55 Well, we're going to define a decorator, let's call it list_momento
2:04 and what it's going to take is a function
2:08 and it's going to later return a wrapper function,
2:14 so the idea is, what we do is we create a function,
2:18 it will effectively be passed learn,
2:23 we're going to define inside here a wrapper function
2:27 that at some point is going to return calling the function itself
2:34 and this function, let's suppose it takes a bunch of lists,
2:41 I guess we could do it like this,
2:46 it only accepts lists and it passes those lists on.
2:49 We could do something like that, kind of, sort of, generally
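Here's a minimal sketch of that pass-through stage, before any caching is added. The names list_momento and wrapper come from the lecture; the *lists signature reflects "it only accepts lists and it passes those lists on", but the exact details are an assumption.

```python
def list_momento(func):
    # func will effectively be learn once the decorator is applied.
    def wrapper(*lists: list):
        # No caching yet: just forward the lists to the real function.
        return func(*lists)
    return wrapper
```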
2:53 and let's just run this and see if we still get the same answer.
2:56 Hey look, we're getting numbers down at the bottom, great.
2:59 So it's working, that's pretty cool
3:05 but what's going on here,
3:08 we're not getting any faster, I don't think, unless I miswrote this,
3:17 yeah, look at this compute_analytics, it's still in learn, doing its thing,
3:21 it's not any faster just because we wrapped it in this.
3:24 However, we are very, very close
3:28 so the last thing we need to do is let's define over here,
3:31 actually we can define it in here a cache,
3:34 and what we'd like to do is put into this dictionary,
3:37 say, every time we see
3:41 the same arguments coming in,
3:43 we'd like to capture the results and send them back out.
3:46 So we can come up and define a key that we're going to use for this
3:50 and it's going to be maybe comma separated;
3:55 what we can do is just take the string representation of the lists
4:00 and let's just print out the key really quick,
4:03 so we can see what's going on... whoops, an error,
4:05 yeah, so we've got to convert those to strings, all right,
4:11 so we'll say str(l) for l in that, all right.
4:24 It's a little hard to see, so here we'll have key like this
4:27 and now if I run it you should see, there we go.
4:30 Okay, so the key is just taking the two strings
4:33 which is the set of descriptions and some data records
4:36 and it's going like this, but this is hashable, all right.
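In code, that key-building step might look like this; make_key is a hypothetical helper name, but the join-over-str pattern is what the lecture describes.

```python
def make_key(*lists: list) -> str:
    # join() needs strings, hence the str(l) conversion added after the error.
    return ", ".join(str(l) for l in lists)

print(make_key(["desc a", "desc b"], [1, 2, 3]))  # ['desc a', 'desc b'], [1, 2, 3]
```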
4:40 So what we could do is we could actually
4:42 either compute the hash or just store the key,
4:46 for now I'm just going to store the key,
4:48 in reality you probably want to hash it
4:50 because this is tons of data, you really just need a hash, take the hash.
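If the inputs really are huge, one way to avoid storing those long key strings is to hash them; this sketch uses hashlib, which is an assumption rather than what the video shows.

```python
import hashlib

def make_hashed_key(*lists: list) -> str:
    # Hash the joined string so the cache stores a short digest, not tons of data.
    text = ", ".join(str(l) for l in lists)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()
```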
4:54 So this is cool, we'll say this: if key in cache, return cache[key],
5:01 that's the super fast version,
5:05 but if we've never seen this input,
5:07 we're going to go over here and say cache[key] equals that
5:19 and then we'll actually return this,
5:21 so watch how fast this runs now,
5:24 bam, done, super, super fast; I'll update the markdown later.
5:30 So over here, we're able to define this list_momento just like we had the lru_cache,
5:35 but we built it around lists, those are not hashable,
5:38 ours knows exactly what to do with lists,
5:41 it converts them to strings and then hashes that, theoretically.
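Putting the whole lecture together, a minimal sketch of the finished decorator might look like the following; it is reconstructed from the steps above, and the real course code may differ in details.

```python
def list_momento(func):
    cache = {}

    def wrapper(*lists: list):
        # Lists aren't hashable, so build a hashable string key from them.
        key = ", ".join(str(l) for l in lists)
        if key in cache:
            return cache[key]       # the super fast path
        cache[key] = func(*lists)   # first time only: do the real work
        return cache[key]

    return wrapper
```

Applied as @list_momento above learn, the first call with a given set of lists does the real work, and every repeat call is just a dictionary lookup.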
5:44 So let's run this one more time, see how we've done,
5:46 I think we might be finished.
5:49 All right, so much, much better
5:51 come down here and do a quick zoom and look at this,
5:55 we've got our time down, we were at like 12 seconds,
6:01 oh we still have one more to go,
6:04 actually I'm not sure we can get much faster,
6:06 this learn here we'll have to see,
6:09 this learn actually is quite a complicated algorithm like you can see
6:12 600,000 calls raising numbers to powers and stuff, so that's pretty insane,
6:18 but notice, because of our lru_cache-style thing, the list_momento decorator we built,
6:23 this is actually only called one time ever,
6:25 because it turns out we're sending in the same data most of the time, all the time.
6:29 So the wrapper function that we wrote,
6:32 that was called 9 times but almost no time was spent there, right,
6:38 one time it has to call the real function, the rest just hit the dictionary, and those lookups fly.
6:42 So we've gotten this down significantly faster,
6:45 it's 1 second, and you might remember it was like one and a half seconds before,
6:49 but we're calling it 10 times, 9 times as many times,
6:52 so we got it roughly, let's say, 10 or 11 times faster,
6:57 and we did that by looking specifically at the output from the profiler
7:01 and finding the worst-case scenario, fixing that,
7:04 moving on to the new worst case, which is better than before,
7:08 and just working our way through this program.