Python Jumpstart by Building 10 Apps Transcripts
Chapter: App 8: File Searcher App
Lecture: Generators save the day
0:00
We've seen this concept of generator methods, so let's go apply it to everything we have going on here,
0:06
and I'll even show you another keyword we haven't had a chance to talk about yet. So let's start at the bottom.
0:10
Here is a traditional method: it puts all the stuff into a list, and then once that computation is done
0:16
and everything is computed, it returns that whole list. Then we have another list above that we add even more to
0:22
and keep extending as we have more and more files. We can do better. So, instead of doing matches here, let's get rid of this,
0:30
and instead of doing append m, we will say yield m, and we won't have to return the matches; maybe I'll even comment that out for you, like so,
0:41
so no more lists, we are just doing yield. Now, this would already work: what we are doing up here is calling search file, and here we get our generator,
0:51
and we can take any collection, generators or other iterable collections, and extend this list with it. But we can actually do better still:
1:01
this is a generator method, and this is a regular one, but we can also apply the exact same idea here and the same idea here.
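As a rough sketch of that change (the function name search_file and the tuple it yields are my assumptions, not necessarily the course's exact code), the generator version replaces the append/return pattern like this:

```python
def search_file(filename, search_text):
    # Generator version: instead of building up a list with
    # matches.append(m) and returning it at the end, we yield
    # each match the moment it is found.
    with open(filename, encoding='utf-8') as fin:
        for line_num, line in enumerate(fin, start=1):
            if search_text.lower() in line.lower():
                # The old version did matches.append(...) here.
                yield (filename, line_num, line.rstrip())
    # No more "return matches" -- the caller pulls results lazily.
```

Calling search_file(...) no longer runs the loop at all; it returns a generator, and each match is produced only when the caller asks for the next one.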
1:08
Now, this gets a little tricky, because I have to say for m in matches: yield m, and that's not the most fun thing to write;
1:17
it would work, but I'll show you something better. Same thing down here for all the matches, we want to do that,
1:22
and then we no longer have our return. So here is the generator method, and this is going to come through,
1:27
and each time we sort of go pull something out of this collection, it's going to run until it hits one of these,
1:34
which is a yield in the generator, and it's going to hand one back. So if we only wanted the first 4 matches, we could compute that extremely quickly.
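To see why grabbing just the first few matches is cheap, here is a small self-contained sketch (not the course's code): itertools.islice pulls only four items from a generator that could otherwise run forever.

```python
import itertools

def squares():
    # A generator over an effectively endless sequence;
    # each value is computed only when requested.
    for n in itertools.count(1):
        yield n * n

# Pulling the first 4 values stops after the 4th yield;
# nothing past it is ever computed.
first_four = list(itertools.islice(squares(), 4))
print(first_four)  # [1, 4, 9, 16]
```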
1:41
However, this on lines 65 and 66 is not the coolest thing. It turns out that Python 3.3 added basically a keyword that will do the same thing,
1:52
take a whole collection and sort of hand its items back one at a time, so we can simplify this and just say yield from matches.
1:58
And if we really wanted to simplify this, we could actually come down here and write it as one line: we could just say yield from that,
2:06
never even store matches here, and similarly yield from that. So down below, we have search files;
2:13
the searching of an individual file is a generator, and we only ever have a single line in memory at a time.
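Those two spellings, the explicit re-yield loop and yield from, produce exactly the same stream of items; a minimal sketch of the equivalence (names are illustrative):

```python
def relay_loop(items):
    # The verbose pre-3.3 form: re-yield each item one at a time.
    for m in items:
        yield m

def relay_delegate(items):
    # Python 3.3+: delegate to the underlying iterable and hand
    # its items back one at a time, with the same effect.
    yield from items

data = ["first", "second", "third"]
assert list(relay_loop(data)) == list(relay_delegate(data)) == data
```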
2:21
Now up here, as we work through all the files in our directory or even recurse into a tree of directories and their files,
2:28
we are only pulling back one item from either here or here at a given time, and that means we only have one line in memory at a time,
2:38
really one search result, and then we can go up here where we are printing out. Now, let me just show you that this is still working.
2:45
Let's bring this back, and then let me search the simple files again, just to show that we are actually still searching just like before.
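The chain just described, a folder-level generator delegating with yield from to a per-file generator, might be sketched like this (function and variable names are my assumptions, not the course's exact code):

```python
import os

def search_file(filename, search_text):
    # Per-file generator: only the current line is in memory.
    with open(filename, encoding='utf-8') as fin:
        for line in fin:
            if search_text in line:
                yield (filename, line.rstrip())

def search_folder(folder, search_text):
    # Folder-level generator: pulling one result from here pulls
    # exactly one matching line from exactly one open file.
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            yield from search_file(path, search_text)
```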
2:54
So let's search the small set of books for Holmes. There you go, you can see 468 matches, and we are searching Ulysses
3:03
or searching The Adventures of Sherlock Holmes. Perfect, it works exactly the same, but from a performance perspective it's not the same.
3:13
Let's run it again, and this time we are going to search the large set of files,
3:18
and again, we are going to search for how many question marks there are. There were something like 2.78 million question marks, and remember,
3:26
we had to use almost 400 MB of memory to answer that question. Remember, 400 MB. What's it going to do this time, can we do better?
3:36
Oh, here, hold on, let me stop this really quick. Remember, we didn't do the output, it was too much really;
3:42
we just did a little count, so let's rerun it this way. Ok, here we go again,
3:48
it's going, 3.8 MB. Remember, it should have jumped up to 300 MB, a gigabyte. What is going on? This is so absolutely amazing;
3:58
look at this, you guys: we are processing gigabytes and gigabytes of data with almost identical algorithms, and yet the memory usage is the same
4:06
as if we were processing a single line, because that's all we are ever holding: a single line in memory. OK,
4:14
granted, we do have the file stream open to some huge file at some point, but we are seeking over it, we are streaming across it.
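The "little count" works the same way: we can count matches without ever materializing them, consuming and discarding one item at a time. A small sketch (the stand-in generator is illustrative):

```python
def fake_matches():
    # Stand-in for the real search generator; yields lazily.
    for n in range(1_000_000):
        if n % 3 == 0:
            yield n

# sum(1 for ...) consumes the generator one item at a time,
# so memory stays flat no matter how many matches there are.
count = sum(1 for _ in fake_matches())
print(count)  # 333334
```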
4:21
Let's just let it run and see where it goes. It's done, look at that, look at the memory usage, look at the CPU,
4:35
look at the performance; it is so much better than it was before. In fact, I kept the previous one around,
4:43
so let's have a look at it. It's not really fair to put them side by side, because the scale of the graphs is not the same,
4:50
but I think we'll get the sense anyway. So on the left is the old, bad, sort of standard procedural code style. Now look at the memory:
4:59
it goes from 3 MB when I was starting out to 394 MB; ours went from 3 MB to 4 MB. And that was it.
5:12
If you look at the size of the CPU graph or sort of the length of any of these graphs, you'll see they are basically identical in computational time,
5:19
and it actually looks lower on CPU usage, presumably because it's doing less garbage collection, less allocation,
5:27
less doubling of lists and copying them, and things like this. And all we have done to change that algorithm is use the yield
5:34
and yield from keywords instead of making lists, appending to them, and extending them. The code we wrote actually got a couple of lines shorter.
5:41
So this is the power of generator methods: any time you are processing a pipeline of lots of data, you saw that you can chain them together
5:50
to create these pipelines basically effortlessly. We'll see that there is an even simpler way to create
5:56
this type of structure, something called a generator expression, but we'll save that for the next app.