Python Jumpstart by Building 10 Apps Transcripts
Chapter: App 9: Real Estate Analysis App
Lecture: Data mining with generator expressions
0:01
So let's look again at the code where we are using our list comprehension, and you can see I've changed it a little bit.
0:08
Down here on line 107 I've written a method called announce, and now what it does is if you pass an item to announce,
0:17
it simply returns that item back but along the way it does a print statement to let you know hey I am processing this particular item,
0:25
and you can give it a little descriptor, so over here where we are getting the two bedroom homes,
0:32
each time through the test, as we go through all the homes, it's going to say hey,
0:36
tell me what home you are processing, and continue with the regular test, yeah? And then later on, I decide hey,
0:43
I only want to process the first five two bedroom homes, now the way this processing works is almost identical
0:51
to the challenges that we ran into in application number 8, when we were doing text searches across gigabytes of text files.
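A minimal sketch of the setup described so far. Only the name announce and the two bedroom filter come from the lecture; the Purchase record, its fields, the sample data, and the processed list are assumptions made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the app's purchase records.
Purchase = namedtuple('Purchase', 'beds price')

beds = [2, 3, 2, 2, 4, 2, 2, 1, 2, 2, 3, 2]
data = [Purchase(b, 100_000 + 10_000 * i) for i, b in enumerate(beds)]

processed = []  # hypothetical log of every item announce() sees


def announce(item, msg):
    """Report that an item is being processed, then return it unchanged."""
    print('Announcing {}: {}'.format(msg, item))
    processed.append(item)
    return item


# List comprehension: all of data is tested before the slice ever runs.
two_bed_homes = [
    p for p in data
    if announce(p, '2-bedroom test').beds == 2
]
first_five = two_bed_homes[:5]
```

Even though only five homes are kept, announce runs once for every record in data.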
0:58
Remember, for small quantities of data this mechanism where we had functions that recursively searched directories
1:07
and files and each time it hit a file it would create a list put all the matches in there
1:10
and return it up and build that up into an entire set of search results for the whole directory structure, and then at the end,
1:18
pass that list back with all the matches. That worked fine for megabytes of text files; for gigabytes of text files, the memory went crazy,
1:27
it took forever, there were all kinds of problems, and you'll see that this is identical in its processing characteristics,
1:34
and that should be no surprise, because this list comprehension is very much like create a list,
1:40
loop over it, fill it all the way up, and then here is the answer. Remember, our solution was to use generator methods,
1:46
the yield keyword, and on demand, as the client would pull by doing a loop, a for...in loop, or some other kind of processing of the end result,
1:54
it would one by one pull those results back. So we can use this thing called a generator expression to move from the build-the-list-and-wait style
2:03
to this yield-return, coroutine style, but written like a list comprehension: what we call generator expression style processing,
2:11
where it's not a method with the yield keyword, it's just an expression on a line. Now the way we do this is remarkably easy, but before I do it
2:19
let me show you how the processing works here, so if I run this, you'll see even though we are only processing 5 homes,
2:25
we are going through, look at this, every single home. We basically process the entire data stream and then,
2:32
we go back and say oh, actually we only want the first five two bedroom homes. What would be much better is to just stop
2:39
after we get to the first five two bedroom homes, so what do we have to do to change this? Well, for lists you use square brackets,
2:49
for what we call generator expressions that have this yield behavior, this coroutine behavior, we just use parentheses, that's it, we do nothing else.
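A small sketch of the square brackets versus parentheses difference. The calls list and this one-argument announce are hypothetical tracing helpers, not the lecture's code; they only make it visible when work happens.

```python
calls = []


def announce(item):
    """Hypothetical tracer: note that an item was touched, then return it."""
    calls.append(item)
    return item


numbers = range(1, 1001)

# Square brackets: every matching element is processed immediately.
as_list = [announce(n) for n in numbers if n % 2 == 0]
calls_after_list = len(calls)

calls.clear()

# Parentheses: nothing is processed until a consumer asks for a value.
as_gen = (announce(n) for n in numbers if n % 2 == 0)
calls_after_gen = len(calls)

first = next(as_gen)          # pulls exactly one value through
calls_after_next = len(calls)
```

The only change between the two expressions is the bracket type; the laziness comes from the generator object that the parentheses produce.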
2:58
Now, you cannot index into these things any more, or slice them, which basically uses indexing, so I am going to need to create a little list,
3:07
so it's not a perfect analogy for how we can flow these streams through, but we are going to use this as sort of the step to say
3:12
hey we are going to stop after we get just five of these, and we don't really need this anymore, do we, because that's actually doing the same thing,
3:20
so remember, previously, this part right here went through the entire data structure, let's try again.
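One way the "little list" step could look, as a sketch under assumptions: the Purchase record, the sample data, and the seen log are made up for illustration. The itertools.islice variant at the end is an alternative from the standard library, not something the lecture uses.

```python
from collections import namedtuple
from itertools import islice

Purchase = namedtuple('Purchase', 'beds price')  # hypothetical record type
beds = [2, 3, 2, 2, 4, 2, 2, 1, 2, 2, 3, 2]
data = [Purchase(b, 100_000 + 10_000 * i) for i, b in enumerate(beds)]

seen = []  # hypothetical log of every item that gets tested


def announce(item, msg):
    print('{}: {}'.format(msg, item))
    seen.append(item)
    return item


two_bed_homes = (p for p in data if announce(p, 'testing').beds == 2)

# A generator cannot be sliced, so collect results and stop at five.
homes = []
for h in two_bed_homes:
    homes.append(h)
    if len(homes) >= 5:
        break

seen_by_loop = len(seen)  # fewer than len(data): the rest was never tested

# Alternative (not from the lecture): islice applies the same cut-off.
fresh = (p for p in data if announce(p, 'testing').beds == 2)
homes_via_islice = list(islice(fresh, 5))
```

With this sample data the loop stops as soon as the fifth two bedroom home appears, so the remaining records are never passed to announce.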
3:27
Ok, so we are still printing the five items down here, but as we go through, we can see there are a lot fewer,
3:33
we just go until we have enough two bedroom homes to run through here, and then we stop, we don't have to process the whole data set,
3:41
and down here we can get the average and stuff, the averages of course are different because we are not averaging
3:47
across all two bedroom homes just the first five, really the way we have things structured this is the cheapest five two bedroom homes.
3:54
In fact, there are a few more places where we can use generator expressions in our app. So we were using the generator expression up here,
4:04
but now we are creating a list, filling it all the way up and then passing that to the mean.
4:08
We have no reason to do that, we can use parentheses instead of square brackets,
4:12
and we'll only have one of these little projection pieces in memory at a time for each step, making our mean calculation a little more efficient.
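A short sketch of passing a generator expression straight to a mean function. Using statistics.mean here is an assumption about which mean the app calls, and the Purchase record and prices are invented for illustration.

```python
import statistics
from collections import namedtuple

Purchase = namedtuple('Purchase', 'beds price')  # hypothetical record type
homes = [Purchase(2, 150_000), Purchase(2, 200_000), Purchase(2, 250_000)]

# Square brackets: a full list of prices is built first, then averaged.
avg_from_list = statistics.mean([h.price for h in homes])

# Parentheses: prices are handed to mean() one at a time as it asks for them.
# When a generator expression is the only argument, the call's own
# parentheses are enough.
avg_from_gen = statistics.mean(h.price for h in homes)
```

The answer is the same either way. How much memory is actually saved depends on how the consuming function reads its input, since a function is free to copy an iterator into a list internally.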
4:21
Try again, you see we get exactly the same answer, different performance characteristics. So you see how we can use list comprehensions
4:30
and later generator expressions to replace procedural operations with more declarative ones, and get the performance benefits of generator methods
4:40
without actually writing methods at all.
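To make that last point concrete, here is a sketch comparing a generator method with the equivalent generator expression. The function name two_bed_prices and the sample tuples are hypothetical.

```python
def two_bed_prices(purchases):
    """Generator method: yields matching prices one at a time."""
    for beds, price in purchases:
        if beds == 2:
            yield price


# Hypothetical (beds, price) pairs.
purchases = [(2, 150_000), (3, 300_000), (2, 250_000), (4, 500_000)]

from_method = list(two_bed_prices(purchases))

# The same lazy pipeline as a single expression, with no method needed.
from_expression = list(price for beds, price in purchases if beds == 2)
```

Both produce their values on demand; the expression form just skips the separate function definition.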