Python Jumpstart by Building 10 Apps Transcripts
Chapter: App 9: Real Estate Analysis App
Lecture: Data mining with generator expressions
Login or
purchase this course
to watch this video and the rest of the course contents.
0:01
So let's look at the code again that we are using our list comprehension,
0:05
and you can see I've changed it a little bit.
0:07
Down here on line 107 I've written a method called announce,
0:12
and now what it does is if you pass an item to announce,
0:16
it simply returns that item back but along the way
0:20
it does a print statement to let you know
0:22
hey I am processing this particular item,
0:24
and you can give it like kind of a little descriptor,
0:28
so over here when we are going to the two bedroom homes,
0:31
each time to the test we are going to go through
0:33
all the homes that's going to say hey,
0:35
tell me what home you are processing, and continue with the regular test, yeah?
0:39
And then later on, I decide hey,
0:42
I only want to process the first five two bedroom homes,
0:46
now the way this processing works is almost identical
0:50
to the challenges that we ran into in application number 8,
0:53
when we were doing text searches across gigabytes of text files.
0:57
Remember, for small quantities of data this mechanism
1:02
where we had functions that recursively searched directories
1:06
and files and each time it hit a file it would create a list put all the matches in there
1:09
and return it up and build that up into an entire set of search results
1:14
for the whole directory structure, and then at the end,
1:17
pass that list back with all the matches.
1:20
They work fine for megabytes of text files,
1:23
for gigabytes of text files, the memory went crazy,
1:26
it took forever, there were all kinds of problems
1:29
and you'll see that this is identical in it sort of processing characteristics,
1:33
and that should be no surprise,
1:35
because this list comprehension is very much like create a list,
1:39
loop over it, fill it all the way up and then here is the answer,
1:42
remember, our solution was to use generator methods,
1:45
the yield keyword and it would sort of on demand as the client would pull,
1:48
doing loop, a forin loop or some kind of processing of the end result,
1:53
it would one by one pull those results back.
1:56
So we can use this thing called a generator expression
1:59
to move from the list colon wait style
2:02
to this yield return co-routine style
2:05
but in the list comprehension
2:07
and what we call a generator expression style processing
2:10
where it's just not a method with the yield keyword
2:13
but it's just an expression on a line.
2:15
Now the way we do this is remarkably easy but before I do it
2:18
let me show you how the processing works here, so if I run this,
2:21
you'll see even though we are only processing 5 homes,
2:24
we are going through look at this,
2:27
every single home we basically process the entire data stream and then,
2:31
we go back and oh, actually only of the first five two bedroom homes,
2:36
what would be much better is to just stop
2:38
after we get to the first five two bedroom homes,
2:41
so what do we have to do to change this?
2:44
Well, for list you use square brackets,
2:48
for what we call generator expressions
2:50
that have this yield behavior, this co-routine behavior,
2:53
we just use parenthesis, that's it, we do nothing else.
2:57
Now, this you cannot index into those things any more,
3:01
or slice them which basically uses indexing
3:04
so I am going to need to create a little list
3:06
so it's not a perfect analogy how we can flow these strings through,
3:09
but we are going to use this as sort of the step to say
3:11
hey we are going to stop after we get just five of these,
3:14
and we don't really need this anymore, do we,
3:16
because that's actually doing the same thing,
3:19
so remember, previously, this part right here went through the entire data structure,
3:24
let's try again.
3:26
Ok, so we are still pointing in the five items down here,
3:29
but as we go through, we can see there is a lot fewer,
3:32
we just go until we have enough two bedroom homes to run through here,
3:37
and then we stop, we don't have to process the whole data set,
3:40
and down here we can get the average and stuff,
3:43
the averages of course are different because we are not averaging
3:46
across all two bedroom homes just the first five,
3:49
really the way we have things structured
3:51
this is the cheapest five two bedroom homes.
3:53
In fact, there is a few more places
3:56
where we can use generator expressions in our app.
4:00
So we were using the generator expression up here,
4:03
but now we are creating a list, filling it all the way up
4:05
and then passing that to the mean.
4:07
We have no reason to do that, we can use parenthesis instead of square brackets,
4:11
and we'll only have one of these little projection pieces in memory at a time for each step,
4:16
make our mean calculation a little more efficient.
4:20
Try again, you see we get exactly the same answer,
4:24
different performance characteristics.
4:27
So you see how we can use list comprehensions
4:29
and later generator expressions to replace procedural operations
4:33
with more declarative ones,
4:35
and get the performance benefits of generator methods
4:39
without actually writing methods at all.