Write Pythonic Code Like a Seasoned Developer Transcripts
Chapter: Generators and Collections
Lecture: On-demand computation with yield and generators
0:01 Here is a function called classic_fibonacci
0:03 and what you do is you pass a limit to it
0:05 and it will compute all the Fibonacci numbers up to that limit.
0:09 Notice we have a list called "nums" and it does all the work,
0:12 fills this list up and once it's finally done, it gives you all the numbers.
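The on-screen code is not part of this transcript, so here is a minimal sketch of what a list-building classic_fibonacci might look like. The variable names "current" and "nxt" and the strict "less than the limit" test are assumptions, not the lecture's exact code.

```python
def classic_fibonacci(limit):
    """Return a list of every Fibonacci number below `limit` (eager version)."""
    nums = []
    current, nxt = 1, 1  # assumed starting pair; `nxt` avoids shadowing the built-in next()
    while current < limit:
        nums.append(current)  # all the work happens before anything is returned
        current, nxt = nxt, current + nxt
    return nums


print(classic_fibonacci(100))
```

Nothing comes back to the caller until the whole list has been filled, which is the limitation the rest of the lecture works around.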
0:16 Well, what if you want the first million Fibonacci numbers,
0:20 what if you need the first 5 million Fibonacci numbers,
0:23 how long will this method take to run?
0:25 What if you don't know how many you need,
0:27 what if you want to start looking at them and you say
0:29 well, I am going through the Fibonacci numbers, looking for the point where
0:34 the second one is a prime number
0:37 and the third one is the cube of the first one in the sequence,
0:41 who knows when that is, you are just looking through
0:43 and you are going to decide "oh, now it matches, now I have got enough of these".
0:47 What if you were looking through this
0:49 and you said "I am going to ask for 5 million Fibonacci numbers"
0:52 and the one you needed was really the five million and first, right, maybe you just gave up.
0:57 So we are going to look at a different way to write exactly the same code
1:00 that doesn't have these limitations,
1:02 that allows the consumer to process
1:04 as much of this actually infinite series as it needs
1:08 and yet does this in a very much on demand, high performance way.
1:13 This concept is called generators and it has this keyword called yield.
1:17 So let's look at this in code.
1:19 So, here is that same function, we can run it,
1:22 it shows you the first few numbers in the Fibonacci sequence
1:25 we are passing a limit here,
1:28 we are passing 100, so we want just the Fibonacci numbers
1:32 less than 100.
1:33 So this is fine, but let's see if we can do better.
1:36 Before we move on, let's actually debug this a little bit.
1:39 So I am going to put a break point here,
1:40 we are going to step into this, all right,
1:43 so here we are and let's step into this method,
1:46 and now we are stepping along, stepping along,
1:48 and notice we are going through the list,
1:50 you can see up here it actually shows you the list being built,
1:53 it shows you the numbers so PyCharm is really cool in that sense,
1:55 you can see the list growing, but notice we are staying here the whole time
1:59 until we get to the limit of 100, which happens pretty soon here,
2:03 right now, and then we are going through them and processing,
2:07 you can see "m" takes on the various values here.
2:11 So that's fine for small numbers, but like I said
2:15 in the beginning, what if we don't know what the upper bound is?
2:17 Or what if we have to put a really huge number here,
2:22 what do you think happens to the memory consumption as that number grows,
2:25 obviously we have to gather all the numbers that preceded it
2:29 and hold them in memory all at once and then you get the answer.
2:32 So Python has this really cool keyword called "yield",
2:36 and let's come down here and let's call this a generator_fibonacci,
2:41 so we are going to do a few things, that if you have seen this before,
2:45 you know it's pretty straightforward,
2:47 if you've not seen this, it'll probably blow your mind.
2:49 All right, so what we are going to do is we are going to say
2:51 instead of having this limit, we would like to work on the infinite series,
2:56 now if I just run this code, two things will happen,
2:59 first of all it's going to crash in a hurry, even if for some reason it wouldn't crash,
3:04 if we had like infinite memory, it will still never return, right?
3:07 It's just going to keep adding this infinite series
3:10 but of course it's going to run out of memory.
3:12 So in Python, we can do something both cleaner and better here,
3:15 so what we can do is we can use this yield keyword,
3:18 and yield is like return but instead of returning from the method,
3:21 it just says "hey, I want to create a collection or a sequence
3:25 and here is one of the items,
3:26 and here is one of the items", so we'll yield "current".
3:29 So, that's cool, so that's going to actually generate - continue to yield the items,
3:34 you might wonder, well, how do we ever get a value out of it? So let's go find out.
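As a rough sketch of the generator described here: the name generator_fibonacci and the "while True" loop come from the lecture, while the variable name "nxt" and the use of itertools.islice to peek at the first few values are assumptions added for illustration.

```python
from itertools import islice


def generator_fibonacci():
    """Yield the Fibonacci numbers one at a time, without an upper limit."""
    current, nxt = 0, 1  # assumed starting state
    while True:
        current, nxt = nxt, current + nxt
        yield current  # hand one item to the caller and pause here


# islice takes only the first 10 items, so the infinite loop is never a problem.
print(list(islice(generator_fibonacci(), 10)))
```

Calling generator_fibonacci() does not run the loop at all; it only builds a generator object that computes values when asked.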
3:41 So we are going to do this, now if I run this, it won't crash or anything,
3:44 it will just keep spitting out numbers, scrolling to the right until it kind of goes crazy,
3:49 so this is an infinite sequence but as a consumer of the infinite sequence,
3:56 I can decide "OK, I've had enough".
3:58 So what I will say here is let's say "if m is greater than 100",
4:02 we can use the same test as we have on line 36,
4:05 we can just break out of our loop,
4:07 all right, so let's run this, we should see the same output,
4:11 and we do, right.
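A minimal sketch of the consumer side, where the loop itself decides when it has had enough. Testing before printing is an assumption; the recording does not show which comes first in the actual file, and the list "seen" is added only so the result can be inspected.

```python
def generator_fibonacci():
    """Infinite Fibonacci generator (same assumed shape as in the lecture)."""
    current, nxt = 0, 1
    while True:
        current, nxt = nxt, current + nxt
        yield current


seen = []  # hypothetical helper list, only here to record what was consumed
for m in generator_fibonacci():
    if m > 100:
        break  # the consumer, not the generator, ends the sequence
    seen.append(m)
    print(m, end=", ")
print()
```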
4:12 Classic and generator have the same output, but if we go into the debugger here,
4:15 it's going to be all sorts of different, all right, so we step in,
4:19 here we are in generator_fibonacci just like we were before
4:22 and here is our "while True",
4:24 now watch what happens as soon as we get the current, which is 1 and we say "yield",
4:27 immediately we are back here,
4:29 we printed it and now look where we return into that loop,
4:33 we just kind of resume the method back here, see there is this back and forth,
4:37 I'll do this a few times, notice now we are going to jump back into this one
4:42 and that current is 3 and next is 5,
4:44 this is like a state machine that remembers where it left off and can be resumed,
4:48 but even though it's an infinite sequence, we don't generate all of them,
4:52 it's more like on demand: as you pull items out of it, it will compute them,
4:56 so only as much as you pull, you have to pay in terms of computation.
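The back and forth seen in the debugger can be reproduced by hand with the built-in next(). This is a sketch under the same assumptions as before (the variable names and starting state are not confirmed by the recording).

```python
def generator_fibonacci():
    """Infinite Fibonacci generator that pauses at every yield."""
    current, nxt = 0, 1
    while True:
        current, nxt = nxt, current + nxt
        yield current


gen = generator_fibonacci()  # nothing in the function body has run yet

first = next(gen)   # runs until the first yield, then pauses
second = next(gen)  # resumes right after the yield, loops once, pauses again
third = next(gen)
fourth = next(gen)

print(first, second, third, fourth)
```

Each next() call does exactly one loop iteration of work, and the local variables keep their values in between.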
5:00 The other really cool benefit is nowhere are we adding this to a list
5:03 so nowhere are we using, basically nowhere are we storing
5:07 more than one item in memory at a time, so memory is not a problem in this situation.
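One rough way to see the memory difference is sys.getsizeof. This is only a sketch: getsizeof measures the container itself and not the integers it refers to, the exact byte counts depend on the Python build, and both function bodies are assumed reconstructions of the lecture's code.

```python
import sys


def classic_fibonacci(limit):
    """Eager version: builds the whole list before returning."""
    nums = []
    current, nxt = 1, 1
    while current < limit:
        nums.append(current)
        current, nxt = nxt, current + nxt
    return nums


def generator_fibonacci():
    """Lazy version: keeps only the current pair of values."""
    current, nxt = 0, 1
    while True:
        current, nxt = nxt, current + nxt
        yield current


nums = classic_fibonacci(10 ** 200)  # a deliberately huge limit
gen = generator_fibonacci()

print("items held by the list:", len(nums))
print("list container size in bytes:", sys.getsizeof(nums))
print("generator object size in bytes:", sys.getsizeof(gen))
```

The list grows with the limit, while the generator object stays the same size no matter how many values are pulled from it.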
5:11 So these generators are really cool and all you have to do is use the yield keyword.
5:17 If you compared against classic_fibonacci,
5:20 not only is it better performance, more flexible,
5:23 generates all the numbers and so on,
5:25 it's actually shorter and once you get your mind around yield,
5:28 it's actually easier to understand.
5:31 So that's cool, we can also take down here,
5:34 we can create something like an even_generator()
5:39 and if I were to pass some kind of set here, some kind of number generator like this,
5:45 I could say "for n in numbers, if n % 2 == 0"
5:52 our standard even test, we will say "yield n".
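A sketch of the filter just described. The function name and the parameter name "numbers" follow the spoken description; the sample inputs are made up for illustration.

```python
def even_generator(numbers):
    """Yield only the even items from any iterable of integers."""
    for n in numbers:
        if n % 2 == 0:  # the standard even test
            yield n


# Works the same way on a list ...
print(list(even_generator([1, 2, 3, 4, 5, 6])))
# ... and on another generator.
print(list(even_generator(x * x for x in range(6))))
```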
5:56 So given any set of numbers,
5:58 whether this is a list or a generator, it doesn't matter, it doesn't care,
6:02 it's going to pull the even ones out and then down here,
6:04 I can define a method called even_fibonacci and we'll say something like this:
6:11 "for n in even_generator()", and then we can give it generator_fibonacci
6:19 and we can say "yield from this".
6:22 So this will let us compose these things so we can actually create pipelines
6:25 from one to the next.
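One way the composition could look, using "yield from" to forward every item of the inner pipeline. The exact body of even_fibonacci is not visible in the transcript, so treat this as an assumed reconstruction; islice is added only to take a finite sample.

```python
from itertools import islice


def generator_fibonacci():
    """Infinite Fibonacci generator."""
    current, nxt = 0, 1
    while True:
        current, nxt = nxt, current + nxt
        yield current


def even_generator(numbers):
    """Yield only the even items from any iterable."""
    for n in numbers:
        if n % 2 == 0:
            yield n


def even_fibonacci():
    """Pipeline: Fibonacci numbers fed through the even filter."""
    yield from even_generator(generator_fibonacci())


print(list(islice(even_fibonacci(), 5)))
```

Writing "yield from" here is equivalent to looping over the inner generator and yielding each item, just shorter.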
6:28 So let's run our even Fibonacci through here
6:31 and we should get only the even numbers that are also coming
6:34 from the Fibonacci set and remember,
6:36 this is an infinite sequence because we are starting out
6:40 with the innermost bit,
6:42 an infinite sequence, which itself is a generator
6:45 that will take as many items as are there and pass them back.
6:49 But because we don't actually do the work on this part until we pull on it,
6:53 and we don't do the work on this part until we pull on it, it goes something like this,
6:57 pull here, that means go pull this, which pulls on this, which will pull on this piece,
7:02 one item at a time and then when we decide down here we are done, we'll break out.
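To make the pull order visible, here is a sketch that records each step in a list. The "log" list, its messages, and the small cutoff of 5 are all hypothetical additions for tracing; they are not in the lecture's code.

```python
log = []  # hypothetical trace of who did what, in order


def generator_fibonacci():
    current, nxt = 0, 1
    while True:
        current, nxt = nxt, current + nxt
        log.append(f"fib -> {current}")
        yield current


def even_generator(numbers):
    for n in numbers:
        if n % 2 == 0:
            log.append(f"even -> {n}")
            yield n


for m in even_generator(generator_fibonacci()):
    log.append(f"consumer got {m}")
    if m > 5:
        break  # stop pulling; nothing further is computed

print("\n".join(log))
```

The trace shows the stages interleaving: the Fibonacci stage runs only far enough to produce the next even number, then control returns to the consumer.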
7:08 So look at this, we have the even Fibonacci numbers,
7:11 and there are not many, so 2, 8, and 144. Here they are, brilliant.
7:17 If you want more, we can get more.
7:20 Want up to 10 000, no problem, there they are, 10 000; up to a million,
7:25 there they are up to a million.
7:27 Boom, like that.
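Changing the cutoff only changes how far the consumer pulls. The helper below, even_fibonacci_up_to, is a hypothetical name, and it uses itertools.takewhile with an inclusive "less than or equal" test instead of the lecture's break statement, so the last value shown may differ from what appears on screen.

```python
from itertools import takewhile


def generator_fibonacci():
    """Infinite Fibonacci generator."""
    current, nxt = 0, 1
    while True:
        current, nxt = nxt, current + nxt
        yield current


def even_generator(numbers):
    """Yield only the even items from any iterable."""
    for n in numbers:
        if n % 2 == 0:
            yield n


def even_fibonacci_up_to(limit):
    """Hypothetical helper: even Fibonacci numbers not exceeding `limit`."""
    pipeline = even_generator(generator_fibonacci())
    return list(takewhile(lambda n: n <= limit, pipeline))


for limit in (100, 10_000, 1_000_000):
    print(limit, even_fibonacci_up_to(limit))
```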
7:29 All right, so let's look at this in a graphic,
7:32 remember, we already talked about our algorithm here,
7:35 it's a perfect implementation of Fibonacci but it has the limitations
7:38 where you have to say how many you want
7:40 before you actually get a chance to look at the numbers,
7:43 and you can't look at too many or you'll run out of memory,
7:45 and even if for some reason you had infinite memory, you'd run out of time.
7:48 We can switch to a simpler version using the yield keyword
7:52 create this as a generator and it actually does no work
7:56 until you start pulling on the generator.
7:59 Moreover, we saw that you can write multiple generators
8:02 and compose them in a pipeline style which is really awesome
8:05 especially in things like data science.