Write Pythonic Code Like a Seasoned Developer Transcripts
Chapter: Generators and Collections
Lecture: On-demand computation with yield and generators
0:01 Here is a function called classic_fibonacci and what you do is you pass a limit to it and it will compute all the Fibonacci numbers up to that limit.
0:10 Notice we have a list called "nums" and it does all the work, fills this list up and once it's finally done, it gives you all the numbers.
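The list-building approach described here might look roughly like the sketch below. The function and list names come from the lecture, but the body is an assumption (the on-screen code is not in the transcript), and this version stops before appending any number at or above the limit.

```python
def classic_fibonacci(limit):
    """Return a list of all Fibonacci numbers below `limit` (sketch, not the lecture's exact code)."""
    nums = []                 # every value is stored before anything is returned
    current, nxt = 1, 1
    while current < limit:
        nums.append(current)
        current, nxt = nxt, current + nxt
    return nums               # the caller sees nothing until the whole list is built


for m in classic_fibonacci(100):
    print(m, end=", ")
print()
```

The caller has to choose `limit` up front, and the whole list exists in memory before the first number can be inspected.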
0:17 Well, what if you want the first million Fibonacci numbers, what if you need the first 5 million Fibonacci numbers, how long will this method take to run?
0:26 What if you don't know how many you need, what if you want to start looking at them and you say,
0:30 well, I am looking for the point, as I go through the Fibonacci numbers, where the second one is a prime number,
0:38 the third one is the cube of the first one in the sequence, who knows when that is, you are just looking through
0:44 and you are going to decide "oh, now it matches, now I have got enough of these". What if you were looking through this
0:50 and you said "I am going to ask for 5 million Fibonacci numbers" and the one you needed was really the five million and first, right, maybe you would just give up.
0:58 So we are going to look at a different way to write exactly the same code that doesn't have these limitations, that allows the consumers to process
1:05 as much of this actually infinite series as they need, and yet does this in a very much on-demand, high-performance way.
1:14 This concept is called generators and it has this keyword called yield. So let's look at this in code. So, here is that same function, we can run it,
1:23 it shows you the first few numbers in the Fibonacci sequence. We are passing a limit here,
1:29 we are passing 100, so we want just the Fibonacci numbers less than 100. So this is fine, but let's see if we can do better.
1:37 Before we move on, let's actually debug this a little bit. So I am going to put a break point here, we are going to step into this, all right,
1:44 so here we are and let's step into this method, and now we are stepping along, stepping along, and notice we are going through the list,
1:51 you can see up here it actually shows you the list being built, it shows you the numbers so PyCharm is really cool in that sense,
1:56 you can see the list growing, but notice we are the whole time staying here until we get to the limit of 100 which happens pretty soon here,
2:04 right now, and then we are going through them and processing them; you can see "m" is the various values here.
2:12 So that's fine for small numbers, but, like I said in the beginning, what if we don't know what the upper bound is?
2:18 Or what if we have to put a really huge number here, what do you think happens to the memory consumption as that number grows?
2:26 Obviously we have to gather all the numbers that preceded it and hold them in memory all at once, and only then do you get the answer.
2:33 So Python has this really cool keyword called "yield", and let's come down here and let's call this a generator_fibonacci,
2:42 so we are going to do a few things, that if you have seen this before, you know it's pretty straightforward,
2:48 if you've not seen this, it'll probably blow your mind. All right, so what we are going to do is we are going to say
2:52 instead of having this limit, we would like to work on the infinite series. Now if I just run this code, two things will happen:
3:00 first of all, it's going to crash in a hurry, and even if for some reason it didn't crash,
3:05 if we had, like, infinite memory, it would still never return, right? It's just going to keep adding to the list for this infinite series,
3:11 but of course it's going to run out of memory. So in Python, we can do something both cleaner and better here,
3:16 so what we can do is we can use this yield keyword, and yield is like return but instead of returning from the method,
3:22 it just says "hey, I want to create a collection or a sequence and here is one of the items, and here is one of the items", so we'll yield "current".
3:30 So, that's cool, so that's going to continue to yield the items, one after another.
3:35 You might wonder, well, how do we ever get a value out of it? So let's go find out.
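A minimal sketch of the generator version, assuming the same update step as the list-based function with the list and the limit removed. The name generator_fibonacci and the variable current are from the lecture; `nxt` is an assumed name for the "next" value.

```python
def generator_fibonacci():
    """Yield Fibonacci numbers forever, one at a time (sketch)."""
    current, nxt = 0, 1
    while True:
        current, nxt = nxt, current + nxt
        yield current         # hand one value to the caller and pause here


# Calling the function only creates a generator object; no numbers exist yet.
fib = generator_fibonacci()
print(next(fib), next(fib), next(fib), next(fib))  # 1 1 2 3
```

Each call to `next()` resumes the function just after the `yield` and runs it until the following `yield`.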
3:42 So we are going to do this, now if I run this, it won't crash or anything,
3:45 it will just keep spitting out numbers, scrolling to the right until it kind of goes crazy,
3:50 so this is an infinite sequence but as a consumer of the infinite sequence, I can decide "OK, I've had enough".
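The consumer deciding it has had enough can be sketched as a loop with a break. The generator body and the list name `seen` are assumptions for illustration; the test against 100 and the loop variable m follow the lecture.

```python
def generator_fibonacci():
    """Infinite Fibonacci generator (same sketch as before)."""
    current, nxt = 0, 1
    while True:
        current, nxt = nxt, current + nxt
        yield current


seen = []
for m in generator_fibonacci():
    if m > 100:               # the consumer, not the generator, decides when to stop
        break
    seen.append(m)

print(seen)
```

Without the `break`, this loop would never end, because the generator itself has no stopping condition.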
3:59 So what I will say here is let's say "if m is greater than 100", we can use the same test as we have on line 36, we can just break out of our loop,
4:08 all right, so let's run this, we should see the same output, we do, right, classic and generator have the same output but if we go into debugger here,
4:16 it's going to be all sorts of different, all right, so we step in, here we are in generator_fibonacci just like we were before
4:23 and here is our "while True", now watch what happens as soon as we get the current, which is 1 and we say "yield", immediately we are back here,
4:30 we printed it and now look where we return into that loop, we just kind of resume the method back here, see there is this back and forth,
4:38 I'll do this a few times, notice now we are going to jump back into this one and that current is 3 and next is 5,
4:45 this is like a state machine that remembers where it left off and can be resumed,
4:49 but even though it's an infinite sequence, we don't generate all of them, it's more like on demand as you pull items out of it it will compute them,
4:57 so you only pay, in terms of computation, for as much as you pull. The other really cool benefit is nowhere are we adding this to a list,
5:04 so basically nowhere are we storing more than one item in memory at a time, so memory is not a problem in this situation.
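The back and forth seen in the debugger can also be made visible without one. In this sketch, traced_fibonacci and the `log` list are hypothetical names added for illustration; they record the order in which the generator and the consumer run.

```python
def traced_fibonacci(log):
    """Fibonacci generator that records each yield in `log` (illustrative only)."""
    current, nxt = 0, 1
    while True:
        current, nxt = nxt, current + nxt
        log.append(f"generator: yielding {current}")
        yield current         # state (current, nxt) is kept while paused


log = []
gen = traced_fibonacci(log)
for _ in range(3):
    value = next(gen)         # resume the generator until its next yield
    log.append(f"consumer: received {value}")

print("\n".join(log))
```

The log entries alternate between generator and consumer, which is the resume-where-it-left-off behavior described above. Only `current` and `nxt` are held between pulls, never a growing list.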
5:12 So these generators are really cool and all you have to do is use the yield keyword. If you compare it against classic_fibonacci,
5:21 not only does it perform better, it's more flexible, it can generate all the numbers and so on, it's actually shorter, and once you get your mind around yield,
5:29 it's actually easier to understand. So that's cool. We can also come down here and create something like an even_generator()
5:40 and if I were to pass some kind of set here, some kind of number generator like this, I could say "for n in numbers, if n % 2 == 0"
5:53 our standard even test, we will say "yield n". So given any set of numbers, whether this is a list or a generator, it doesn't matter, it doesn't care,
6:03 it's going to pull the even ones out and then down here, I can define a method called even_fibonacci and we'll say something like this:
6:12 "for n in even_generator()", and then we can give it generator_fibonacci and we can say "yield from this".
6:23 So this will let us compose these things so we can actually create pipelines from one to the next. So let's run our even Fibonacci through here
6:32 and we should get only the even numbers that are also coming from the Fibonacci set and remember,
6:37 this is an infinite sequence because we are starting out with the innermost bit, an infinite sequence, which itself is a generator
6:46 that will take as many items as there are and pass them back. But because we don't actually do the work on this part until we pull on it,
6:54 and we don't do the work on this part until we pull on it, it goes something like this,
6:58 pull here, that means go pull this, which pulls on this, which will pull on this piece,
7:03 one item at a time and then when we decide down here we are done, we'll break out. So look at this, we have the even Fibonacci numbers,
7:12 and there are not many, so 2, 8, 34, and 144. Here they are, brilliant. If you want more, we can get more.
7:21 Want up to 10,000, no problem, there they are, up to 10,000; up to a million, there they are, up to a million. Boom, like that.
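The composed pipeline could be sketched as follows. The names even_generator, even_fibonacci, and generator_fibonacci are from the lecture; the function bodies, the use of `yield from` to delegate, and the `evens` list are assumptions about how the pieces fit together.

```python
def generator_fibonacci():
    """Infinite Fibonacci generator (sketch)."""
    current, nxt = 0, 1
    while True:
        current, nxt = nxt, current + nxt
        yield current


def even_generator(numbers):
    """Yield only the even values from any iterable: a list, a generator, anything."""
    for n in numbers:
        if n % 2 == 0:
            yield n


def even_fibonacci():
    """Compose the two generators; `yield from` passes each item straight through."""
    yield from even_generator(generator_fibonacci())


evens = []
for m in even_fibonacci():
    if m > 1_000_000:         # stop check happens before the value is kept
        break
    evens.append(m)

print(evens)
```

Each pull on even_fibonacci pulls on even_generator, which pulls on generator_fibonacci only until the next even number appears, so nothing beyond what the consumer asks for is computed.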
7:30 All right, so let's look at this in a graphic, remember, we already talked about our algorithm here,
7:36 it's a perfect implementation of Fibonacci, but it has the limitation that you have to say how many you want
7:41 before you actually get a chance to look at the numbers, and you can't look at too many or you'll run out of memory,
7:46 and even if for some reason you had infinite memory, you'd run out of time. We can switch to a simpler version using the yield keyword,
7:53 create this as a generator, and it actually does no work until you start pulling on the generator.
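The "no work until you pull" point can be checked directly. In this sketch, noisy_generator and the `events` list are hypothetical, used only to record when the function body actually runs.

```python
def noisy_generator(events):
    """Tiny generator that records when its body executes (illustrative only)."""
    events.append("body started")
    yield 1
    events.append("resumed after first yield")
    yield 2


events = []
gen = noisy_generator(events)      # creating the generator runs none of the body
before_pull = list(events)         # snapshot: still empty

first = next(gen)                  # the first pull runs the body up to the first yield
after_first_pull = list(events)

print(before_pull)                 # []
print(first, after_first_pull)     # 1 ['body started']
```

The empty snapshot before the first `next()` shows that creating the generator costs nothing; work happens only in response to a pull.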
8:00 What's more, we saw that you can write multiple generators and compose them in a pipeline style, which is really awesome,
8:06 especially in things like data science.