Fundamentals of Dask Transcripts
Chapter: Dask Delayed
Lecture: Parallel for loops
0:00
Now it's time to parallelize even more Python code with Delayed. We're going to first parallelize a 'for-loop',
0:07
which is ripe for parallelization (try to say that fast 10 times), because when you do a for-loop, you're computing these results serially.
0:16
Okay, so that's ripe to be parallelized. So that's what we're gonna do here. We're going to create a list which we call 'data',
0:23
a basic list of some dummy data, and then we're going to go through the list. We're going to '%%time' this in order to see how long it takes.
0:31
And we're going to increment each item in data and 'append' it to the results list.
0:37
And then we're going to take the sum and compute the total. Should be done in a second. All right.
0:44
Took around eight seconds, and you can verify that that's the result that you wanted.
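The serial loop described above can be sketched roughly as follows. This is a minimal sketch, not the lecture's notebook: the `inc` helper, its sleep, and the contents of `data` are assumptions for illustration. The delay here is 0.1 s per call so the sketch runs quickly; a one-second sleep over eight items would give the roughly eight seconds mentioned. In a notebook, `%%time` would replace the manual timer.

```python
import time

def inc(x, delay=0.1):
    """Hypothetical slow increment: sleep to simulate work, then add one."""
    time.sleep(delay)
    return x + 1

# Assumed dummy data; the lecture does not list the actual values.
data = [1, 2, 3, 4, 5, 6, 7, 8]

start = time.perf_counter()

# Each call to inc must finish before the next one starts.
results = []
for x in data:
    y = inc(x)
    results.append(y)

total = sum(results)
elapsed = time.perf_counter() - start

print(f"total={total}, elapsed={elapsed:.2f}s")
```

Because every `inc` call waits for the previous one, the elapsed time is about the number of items times the per-call delay.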
0:47
As I said, all of these increments are happening in serial but they could be
0:53
happening in parallel. Okay. So we can wrap certain functions with 'Delayed' to make sure that happens. So we have pretty much the same code as before,
1:01
but we're wrapping the increment functions in Delayed and we're wrapping the sum in Delayed as well. So let's check out how long this takes.
1:08
Okay. Now you may think, wow, that was quick. But remember, this hasn't actually performed the computation yet because
1:15
Dask evaluates lazily. So we need to call compute in order to do this. Great. So we see we got out the same result in 1/8 of the
1:26
time here on my system. Now, what I want to do is visualize the 'task graph' so we can get a sense of what actually happened there.
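A minimal sketch of the Delayed version, assuming the same hypothetical `inc` helper and dummy `data` as an illustration (neither is shown in the transcript). It uses `dask.delayed` to wrap both the increment calls and the sum, then calls `compute` to trigger the work. The speedup you see depends on how many workers your machine provides, so the one-eighth figure is specific to the presenter's system.

```python
import time
from dask import delayed

def inc(x, delay=0.1):
    """Hypothetical slow increment: sleep to simulate work, then add one."""
    time.sleep(delay)
    return x + 1

# Assumed dummy data; the lecture does not list the actual values.
data = [1, 2, 3, 4, 5, 6, 7, 8]

# Building the graph is nearly instant: nothing runs yet.
results = []
for x in data:
    y = delayed(inc)(x)      # a lazy task, not a number
    results.append(y)

total = delayed(sum)(results)  # still lazy: a Delayed object

# compute() executes the graph; independent inc tasks can run concurrently.
start = time.perf_counter()
value = total.compute(scheduler="threads")
elapsed = time.perf_counter() - start

print(f"value={value}, elapsed={elapsed:.2f}s")
```

Note that `total` stays a lazy `Delayed` object; only the return value of `compute` is the actual number.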
1:36
So look at that. We had all these increments occurring in parallel, then feeding into the final sum.
1:43
And that is why our computation took an eighth of the time.
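To reproduce the task graph picture, a sketch along these lines may help. It assumes the same hypothetical `inc` and `data` as an illustration, and that the optional Graphviz dependency is installed; `visualize` raises an error without it, so the call is guarded here. The output file name is an arbitrary choice.

```python
from dask import delayed

def inc(x):
    """Hypothetical increment used only to build an example graph."""
    return x + 1

# Assumed dummy data; the lecture does not list the actual values.
data = [1, 2, 3, 4, 5, 6, 7, 8]

# One independent inc task per item, all feeding a single sum task.
results = [delayed(inc)(x) for x in data]
total = delayed(sum)(results)

try:
    # Renders the graph to a file; requires Graphviz to be installed.
    total.visualize(filename="task_graph.svg")
except Exception as exc:
    print(f"Could not render the task graph: {exc}")

print(total.compute())
```

The rendered graph should show the increment tasks side by side with no edges between them, which is what lets the scheduler run them at the same time before the final sum.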