Fundamentals of Dask Transcripts
Chapter: Dask Schedulers
Lecture: Comparing different schedulers
0:00
So now it's time to see schedulers in action. We're going to go back to the NYC taxi cab dataset, and if you
0:09
recall this example for Dask DataFrame, what we're doing is computing the tip amount.
0:14
The first thing we do is import Client and instantiate our client with four workers. You can see that's what we've done here.
0:22
Then we import Dask DataFrame as "dd". We load all our data as a Dask DataFrame. We set up the computation we want to do,
0:32
which doesn't compute it yet because, remember, Dask uses lazy evaluation, and then what we do is we
0:38
compute the amount. Okay. And that's exactly what we've done here. We can see it took around two minutes.
0:45
Okay. Now what we're going to do is run this computation using different schedulers and look at the results. Okay.
0:53
So what we're doing here is selecting the scheduler inline while calling
0:58
compute, and we're doing it for the threading, processes, and synchronous (single-threaded) schedulers in a
1:04
for-loop here. Looking at what we have, we can see that the results are the same, but the time to compute varies.
1:12
Now, this is because each scheduler works differently and is best suited for specific purposes. So let's just have a look at the compute time.
1:22
We see that threading took just under two minutes, processes took several minutes, and synchronous took 2.5 minutes.
1:34
Now, it looks as though the multiprocessing scheduler took the longest here