Fundamentals of Dask Transcripts
Chapter: Dask Schedulers
Lecture: Comparing different schedulers

0:00 So now it's time to see schedulers in action. We're going to go back to the NYC taxi cab data set, and if you
0:09 recall this example for Dask DataFrame, what we're doing is computing the tip amount.
0:14 The first thing we do is import Client and instantiate our client with four workers. You can see that's what we've done here.
0:22 Then we import Dask DataFrame as "dd". We import all our data as a Dask DataFrame. We set up the computation we want to do,
0:32 which doesn't compute it yet because, remember, evaluation is lazy, and then what we do is we
0:38 compute the amount. Okay. And that's exactly what we've done here. We can see it took around two minutes.
0:45 Okay. Now what we're going to do is run this computation using different schedulers and look at the results. Okay.
0:53 So what we're doing here is selecting the scheduler inline while calling
0:58 compute, and we're doing it for the threading, processes, and synchronous (single-threaded) schedulers in a
1:04 for-loop here, and then we look at what we have so that we can see that the results are the same, but the time to compute varies.
1:12 Now, this is because each scheduler works differently and is best suited for specific purposes. So let's just have a look at the compute time.
1:22 We see that threading took just under two minutes, processes took several minutes, and synchronous took 2.5 minutes.
1:34 Now, it looks as though the multiprocessing scheduler took the longest here

