Fundamentals of Dask Transcripts
Chapter: Dask Schedulers
Lecture: Types of schedulers
0:00
Now with great pleasure, it is time to introduce you to the different types of Dask schedulers. There are two main types of schedulers,
0:10
single machine and distributed. As the name suggests, the single machine scheduler works only on a single machine and does not scale
0:19
to more than one machine. It is lightweight and simple to use and it's the
0:23
default for many collections. Now there are three types of single machine schedulers available
0:29
in Dask. First is 'threaded', which is backed by a thread pool. All the computations happen in a single process,
0:38
but on multiple threads. What this means is that no data transfer happens between tasks. It's lightweight. On top of that,
0:46
it's used mainly when computation isn't Python dominant. What I mean by this is, for example, NumPy and Pandas are written in Cython for efficiency.
0:55
So this is the default option for Dask Array, Dask DataFrame and Dask Delayed. After threaded, we have the 'multiprocessing' scheduler,
1:05
which is backed by a 'process pool'. It's still lightweight but here we have multiple processes involved.
1:12
There is some data transfer between different processes so that adds some overhead.
1:16
It will perform best if we can minimize data transfer, which is common while reading and
1:21
writing data. For example, Dask Bag uses this scheduler by default.
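To make the thread pool versus process pool choice concrete, here is a minimal sketch that runs the same small task graph on both schedulers. The `square` and `sum_of_squares` names are made up for illustration; the `scheduler=` keyword on `compute` is how Dask selects a single machine scheduler for one call.

```python
from dask import delayed


def square(x):
    return x * x


def sum_of_squares(n, scheduler):
    """Build a small task graph and run it on the named scheduler."""
    tasks = [delayed(square)(i) for i in range(n)]
    total = delayed(sum)(tasks)
    return total.compute(scheduler=scheduler)


if __name__ == "__main__":
    # "threads": thread pool inside one process, so tasks share memory
    # and no data is transferred between them.
    print(sum_of_squares(10, "threads"))
    # "processes": process pool, so inputs and results are serialized
    # and moved between processes, which adds overhead.
    print(sum_of_squares(10, "processes"))
```

The `if __name__ == "__main__":` guard is there because the process pool may start fresh Python processes that re-import the script.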
1:26
Now, third, we have the 'single-threaded/synchronous' scheduler. In some cases, like debugging,
1:33
for example, certain operations will fail because they don't support parallelism. In such cases we can use this scheduler.
1:41
It computes on a single thread with no parallelism. Next up we have the 'distributed' scheduler. Now if you want to scale beyond a
1:50
single machine, this is your only choice. In fact, we recommend using it even
1:56
if you're working locally. It has more features and better performance optimizations, as we'll see in the following videos.
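As a rough sketch of the last two options, the snippet below runs a toy graph first on the synchronous scheduler and then on a distributed scheduler started locally. The helper names (`inc`, `build_total`, `compute_synchronously`, `compute_on_local_cluster`) are hypothetical, and the second function assumes the optional `distributed` package is installed.

```python
import dask
from dask import delayed


def inc(x):
    return x + 1


def build_total():
    # Toy graph for illustration: increment three numbers, then sum them.
    return delayed(sum)([delayed(inc)(i) for i in range(3)])


def compute_synchronously():
    # Single thread, no parallelism: useful when debugging or when the
    # code being run does not support parallel execution.
    with dask.config.set(scheduler="synchronous"):
        return build_total().compute()


def compute_on_local_cluster():
    # Assumes the `distributed` package is available. Creating a Client
    # without an address starts a local cluster on this machine;
    # processes=False keeps the workers as threads in this process.
    from dask.distributed import Client

    with Client(processes=False) as client:
        return client.compute(build_total()).result()
```

Calling `compute_synchronously()` and `compute_on_local_cluster()` should give the same result; only where and how the tasks run differs.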