Fundamentals of Dask Transcripts
Chapter: Dask Schedulers
Lecture: Types of schedulers

0:00 Now with great pleasure, it is time to introduce you to the different types of Dask schedulers. There are two main types of schedulers,
0:10 single machine and distributed. As the name suggests, the single machine scheduler works only on a single machine and does not scale
0:19 to more than one machine. It is lightweight, simple to use, and it's the
0:23 default for many collections. There are three types of single machine schedulers available
0:29 in Dask. First is the 'threaded' scheduler, which is backed by a thread pool. All the computations happen in a single process,
0:38 but on multiple threads. What this means is that no data transfer happens between tasks, so it's lightweight. On top of that,
0:46 it's used mainly when the computation isn't Python dominant. What I mean by this is that, for example, NumPy and Pandas are written in Cython for efficiency.
0:55 So this is the default scheduler for Dask Array, Dask DataFrame, and Dask Delayed.
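As a quick sketch of how you could request this scheduler explicitly, here is a tiny Dask array computation; the array shape and chunk sizes are just illustrative values, not anything from the course:

    import dask.array as da

    # Build a lazy Dask array; nothing runs until .compute() is called.
    x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))

    # The threaded scheduler is already the default for dask.array,
    # but it can be requested explicitly with scheduler="threads".
    result = x.mean().compute(scheduler="threads")
    print(result)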
1:05 After the threaded scheduler, we have the 'multiprocessing' scheduler, which is backed by a process pool. It's still lightweight, but here we have multiple processes involved.
1:12 There is some data transfer between different processes so that adds some overhead.
1:16 It will perform best if we can minimize data transfer, which is common while reading and
1:21 writing data. For example, Dask Bag uses this scheduler by default.
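To make that concrete, here is a small sketch of a Dask bag run on the multiprocessing scheduler; the sequence of numbers and the squaring step are only illustrative, and the main-guard is there because process-based execution can re-import the script on some platforms:

    import dask.bag as db

    if __name__ == "__main__":
        # A tiny in-memory bag; real workloads would usually read text or JSON files.
        bag = db.from_sequence(range(10), npartitions=2)
        squared = bag.map(lambda n: n * n)

        # Dask bag already defaults to the multiprocessing scheduler, but it can
        # also be requested explicitly with scheduler="processes".
        print(squared.compute(scheduler="processes"))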
1:26 Third, we have the 'single-threaded' or 'synchronous' scheduler. In some cases, like debugging,
1:33 for example, certain operations will fail because they don't support parallelism. In such cases we can use this scheduler.
1:41 It computes on a single thread with no parallelism.
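For example, if a computation misbehaves, you could rerun it on the synchronous scheduler so that ordinary tracebacks, print statements, and pdb breakpoints work; the delayed functions below are just placeholders for your own tasks:

    import dask

    @dask.delayed
    def load(i):
        return list(range(i))

    @dask.delayed
    def total(parts):
        return sum(len(p) for p in parts)

    graph = total([load(i) for i in range(5)])

    # scheduler="synchronous" (also accepted as "single-threaded") runs the
    # whole graph in the current thread, with no parallelism to get in the way.
    print(graph.compute(scheduler="synchronous"))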
1:50 Next up, we have the 'distributed' scheduler. If you want to scale beyond a single machine, this is your only choice, and in fact we recommend using it even
1:56 if you're working locally. It has more features and better performance optimizations, as we'll see in the following videos.
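As a preview, and purely as a sketch, starting the distributed scheduler on a single machine looks roughly like this; Client() with no arguments spins up a local cluster, and the array computation is just a placeholder workload:

    from dask.distributed import Client
    import dask.array as da

    if __name__ == "__main__":
        # Client() with no arguments starts a local cluster of workers and
        # becomes the default scheduler for subsequent computations.
        client = Client()
        print(client.dashboard_link)

        x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
        print(x.mean().compute())

        client.close()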

