Getting started with Dask Transcripts
Chapter: Dask under the hood
Lecture: Take a peek under the hood

0:00 We are almost ready to jump into the notebook and start using Dask. But before we do that, let's look under the hood and find out what the
0:08 components of Dask are and how they work together. At a high level, Dask has collections that
0:13 create task graphs. The task graphs are then consumed by schedulers, which delegate the computations to workers.
0:20 Collections are the APIs you use to write Dask code. Collections can be high-level, like Array, corresponding to NumPy, DataFrame, corresponding to
0:29 Pandas, and Bag, or they can be low-level collections, such as Delayed and Futures. These collections create a task graph;
0:37 let's look at what a task graph is. For example, these two functions do simple mathematical operations and sleep for one second.
0:45 x and y can be executed in parallel. However, the task z depends on the results of x and y. Therefore the total time is two seconds.
0:55 The individual tasks each take one second, so executed in sequence they would take three seconds.
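A minimal sketch of a task graph like the one described, using `dask.delayed`. The exact functions from the lecture aren't shown here, so `inc` and `add` are illustrative stand-ins: two independent one-second tasks (x and y) and a third task (z) that depends on both.

```python
import time
from dask import delayed

@delayed
def inc(i):
    time.sleep(1)  # simulate one second of work
    return i + 1

@delayed
def add(a, b):
    time.sleep(1)  # simulate one second of work
    return a + b

x = inc(1)     # x and y are independent of each other,
y = inc(2)     # so Dask can run them in parallel
z = add(x, y)  # z depends on the results of both x and y

# Nothing has executed yet; compute() walks the task graph,
# running x and y in parallel, then z: ~2 seconds instead of ~3.
result = z.compute()
print(result)  # 5
```

You can also call `z.visualize()` to render the graph itself, which makes the x/y parallelism visible.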
1:02 But because the task graph captures which parts of the work can be done in parallel, those parts are actually executed in parallel
1:09 without you explicitly saying so. Finally, these pieces combine into a cluster; let's look at what a cluster is comprised of.
1:17 First, it has the scheduler, which is the beating heart. It consumes the task graph and sends tasks to the
1:23 workers, manages the workers and their interactions, knows where the workers are, which part of the data is on
1:29 which worker, and so forth. Then there are the workers, the machines that can be added or removed, and they perform the actual computation.
1:37 Dask is quite dynamic, so new workers can even appear while a workflow is being executed, which is known as dynamic scaling.
1:44 Then finally the Client, the Client is the window to the world. It lives where you write your Python code,
1:50 in your JupyterLab session, in your command line interface and so forth. It's the entry point for you to interact with the cluster.
1:58 This is what it looks like in JupyterLab: the Client has a nice output presentation, which tells you where the Dashboard is,
2:06 which you can use to further inspect the inner workings of the cluster. It also has information on the resources allocated to the cluster.
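A minimal sketch of creating a Client from a script, assuming `dask.distributed` is installed; the worker configuration is an arbitrary illustration. With no arguments beyond these, `Client` starts a local cluster for you, and in JupyterLab simply displaying `client` renders the rich summary with the dashboard link and allocated resources.

```python
from dask.distributed import Client

# Connect a Client to a freshly started local cluster.
client = Client(n_workers=2, threads_per_worker=1)

# The Client is the entry point for sending work to the cluster.
future = client.submit(sum, [1, 2, 3])  # run a task on a worker
result = future.result()
print(result)  # 6

client.close()
```

In a notebook, evaluating `client` on its own line (instead of `print`) produces the HTML presentation described above.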

