Getting started with Dask Transcripts
Chapter: The Dask API
Lecture: Creating our first Dask cluster with the introduction notebook

0:00 We are in notebook number one. All we really need to start a Dask cluster is in this first block of code.
0:06 We import the Client class from the dask.distributed module and then create a new client object while specifying the number of workers that we want.
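The cell described here might look roughly like the sketch below. The worker count of 2 and the `processes=False` argument are assumptions rather than values taken from the lecture; `processes=False` keeps the workers as threads inside the current process so the snippet also runs as a plain script, not only in a notebook.

```python
from dask.distributed import Client

# Start a local cluster and connect a client to it.
# n_workers=2 is an illustrative value; processes=False is an assumption
# that keeps the workers in-process so this also runs outside a notebook.
client = Client(n_workers=2, processes=False)

# In a notebook, displaying `client` renders a summary of the cluster.
print(client)

# The same details can be read programmatically.
worker_count = len(client.scheduler_info()["workers"])
print("Workers:", worker_count)
print("Dashboard:", client.dashboard_link)

# Shut the local cluster down once the inspection is done.
client.close()
```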
0:15 That's it, that's really all it takes. Looking at our client, we can see the locations of the Scheduler,
0:21 and the Dashboard as well as available cluster resources, the workers, the cores and the memory. The dashboard URL will take you to Dask's
0:28 diagnostic dashboards that display real time information about the state of your cluster. Go ahead, click the link,
0:36 explore the dashboard. When you're done with your computation, always remember to close the cluster.
0:42 If you have multiple clients running, it may cause a lot of confusion, and if you're
0:46 connected to a remote cluster, you might be accumulating idle charges. We don't want that now, do we?
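One way to make sure the cleanup always happens is to use the client as a context manager, so it is closed even if the computation raises. This is a minimal sketch for a local cluster; the worker count, `processes=False`, and the toy `sum` task are assumptions made for illustration.

```python
from dask.distributed import Client

# Option 1: a `with` block closes the client, and the local cluster it
# started, as soon as the block exits.
with Client(n_workers=2, processes=False) as scoped_client:
    total = scoped_client.submit(sum, [1, 2, 3]).result()

print("Total:", total)

# Option 2: create the client, use it, and close it explicitly.
client = Client(n_workers=2, processes=False)
client.close()
```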
0:51 Dask Delayed is one of the low-level APIs in Dask. Let's look at how we can parallelize and distribute any Python code with Dask.
0:59 Consider these two functions that do basic arithmetic operations and sleep for one second each. In
1:05 regular Python, incrementing two numbers and adding them together happens sequentially and takes three seconds in
1:12 total. We can parallelize this regular Python code using Dask Delayed. All you need to do is use the delayed decorator for the appropriate operations.
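The comparison described here can be sketched as follows. The function names `inc` and `add` and the inputs 1 and 2 are assumptions chosen for illustration; calling `delayed(inc)` has the same effect as decorating `inc` with `@delayed`, and is used here so the plain functions stay available for the sequential timing.

```python
import time

from dask import delayed


def inc(x):
    time.sleep(1)  # simulate one second of work
    return x + 1


def add(x, y):
    time.sleep(1)  # simulate one second of work
    return x + y


# Regular Python: three calls, one after another.
start = time.perf_counter()
z = add(inc(1), inc(2))
sequential_seconds = time.perf_counter() - start

# Dask Delayed: wrap the functions, then call them as before.
lazy_inc = delayed(inc)
lazy_add = delayed(add)

start = time.perf_counter()
lazy_z = lazy_add(lazy_inc(1), lazy_inc(2))
build_seconds = time.perf_counter() - start

print(f"sequential: z={z} in {sequential_seconds:.2f} s")
print(f"delayed: {lazy_z!r} built in {build_seconds:.6f} s")
```

The second timing only covers building the `Delayed` object, which is why it finishes almost instantly.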
1:23 Well, let's look at that. It took 600 microseconds. Fantastic, right? But wait a second,
1:30 lazy evaluation is what happened. I don't want to bust your bubble, but Dask has actually not computed your result yet.
1:38 If you recall from "Dask under the hood", Dask has created a task graph here,
1:43 and it's ready to compute whenever you ask it to do so. This is called lazy
1:49 evaluation. It evaluates only when you need it to. Dask computes your result
1:55 only when you call compute. You can also visualize the task graph by calling visualize().
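A minimal sketch of triggering the computation, under a few assumptions: the function names and inputs are illustrative, and the local threaded scheduler with two threads is used here (instead of the cluster from the notebook) so the snippet is self-contained.

```python
import time

from dask import delayed


@delayed
def inc(x):
    time.sleep(1)  # simulate one second of work
    return x + 1


@delayed
def add(x, y):
    time.sleep(1)  # simulate one second of work
    return x + y


# Building the graph is lazy: nothing has run yet.
lazy_z = add(inc(1), inc(2))

# compute() executes the graph. The two inc calls are independent, so
# they can run at the same time; add has to wait for both of them.
# scheduler="threads" and num_workers=2 are assumptions for this sketch.
start = time.perf_counter()
result = lazy_z.compute(scheduler="threads", num_workers=2)
compute_seconds = time.perf_counter() - start

print(f"result={result} in {compute_seconds:.2f} s")

# Rendering the task graph needs the optional graphviz dependency:
# lazy_z.visualize(filename="task_graph.png")
```

With the two increments running side by side, the run should take about two seconds instead of three.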

