Getting started with Dask Transcripts
Chapter: Scaling Dask in the cloud
Lecture: Running on 40 cores in the cloud with a few lines of code
0:00
The cluster is ready. We can open the dashboards (note the AWS link here), or we can connect our JupyterLab plugin to this remote cluster.
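The cluster itself was launched earlier in this chapter. As one hedged sketch of how a 40-core cluster can be started with a few lines of code, here is dask-cloudprovider on AWS Fargate; the tooling and sizing are assumptions for illustration, not necessarily what the course uses:

```python
from dask_cloudprovider.aws import FargateCluster
from dask.distributed import Client

# Launch a cluster on AWS Fargate: 10 workers x 4 vCPUs = 40 cores.
# (worker_cpu is in CPU units, 1024 = 1 vCPU.) The sizing here is
# an assumption; any cluster manager that exposes a scheduler
# address works the same way from the client's point of view.
cluster = FargateCluster(n_workers=10, worker_cpu=4096)

client = Client(cluster)
print(cluster.dashboard_link)
```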
0:10
I'll do so by copying the link to the scheduler, opening the Dask tab, and pasting it over here. After a second, I'm connected to my new cluster,
0:21
my distributed cluster. Now I'll run the same computation as before. This time we read the data directly from Amazon S3.
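The JupyterLab plugin does the same thing you can do by hand: connect a client to the scheduler's address. A minimal sketch, with a placeholder address standing in for the link copied from the cluster page:

```python
from dask.distributed import Client

# Paste the scheduler address copied from your own cluster here;
# the address below is only a placeholder.
client = Client("tls://my-scheduler.example.com:8786")

# Confirm the connection and get a link to the cluster dashboard.
print(client.dashboard_link)
```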
0:31
Do you remember, at the outset of the course, how long it took to download the entire dataset to local disk?
0:37
Now the entire network exchange is happening in the cloud. We can see the different parts of the dataset being read by the different workers.
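A minimal sketch of such a computation, assuming a Parquet dataset on S3; the bucket path and column names are placeholders, not the course's actual data:

```python
import dask.dataframe as dd

# Requires s3fs. Each worker reads its own slice of the files in
# parallel, so the data never passes through the local machine.
df = dd.read_parquet("s3://my-bucket/dataset/")

# A placeholder aggregation standing in for "the same computation
# as before".
result = df.groupby("name")["amount"].mean().compute()
print(result)
```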
0:43
The Dask cluster dashboard gives us live updates on what's going on in the remote setup. In fact, the entire computation took just nine seconds.
0:55
We just saw how the same notebook we started with, running some Pandas computation on our local machine,
1:00
moved to using Dask DataFrame locally to parallelize and work on larger data, and then connected to AWS to work on even larger data.
1:09
Everything from your laptop, in the same notebook, from anywhere in the world. This is the real power of Dask.
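That progression, condensed into one sketch. File paths, the bucket name, the column name, and the scheduler address are all placeholders:

```python
import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client

# 1. Pandas on the local machine: fine while the data fits in memory.
pdf = pd.read_csv("data/part-0.csv")
print(pdf["amount"].mean())

# 2. Dask DataFrame, still local: the same API, parallelized
#    across the cores of your laptop.
ddf = dd.read_csv("data/part-*.csv")
print(ddf["amount"].mean().compute())

# 3. The same Dask code, but connected to the cloud cluster and
#    reading straight from S3 instead of local disk.
client = Client("tls://my-scheduler.example.com:8786")
ddf = dd.read_csv("s3://my-bucket/data/part-*.csv")
print(ddf["amount"].mean().compute())
```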