Getting started with Dask Transcripts
Chapter: Scaling Dask in the cloud
Lecture: Running on 40 cores in the cloud with a few lines of code
The cluster is ready. We can open the dashboards (note the AWS link here), or we can connect our JupyterLab plugin to this remote cluster.
I'll do so by copying the link to the scheduler, opening the Dask tab, and pasting it over here. After a second, I'm connected to my new cluster,
my distributed cluster. Now I'll run the same computation as before. This time we read data directly from Amazon S3.
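The lecture makes the connection through the JupyterLab plugin. From code, a notebook would typically attach to the same scheduler with `dask.distributed.Client`. The sketch below is a rough illustration, not the lecturer's exact setup: the address `tcp://scheduler.example.com:8786` and the helper names `looks_like_scheduler_address` and `connect` are hypothetical, and the `dask.distributed` package is assumed to be installed.

```python
from urllib.parse import urlsplit

# Hypothetical placeholder; use the address your own cluster reports.
SCHEDULER_ADDRESS = "tcp://scheduler.example.com:8786"


def looks_like_scheduler_address(address: str) -> bool:
    """Return True if address has a tcp:// or tls:// scheme, a host, and a port."""
    parts = urlsplit(address)
    try:
        port = parts.port  # raises ValueError if the port is not a valid number
    except ValueError:
        return False
    return parts.scheme in {"tcp", "tls"} and bool(parts.hostname) and port is not None


def connect(address: str = SCHEDULER_ADDRESS):
    """Attach this Python session to an already running Dask scheduler."""
    if not looks_like_scheduler_address(address):
        raise ValueError(f"not a scheduler address: {address!r}")
    # Imported here so the helper above works even where dask is not installed.
    from dask.distributed import Client
    return Client(address)
```

Calling `client = connect()` in a notebook cell makes that client the default, so later Dask computations in the same session are sent to the remote workers instead of running locally.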
Do you remember at the outset of the course, how long it took to download the entire dataset to local disk?
Now the entire network exchange is happening in the cloud. We can see the different parts of the dataset being read by the different workers.
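Reading straight from S3, as described above, might look roughly like the following. This is a sketch under assumptions, not the course's actual code: the bucket name, prefix, CSV file format, and the helper names `s3_glob` and `load_from_s3` are all hypothetical, and it assumes `dask` and `s3fs` are installed and that the bucket allows anonymous access.

```python
def s3_glob(bucket: str, prefix: str, pattern: str = "*.csv") -> str:
    """Build an s3:// glob URL matching every file under a prefix."""
    return f"s3://{bucket}/{prefix.strip('/')}/{pattern}"


def load_from_s3(url: str, anonymous: bool = True):
    """Create a Dask DataFrame backed by the files that match url.

    Only a small sample is read up front to infer column types; the bulk
    of the data is read by the workers when a result is computed.
    """
    # Imported here so s3_glob works even where dask is not installed.
    import dask.dataframe as dd
    # storage_options is forwarded to s3fs; "anon" requests unsigned access.
    return dd.read_csv(url, storage_options={"anon": anonymous})
```

With a client connected to the cluster, something like `df = load_from_s3(s3_glob("example-bucket", "dataset"))` followed by `len(df)` would have each worker pull its own share of the files from S3, which is why the transfer never passes through the laptop.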
The Dask cluster dashboard gives us live updates on what's happening on the remote setup. In fact, the entire computation finished in just nine seconds.
We just saw how the same notebook, which we started with some pandas computation on our local machine,
moved to using Dask DataFrame locally to parallelize and work on larger data, and then connected to AWS to work on even larger data.
All of it from your laptop and the same notebook, from anywhere in the world. This is the real power of Dask.