Getting started with Dask Transcripts
Chapter: Scaling Dask in the cloud
Lecture: Running on 40 cores in the cloud with a few lines of code

0:00 The cluster is ready. We can open the dashboard (note the AWS link here), or we can connect our JupyterLab plugin to this remote cluster.
0:10 I'll do so by copying the link to the scheduler, opening the Dask tab, and pasting it in there. After a second, I'm connected to my new distributed cluster.
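If you prefer code to the JupyterLab plugin, the same connection can be made with a `distributed.Client`. A minimal sketch; the scheduler address below is a placeholder for whatever your cluster provider reports:

```python
from dask.distributed import Client

# Hypothetical scheduler address; paste the one your cluster actually reports.
client = Client("tcp://203.0.113.10:8786")

# Prints a summary of the cluster: workers, total cores, and memory.
print(client)
```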
0:21 Now I'll run the same computation as before. This time we read data directly from Amazon S3.
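As a sketch of that step: a Dask DataFrame can read a dataset straight from S3, with the workers doing the reading in parallel. The bucket path, file format, and column names here are assumptions, and reading from S3 requires the `s3fs` package:

```python
import dask.dataframe as dd

# Hypothetical bucket and path; any s3:// URL with a glob works the same way.
# Requires the s3fs package so Dask can talk to S3.
df = dd.read_parquet("s3://example-bucket/dataset/*.parquet")

# The same pandas-style computation as before, now executed by the
# remote workers instead of the local machine.
result = df.groupby("category")["value"].mean().compute()
print(result)
```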
0:31 Do you remember, at the outset of the course, how long it took to download the entire dataset to local disk?
0:37 Now the entire network exchange is happening in the cloud. We can see the different parts of the dataset being read by the different workers.
0:43 The Dask cluster dashboard gives us live updates on what's going on in the remote setup. In fact, the entire computation happened in just nine seconds.
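Continuing the sketch above, the dashboard URL shown in JupyterLab is also exposed on the client, and the wall-clock time is easy to check yourself:

```python
import time

# The same dashboard the JupyterLab plugin shows, as a plain URL.
print(client.dashboard_link)

# Time the distributed computation; reuses client and df from the
# sketches above (hypothetical data and columns).
start = time.time()
df.groupby("category")["value"].mean().compute()
print(f"Elapsed: {time.time() - start:.1f}s")
```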
0:55 We just saw how the same notebook that started with some Pandas computation on our local machine
1:00 moved to using Dask DataFrame locally to parallelize and handle larger data, then connected to AWS to work on even larger data.
1:09 Everything from your laptop, in the same notebook, from anywhere in the world. This is the real power of Dask.

