Fundamentals of Dask Transcripts
Chapter: Dask-ML
Lecture: Dask in the cloud
0:00
Welcome back. Now it's time to talk about Dask in the cloud. What do I even mean by this, and why would we want to do something along
0:09
these lines when we've already seen how we can leverage distributed computation on our local workstations?
0:13
The truth is, scaling to the cloud can help us a great deal. Larger workflows will benefit a huge amount from more computational resources,
0:25
such as large clusters, which you may not have locally.
0:29
There are cloud services that can help you leverage these types of clusters, such as Amazon Web Services, Google Cloud Platform,
0:38
Microsoft Azure, and so on. How do you get Dask up and running on these services?
0:44
There are different types of Dask cloud deployments, such as a Kubernetes integration,
0:49
a YARN integration, and many others. There are a significant number to choose from, and you will need to know a fair amount about containerization,
0:58
Docker, maybe Kubernetes, these types of things, in order to get this done.
1:03
There are also significant challenges, such as environment and data management.
1:09
These involve questions such as: do all the machines have the same software installed?
1:15
Can many people share the same hardware? And where is the actual data? Another challenge involved with cloud deployments is
1:22
security and compliance, which your team leads and IT will be very interested in. These are questions such as authentication:
1:30
does a given user have access to these machines? And security: what stops others from connecting and running arbitrary code as me or you,
1:37
the user? Now, there's another challenge, which is cost management, and this is huge. You want to know what will stop a novice from jumping on and
1:46
idling 100 GPUs. You want to track costs, so you want to know how
1:51
much money everyone is spending, and you want to optimize costs and optimize workflows for cost. So how do we profile and tune for cost?
1:58
So if you're going to get up and running on the cloud, these are the types of questions that you'll need to answer.
2:04
I gave a talk at the Dask Distributed Summit in 2021 about getting Dask working on the cloud and making Dask available to everyone.
2:13
And I encourage you to check that out at "bit.ly/task" for everyone if
2:17
you're interested. But what's happening next is we're going to jump into a notebook and
2:22
check out how to get Dask up and running on the cloud with a particular service called Coiled. Disclaimer: I work for Coiled, and I love it a lot.
2:30
I'll see you in the notebook.