Fundamentals of Dask Transcripts
Chapter: Dask-ML
Lecture: Machine learning in the cloud
0:00
All right, so now it's time to jump in and look at a bit of machine learning in the cloud. Now this section is optional.
0:07
We've given you some resources to think about how to do your data science workflows in
0:12
the cloud. This section is optional because we'll be using the product that we work
0:17
on, Coiled. You can also get set up on AWS yourself with Dask Kubernetes or any of the other technologies we just introduced,
0:26
but we'll be doing it on Coiled. Feel free to get started, sign up, and code along as well.
0:32
So what I've already done is sign up to Coiled Cloud and get my login token there, and I'm going to use some of my
0:40
free credits. Now, for the purposes of time, I've already imported coiled and created a cluster there.
0:46
What I'm gonna do now is instantiate a client and then look at the dashboard and then we're going to do some machine learning.
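A minimal sketch of the step just described, creating a cluster and attaching a client to it. This is not the notebook cell from the video: the function name `connect_to_coiled` and the worker count are assumptions made for illustration, and running it needs the `coiled` and `dask.distributed` packages plus a Coiled account with a login token already configured.

```python
def connect_to_coiled(n_workers=4):
    """Start a Coiled cluster and attach a Dask client to it.

    The imports live inside the function so that defining it needs neither
    package installed; calling it needs a Coiled account and login token.
    """
    import coiled
    from dask.distributed import Client

    # n_workers=4 is an assumed size; the video does not say how many workers were used.
    cluster = coiled.Cluster(n_workers=n_workers)

    # Connecting a client makes this cluster the default place where Dask work runs.
    client = Client(cluster)

    # The dashboard URL points at the scheduler running in the cloud, not at localhost.
    print(client.dashboard_link)
    return cluster, client
```

Calling `cluster, client = connect_to_coiled()` would start real cloud machines and use Coiled credits, which is why the sketch wraps the work in a function instead of running it on import.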
0:54
So what this has actually done is create a cluster on AWS for me
1:00
using all the Coiled technology, according to a predefined environment. It throws a little warning here,
1:07
which is just a signal. It's not an error, mind you. It's a warning that there are version mismatches between what's on my cloud
1:15
environment and what's happening locally. So that's generally cool for the time being.
1:19
Remember, you can click on this thing here to get to our Dask dashboards.
1:25
We can also look at our opinionated Coiled dashboard here, which shows you what we consider to be the most important diagnostic tools for Dask.
1:34
We've got, you know, the task stream, processing, progress, things along these lines, and we'll see a bit of action there in a minute.
1:42
So the dashboard link, of course, points to AWS. What we're gonna do is fit a "KMeans" model to
1:51
some data that we're going to generate using scikit-learn, and we're going to use
1:55
the dask_ml "KMeans". So here we're not actually trying to predict a label,
2:01
but we've got all these data points and we're trying to find clusters of them. Okay, so for those of you who have done a bit of machine learning,
2:09
this is something called unsupervised learning. That doesn't really matter. We're just gonna show you how the API works and
2:15
how easy it is to scale your workflows to the cloud here. So we're going to generate some fake data, or synthetic data, once again.
2:22
And it's a small data set. And we're doing this for pedagogical purposes. But we'll see how quickly it's processed on the cloud.
2:29
We're going to import KMeans, and now we're going to fit KMeans to the data that we've generated.
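Roughly, generating the data and fitting the model could look like the sketch below. It is an illustration under stated assumptions, not the cell from the video: the helper name `fit_kmeans`, the sample count, the number of centers, the chunk size, and the random seed are all chosen here for the example. It uses `make_blobs` from scikit-learn for the synthetic data and `KMeans` from `dask_ml.cluster` for the fit.

```python
def fit_kmeans(n_samples=100, n_centers=5, chunk_rows=25):
    """Generate small synthetic blob data and fit dask_ml's KMeans to it.

    All default values are assumptions made for this sketch.
    """
    import dask.array as da
    from dask_ml.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Synthetic points grouped around n_centers centers; the true labels are
    # discarded because clustering is unsupervised.
    X_np, _ = make_blobs(n_samples=n_samples, centers=n_centers, random_state=0)

    # Wrap the NumPy array in a Dask array, chunked along rows only, so the
    # work can be spread across the workers of whichever client is active.
    X = da.from_array(X_np, chunks=(chunk_rows, X_np.shape[1]))

    # n_clusters is how many clusters the algorithm is asked to find.
    km = KMeans(n_clusters=n_centers, random_state=0)
    km.fit(X)
    return km
```

With a client connected to a cluster, calling `km = fit_kmeans()` sends the fitting work to that cluster, and the task stream on the dashboard shows it running. Without a client, Dask falls back to its local scheduler.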
2:38
What we're going to see on the dashboard is a bunch of work starting to be done. So we can see a bunch of arrays, getitems, data transfer,
2:48
these types of things. That's good. We can see all that work happening across all our workers there, and it looks like it may have stopped.
2:58
Let's go and see in JupyterLab. Yep, that took 20 seconds, and that all happened on AWS through Coiled Cloud.
3:06
What we want to do is see what labels it actually predicted. So I think it would be finding five or so clusters. That's the data we generated, anyway.
3:15
So we'll see whether that's the case. This will be a 100 by one array of the predicted labels or clusters for each of
3:24
our data points. And let's just compute the first 10 and see what it looks like. Perfect. Looks like it found more than five clusters.
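Peeking at the first few labels can be sketched as follows, assuming `km` is an already fitted `dask_ml.cluster.KMeans` model. The helper names `first_labels` and `distinct_labels` are made up for this example. One thing worth knowing: KMeans looks for as many clusters as its `n_clusters` parameter asks for, so seeing more than five labels would be consistent with an `n_clusters` larger than the number of generated centers. The video does not show which value was used.

```python
def first_labels(km, n=10):
    """Compute only the first n predicted labels of a fitted dask_ml KMeans model."""
    # km.labels_ is a lazy Dask array; slicing before .compute() means only
    # n values are brought back to the local session.
    return km.labels_[:n].compute()


def distinct_labels(labels):
    """Return the sorted distinct cluster ids among the given labels."""
    return sorted({int(label) for label in labels})
```

For example, `distinct_labels(first_labels(km))` lists which cluster ids appear among the first 10 points. That is only a lower bound on how many clusters the model found overall.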
3:32
That's fine. Maybe it's an algorithm we need to tweak in future. That wasn't the point of this video.
3:36
Rather, it was to show you how easy it can be, with products such as Coiled Cloud, to scale to the cloud. But I definitely encourage you all,
3:43
not only to check out Coiled Cloud, but also to see whether you want to figure out how to provision your own AWS clusters
3:49
and that type of stuff using Dask Kubernetes or whatever it may be, if that works for you.
3:54
As always, we practice healthy distributed data flow hygiene and close our client, and we'll
3:59
be back soon to tell you about a few references for further work.
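One way to make that cleanup habit hard to forget is a small context manager, sketched here. The name `dask_session` is hypothetical. It relies only on the `close()` methods that Dask clients and clusters provide, so it works with any objects that have one.

```python
from contextlib import contextmanager


@contextmanager
def dask_session(client, cluster=None):
    """Yield the client, then close it (and the cluster, if given) on exit.

    The finally block runs even if the work inside the with-block raises,
    so cloud resources are not left running by accident.
    """
    try:
        yield client
    finally:
        client.close()
        if cluster is not None:
            cluster.close()
```

Used as `with dask_session(client, cluster): ...`, the client and cluster are shut down as soon as the block finishes.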