Fundamentals of Dask Transcripts
Chapter: Dask-ML
Lecture: Machine learning in the cloud

0:00 All right, so now it's time to jump in and look at a bit of machine learning in the cloud. Now this section is optional.
0:07 We've given you some resources to think about how to do your data science workflows in
0:12 the cloud. This section is optional because we'll be using the product that we work
0:17 on, Coiled. You can also get set up on AWS yourself with Dask Kubernetes, or anything along those lines from the technologies we just introduced,
0:26 but we'll be doing it on Coiled, so feel free to get started: sign up and code along as well.
0:32 So what I've already done is I've signed up to Coiled Cloud, got my login token there, and I'm going to use some of my
0:40 free credits. Now, for the purposes of time, I've already imported coiled and created a cluster there.
0:46 What I'm gonna do now is instantiate a client and then look at the dashboard and then we're going to do some machine learning.
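A minimal sketch of the step just described: attaching a client to a cluster that already exists and finding its dashboard. A local in-process cluster stands in for the Coiled one so the snippet runs without an account. The commented-out `coiled.Cluster` call and its worker count are assumptions for illustration, not the exact call used in the lecture.

```python
from dask.distributed import Client, LocalCluster

# Assumption: with a Coiled account, the cluster would be created roughly
# like this instead (the worker count is illustrative):
#   import coiled
#   cluster = coiled.Cluster(n_workers=4)

# Local stand-in so the sketch runs anywhere; dashboard_address=":0"
# asks for any free port rather than the default 8787.
cluster = LocalCluster(
    n_workers=2,
    threads_per_worker=1,
    processes=False,
    dashboard_address=":0",
)

# Instantiating a client makes this cluster the default place Dask runs work.
client = Client(cluster)

# This is the link you click to open the Dask diagnostic dashboards.
dashboard_link = client.dashboard_link
print(dashboard_link)

# Shut everything down once finished.
client.close()
cluster.close()
```

With a cloud-hosted cluster the same `client.dashboard_link` attribute would point at the remote scheduler rather than at your own machine.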
0:54 So what this has actually done is it's already created a cluster on AWS for me
1:00 using all the Coiled technology, according to a predefined environment. It throws a little warning here,
1:07 which is all that it's signaling. It's not an error, mind you; it's a warning that there are version mismatches between what's on my cloud
1:15 environment and what's happening locally. So that's generally cool for the time being.
1:19 Remember, you can click on this thing here to get to our Dask dashboards.
1:25 We can also look at our opinionated Coiled dashboard here, which shows you what we consider to be the most important diagnostic tools for Dask.
1:34 We've got, you know, the task stream, processes, progress, along these lines, and we'll see a bit of action there in a minute.
1:42 So the dashboard link of course points to AWS. So what we're gonna do is we're gonna fit a "KMeans" model to
1:51 some data that we're going to generate using scikit-learn, and we're going to use
1:55 the "dask_ml" "KMeans". So here we're actually not trying to predict a label,
2:01 but we've got all these data points and we're trying to find clusters of them. Okay, so for those of you who have done a bit of machine learning,
2:09 this is something called unsupervised learning. That doesn't really matter. We're just gonna show you how the API works and
2:15 how easy it is to scale your workflows to the cloud here. So we're going to generate some fake data, or synthetic data, once again.
2:22 And it's a small data set. And we're doing this for pedagogical purposes. But we'll see how quickly it's processed on the cloud.
2:29 We're going to import KMeans and now we're going to fit KMeans to the data that we've generated.
2:38 What we're going to see on the dashboard is a bunch of work starting to be done. So we can see a bunch of arrays, getitems, data transfer,
2:48 these types of things. That's good. We can see all that work happening across all our workers there and it looks like it may have stopped.
2:58 Let's go and see in JupyterLab. Yep, that took 20 seconds, and that all happened on AWS through Coiled Cloud.
3:06 What we want to do is see what labels it actually predicted. So I think we would be finding five or so clusters. That's the data we generated, anyway.
3:15 So we'll see whether that would be the case. This will be 100 by one array of the predicted labels or clusters for each of
3:24 our data points. And let's just compute the first 10 and see what it looks like. Perfect. Looks like it found more than five clusters.
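To illustrate "compute the first 10": a Dask array of labels is lazy, so slicing it only describes the work, and `compute()` is what returns actual values. The sketch below uses a hypothetical stand-in array rather than a fitted model's labels, so the numbers are made up for illustration.

```python
import dask.array as da
import numpy as np

# Hypothetical stand-in for a fitted model's labels:
# 100 integer cluster ids, split into 4 chunks of 25.
labels = da.from_array(np.arange(100) % 5, chunks=25)

# Slicing is lazy; compute() materializes just these 10 values as NumPy.
first_ten = labels[:10].compute()
print(first_ten)  # [0 1 2 3 4 0 1 2 3 4]
```

Slicing before computing keeps the result small, which matters more when the labels live on a remote cluster and have to travel back to your machine.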
3:32 That's fine. Maybe it's an algorithm we need to tweak in future. That wasn't the point of this video.
3:36 Rather, it was to show you, with products such as Coiled Cloud, how easy it can be to scale to the cloud. But I definitely encourage you all
3:43 not only to check out Coiled Cloud, but to check out whether you wanna figure out how to provision your own AWS clusters
3:49 and that type of stuff using Dask Kubernetes or whatever it may be, if that works for you.
3:54 As always, we practice healthy distributed data flow hygiene and close our client, and we'll
3:59 be back soon to tell you about a few references for further work.

