Fundamentals of Dask Transcripts
Chapter: Dask-ML
Lecture: Dask-ML for memory bound problems

0:00 All right. So we have seen how to leverage distributed compute in the compute-intensive, CPU-bound case:
0:11 for example, we've looked at hyperparameter tuning. But as we have stated, distributed compute can also be leveraged for memory-bound problems.
0:19 As we've seen, these types of problems arise when your dataset is too large to store in memory, so this is where Dask can help.
0:27 In the previous course, you saw how Dask DataFrames can be used to perform pandas-like operations on larger-than-memory data.
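As a quick reminder of that pattern, here is a minimal sketch; the file path and column names are purely hypothetical:

    import dask.dataframe as dd

    # Lazily read many CSV partitions that together exceed memory
    # (hypothetical path for illustration).
    df = dd.read_csv("data/records-*.csv")

    # pandas-like operations, evaluated lazily until .compute()
    # ("key" and "value" are hypothetical column names).
    result = df.groupby("key")["value"].mean().compute()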
0:35 In the same fashion, we can use Dask-ML to perform scikit-learn-ish operations on our large datasets. So that's what we're going to do now.
0:43 We're not going to import a significantly larger data set, but we're going to show how the API works.
0:48 We're going to walk through the code for pedagogical purposes. So first we "import dask_ml.model_selection as dcv".
0:57 Dask-ML's model selection module has a grid search cross-validation method that mimics scikit-learn's but generalizes to out-of-memory situations.
1:10 So we do that. Then we set up the parameter grid as we've done beforehand, and once again we set up the grid search and call grid_search.fit,
1:20 supplying it with the arguments X and y, as before.
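Here is a minimal sketch of that pattern. The estimator (an SGDClassifier), the parameter grid, and the small generated dataset are illustrative assumptions; the transcript doesn't show the lecture's exact choices:

    import dask_ml.model_selection as dcv
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier

    # Small in-memory stand-in for the lecture's data; the lecture
    # likewise uses a modest dataset just to demonstrate the API.
    X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)

    # Assumed estimator and parameter grid, for illustration only.
    param_grid = {"alpha": [1e-4, 1e-3, 1e-2]}

    # Drop-in replacement for scikit-learn's GridSearchCV.
    grid_search = dcv.GridSearchCV(SGDClassifier(), param_grid)
    grid_search.fit(X, y)
    print(grid_search.best_params_)

Because dcv.GridSearchCV implements the scikit-learn estimator API, this is the same code you would write with scikit-learn's own GridSearchCV, with only the import changed.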
1:28 Let's also have a brief look at another algorithm. In the previous checkpoint, you met logistic regression, and now we're going to show logistic regression using Dask-ML.
1:33 If you know a bit of scikit-learn, this really showcases how Dask-ML mimics it in a very ergonomic and user-friendly way.
1:43 Because Dask-ML implements the scikit-learn API, the code is similar: from dask_ml.linear_model,
1:48 we import LogisticRegression. Then we fit the logistic regression to X and y, and then we check out the score. On top of that,
1:57 we can also use it to predict; we're doing it on X here, of course, but we can generalize that to new data as well.
2:05 And we'll check out the first five elements there, where we see it predicts False, False, False, False, and True.
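A sketch along those lines follows. The generated dataset stands in for the lecture's X and y, which the transcript doesn't show; the rest follows the dask_ml.linear_model API:

    from dask_ml.datasets import make_classification
    from dask_ml.linear_model import LogisticRegression

    # Chunked Dask arrays standing in for the lecture's X and y.
    X, y = make_classification(n_samples=10_000, chunks=1_000, random_state=0)

    # Same pattern as scikit-learn: construct, fit, score.
    lr = LogisticRegression()
    lr.fit(X, y)
    print(lr.score(X, y))

    # predict returns a lazy Dask array; compute a slice to inspect it.
    preds = lr.predict(X)
    print(preds[:5].compute())

The predictions come back as booleans, which matches the False, False, False, False, True output described in the lecture.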
2:11 All right, that's it. And we'll be back in a minute for a checkpoint.

