Fundamentals of Dask Transcripts
Chapter: Dask Array
Lecture: Introduction to Dask array
0:00
Hi everyone, Hugo here. I'm very excited to be telling you about Dask
0:05
Array, which is a wonderful generalization of the NumPy array that lets you do array computations at scale on very large datasets,
0:14
among other things. The real purpose of Dask Array is to provide a high-level
0:18
user interface to things that are kind of like NumPy arrays but may not fit
0:24
in memory. So, essentially, what we're doing is scaling NumPy code. But one of the really cool parts of Dask Array is that the code you
0:31
write, that is, the API, mimics the NumPy code you'd write. So on the left here, you see 'X = np.array', etc. And on the right,
0:42
the Dask code is 'X = da.from_array'. Similarly with the mean, the Dask code mirrors the NumPy code. It isn't always exactly the same.
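A minimal sketch of that side-by-side comparison, assuming `numpy` and `dask[array]` are installed. The variable names and the chunk size of 100,000 elements are illustrative choices, not taken from the slides. Note that the Dask version hands back a lazy result until `.compute()` is called.

```python
import numpy as np
import dask.array as da

# NumPy: the whole array lives in memory and .mean() runs immediately.
x_np = np.arange(1_000_000, dtype=np.float64)
numpy_mean = x_np.mean()

# Dask: wrap the same data as a Dask Array split into 10 chunks of
# 100_000 elements each (chunk size chosen only for illustration).
x_da = da.from_array(x_np, chunks=100_000)

# Same method name as NumPy, but this only builds a task graph.
lazy_mean = x_da.mean()

# .compute() executes the graph and returns a concrete value.
dask_mean = lazy_mean.compute()

print(numpy_mean, dask_mean)
```

The only visible difference in the API is the extra `.compute()` step; everything before it reads like ordinary NumPy.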
0:50
You'll see we have a '.compute()' call for Dask. As we'll see, that's because Dask does something called lazy evaluation,
0:56
but that's by the by. The point really is that the code you write is relatively
1:01
similar. Okay. The other thing to note here is that Dask Array is actually doing its computation on NumPy arrays in the back end.
1:08
So, your mental model of what's happening is actually what's happening under the hood, which is pretty cool if you ask me.
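One way to check that claim yourself, as a sketch assuming a reasonably recent Dask release: the `.blocks` accessor selects individual chunks, and computing one returns a plain NumPy array. The 4,000 by 4,000 shape and 1,000 by 1,000 chunks are arbitrary choices for illustration.

```python
import numpy as np
import dask.array as da

# A 4000 x 4000 array of ones, split into a 4 x 4 grid of 1000 x 1000 chunks.
x = da.ones((4_000, 4_000), chunks=(1_000, 1_000))
print(x.numblocks)  # (4, 4)

# Select the top-left chunk and materialise it.
first_block = x.blocks[0, 0].compute()
print(type(first_block))  # <class 'numpy.ndarray'>

# map_blocks hands each chunk (a NumPy array) to an ordinary Python function.
doubled = x.map_blocks(lambda block: block * 2, dtype=x.dtype)
print(doubled.sum().compute())
```

Because every chunk is just a NumPy array, any function that works on NumPy arrays can, in principle, be applied chunk by chunk this way.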
1:15
So, I just want to say a few more words about Dask Array. You may already be aware that Dask is used widely.
1:22
It's used in retail at Walmart and Grubhub. It's used in the life sciences at Harvard Medical School, among many other places. It's used in finance.
1:30
It's used at geophysical facilities. It's also used by a lot of other software projects, such as RAPIDS, Pangeo and PyTorch.
1:37
Now, I just want to make clear that a lot of the time, all of these use cases actually start with Dask Array.
1:42
So this is a really cool place to start diving a bit deeper into Dask.
1:47
So, here's what we're going to cover. First, we'll demonstrate NumPy,
1:53
having a look at the basics of NumPy so you can familiarize or re-familiarize
1:57
yourself with it. Then we're going to talk about blocked algorithms. In short,
2:02
a blocked algorithm executes on a large dataset by breaking it up into many smaller chunks, operating on each chunk, and combining the results. Then we're going to introduce Dask Array,
2:10
which you've already had a little hint of in the previous slides. After that, we're going to have a frank conversation about some of the
2:18
limitations of Dask Array. There aren't many, but they're worth talking about. Then we're going to provide some references.
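To make the blocked-algorithm idea from the outline concrete, here is a simplified sketch in plain NumPy (no Dask involved): a mean computed one chunk at a time by reducing each chunk to a partial sum and count, then combining those partial results. The function name `blocked_mean` and the block size are hypothetical; Dask's own reductions follow a similar chunk-then-combine pattern but are far more general.

```python
import numpy as np


def blocked_mean(data: np.ndarray, block_size: int) -> float:
    """Mean of a 1-D array, computed one block at a time (hypothetical helper).

    Each block is reduced to a partial (sum, count); the partials are then
    combined. Only one block is touched per step, which is what lets this
    pattern scale to data that would not fit in memory if loaded from disk.
    """
    total = 0.0
    count = 0
    for start in range(0, data.shape[0], block_size):
        block = data[start:start + block_size]  # one small chunk
        total += block.sum()                     # partial sum for this chunk
        count += block.size                      # partial count for this chunk
    return total / count                         # combine step


data = np.random.default_rng(0).random(1_000_003)
print(blocked_mean(data, 100_000), data.mean())
```

Here the data is already in memory for simplicity; in a real blocked computation each chunk would typically be loaded from disk or another source only when it is needed.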
2:24
See you in the next video.