Fundamentals of Dask Transcripts
Chapter: Dask Array
Lecture: Introduction to Dask array

0:00 Hi everyone, Hugo here. I am very excited to be telling you about Dask
0:05 Array, which is a wonderful generalization of the 'NumPy Array' that allows you to do 'Array Computation' at scale with super large datasets,
0:14 among other things. The real purpose of the Dask Array is to have a high
0:18 level user interface to things that are kind of like 'NumPy Arrays' but may not fit
0:24 in memory. So, essentially what we're doing is scaling 'NumPy' code. But one of the really cool parts of the 'Dask Array' is that the code you
0:31 write, the API, mimics the NumPy code that you write as well. So on the left here, you see you have 'X = np.array', etc. And on the right,
0:42 the Dask code is 'X = da.from_array', right. And similarly with mean, it mimics the code. It isn't always exactly the same.
0:50 You'll see we have a '.compute' for Dask, and as we'll see, that's because Dask does something called lazy evaluation,
0:56 but that's by the by. The point really is that the code you write is relatively
1:01 similar. Okay. And the other thing to note here is that it's actually doing computation on NumPy Arrays themselves in the back end.
1:08 So, your mental model of what's happening is actually what's happening under the hood, which is pretty cool if you ask me.
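The slide comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code from the slides: the array contents, the variable names, and the chunk size of 100,000 are all assumptions chosen for the example.

```python
import numpy as np
import dask.array as da

# NumPy side: the whole array lives in memory and mean() runs immediately.
x_np = np.arange(1_000_000, dtype="float64")  # illustrative data, not from the lecture
np_mean = x_np.mean()

# Dask side: wrap the same data as a Dask array split into chunks.
# The chunk size here is an arbitrary choice for the sketch.
x_da = da.from_array(x_np, chunks=100_000)

# Calling mean() only builds a task graph; nothing is computed yet (lazy evaluation).
lazy_mean = x_da.mean()

# compute() runs the graph, working on NumPy arrays chunk by chunk.
da_mean = lazy_mean.compute()

print(np_mean, da_mean)
```

The only visible differences from the NumPy version are the `da.from_array` call and the final `.compute()`.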
1:15 So, I just want to say another few words about NumPy Arrays. You may already be aware that Dask is used everywhere.
1:22 It's used in retail at Walmart and Grubhub. It's used in the Life Sciences at Harvard Medical School, among many other places. It's used in finance.
1:30 It's used at geophysical facilities. It's used for a lot of other software such as RAPIDS, Pangeo and PyTorch.
1:37 Now, I just want to make clear that a lot of the time, all of these things actually start with Dask Array.
1:42 So this is a really cool place to get started diving a bit deeper into Dask.
1:47 So what are we going to cover? Essentially, we're going to start by demonstrating NumPy,
1:53 so having a look at the basics of NumPy to familiarize yourself or re-familiarize
1:57 yourself with NumPy. Then we're going to talk about 'Blocked algorithms' and in short
2:02 a blocked algorithm executes on a large dataset by breaking it up into many smaller chunks. Then we're going to introduce the 'Dask Array',
2:10 which you've already had a little hint of in the previous slides. After that, we're just going to have a frank conversation about some of the
2:18 limitations of the 'Dask Array'. There aren't many, but it's worth talking about. And then we're going to provide some references.
2:24 See you in the next video.
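
As a rough illustration of the blocked-algorithm idea mentioned in the outline (breaking a large dataset into many smaller chunks), here is a minimal sketch in plain NumPy. The function name `blocked_sum` and the chunk size are hypothetical, chosen only for this example; it is not Dask's own implementation.

```python
import numpy as np

def blocked_sum(x, chunk_size):
    """Sum a 1-D array by summing fixed-size chunks and combining the partial results."""
    partials = []
    for start in range(0, len(x), chunk_size):
        # Each slice is a small NumPy array that is processed on its own.
        partials.append(x[start:start + chunk_size].sum())
    # Combine the per-chunk results into the final answer.
    return sum(partials)

x = np.arange(1_000_000, dtype="int64")  # illustrative data
total = blocked_sum(x, chunk_size=100_000)
print(total == x.sum())
```

Only one chunk needs to be worked on at a time, which is why the same pattern can scale to data that does not fit in memory.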

