Fundamentals of Dask Transcripts
Chapter: Dask Array
Lecture: Demonstrating numpy

0:00 All right, so it's time to jump into the NumPy library and NumPy arrays. So, NumPy, as we've mentioned,
0:07 is a Python library that provides multidimensional arrays, but it also provides routines for fast operations on arrays. On top of this,
0:16 it has a collection of high-level mathematical functions, among many other things. What we're doing now is an introduction to a very small
0:25 subset of NumPy Arrays, but it will provide a lot of motivation for what we do with Dask in a minute.
0:31 Let's jump into a Jupyter Notebook in JupyterLab to see NumPy in action. So here we are in JupyterLab, in a Jupyter Notebook, about to jump into some
0:40 NumPy stuff. Okay, so in this notebook we're gonna demonstrate NumPy, look at some blocked algorithms, then jump into Dask Array,
0:47 which I'm pretty, pretty excited to show you. So the first thing that we want to look at is NumPy,
0:53 we're just gonna show some basic functionality there. So NumPy has a ones() function to create unit arrays, that is, arrays of
0:58 all ones. So we're gonna use it, after doing an import, to create a 10 by 10 matrix (or array; we use those terms interchangeably) of ones, and we'll
1:07 print it. Okay, so we've got our array of ones. Now we can use the sum() method on this array, which will add up all
1:15 the entries, and we use the %%time magic command to time it. An array of 100 ones sums to 100. That's good.
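A minimal sketch of this first step as a plain Python script. The `%%time` cell magic only works in IPython/Jupyter, so this version assumes `time.perf_counter` as a stand-in for timing; absolute timings will differ from machine to machine.

```python
import time

import numpy as np

# A 10 x 10 array filled with ones (float64 by default)
x = np.ones((10, 10))
print(x)

# Time the reduction; time.perf_counter stands in for the %%time magic
start = time.perf_counter()
total = x.sum()  # adds up all 100 entries
elapsed = time.perf_counter() - start

print(total)  # 100.0
print(f"wall time: {elapsed * 1e6:.1f} microseconds")
```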
1:22 We can see the wall time was 135 microseconds there. What we're gonna do now is similar things with larger arrays,
1:29 and see that the time to do these things gets larger and larger. So we're gonna use the random module, which I love a lot, to create
1:38 an array of random data. This larger one is gonna be 1000 by 1000.
1:46 Now we're going to perform the sum(), and we'll see that instead of
1:50 taking on the order of hundreds of microseconds, it took on the order of milliseconds there. Okay, so the time to do it is growing.
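The same timing pattern with the larger array can be sketched as follows. This assumes the random data comes from `np.random.random` (uniform values in [0, 1)); the lecture only says it uses NumPy's random module. `time_sum` is a hypothetical helper introduced here for illustration.

```python
import time

import numpy as np


def time_sum(arr):
    """Hypothetical helper: return (arr.sum(), elapsed seconds)."""
    start = time.perf_counter()
    result = arr.sum()
    return result, time.perf_counter() - start


small = np.ones((10, 10))
large = np.random.random((1000, 1000))  # 1,000,000 uniform values in [0, 1)

small_total, small_time = time_sum(small)
large_total, large_time = time_sum(large)

print(f"10 x 10:     sum={small_total:.1f}, {small_time * 1e6:.1f} us")
print(f"1000 x 1000: sum={large_total:.1f}, {large_time * 1e3:.3f} ms")
```

Because each uniform value averages 0.5, the larger sum should land near 500,000; the interesting part is how the elapsed time grows with the number of elements.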
1:57 NumPy has a bunch of helpful operations like matrix transpose, matrix addition and mean. We're going to use these to create a new
2:05 array, y, by adding x to its transpose, and we'll see that took 24 milliseconds. Now we're also gonna take the mean of y,
2:14 which took on the order of milliseconds. Now, what are we gonna do next? I'm going to execute this code right away because it's going to take a little bit of
2:20 time. So I'm just gonna execute that as well. We're creating an even larger matrix. This is going to be 20,000 by 20,000.
2:29 And of course we're using the random module's normal function there.
2:33 It's going to give us normally distributed random variables, and we're also computing its mean,
2:38 so this is going to take some time, and it should be done in, give or take, 10 or so seconds.
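A scaled-down sketch of the transpose, addition and mean steps, plus the normally distributed array. It assumes a 1000 x 1000 uniform array for `x` and `np.random.normal` with illustrative `loc=10, scale=0.1` (the lecture doesn't state these parameters). The 20,000 x 20,000 array is replaced by a 2,000 x 2,000 one so the sketch runs quickly; the full-size memory footprint is computed arithmetically rather than allocated.

```python
import numpy as np

x = np.random.random((1000, 1000))

# Matrix addition with the transpose: y[i, j] = x[i, j] + x[j, i]
y = x + x.T
y_mean = y.mean()  # mean over every element

# Normally distributed data; loc/scale are illustrative assumptions
n = 2_000  # the lecture uses 20_000
z = np.random.normal(loc=10, scale=0.1, size=(n, n))
z_mean = z.mean()

# Memory the full 20,000 x 20,000 float64 array would need, without allocating it
full_bytes = 20_000 * 20_000 * np.dtype(np.float64).itemsize

print(f"y is symmetric: {np.array_equal(y, y.T)}")
print(f"mean of y: {y_mean:.3f}")
print(f"mean of z: {z_mean:.3f}")
print(f"20,000 x 20,000 float64 array: {full_bytes / 1e9:.1f} GB")
```

At full size the array alone needs about 3.2 GB, which helps explain why that cell takes so much longer than the 1000 x 1000 examples.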
2:47 So here we have it computed, and we see that it took around 40 seconds, which is significantly longer. Now, if this took any longer,
2:56 it definitely wouldn't make for a comfortable workflow. And so that's an example of when we may want to start moving
3:03 to something like Dask. Okay. But before we do that, I'm going to try to do something that people will
3:08 do occasionally, which is import or create an array of a really large size. Here I'm trying to create one with a billion values along each axis.
3:17 Okay. And look at that. It throws a MemoryError, which means that NumPy isn't even able to handle data
3:23 at this size. Okay. What we're gonna do in the next video is work around this limitation using blocked algorithms.
3:30 But after that, we're gonna see how we can tackle these types of challenges using Dask as well. See you in a minute.
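A back-of-the-envelope calculation shows why the billion-by-billion array fails. This sketch assumes the default float64 dtype (8 bytes per element) and only multiplies out the shape; it never attempts the allocation.

```python
import numpy as np

shape = (1_000_000_000, 1_000_000_000)  # a billion values along each axis
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per element

n_elements = shape[0] * shape[1]
n_bytes = n_elements * itemsize

print(f"elements: {n_elements:.1e}")
print(f"bytes:    {n_bytes:.1e} (about {n_bytes / 2**60:.1f} EiB)")
```

That works out to roughly 8 × 10^18 bytes (about 6.9 EiB), far beyond the RAM of any single machine, so the allocation fails before any computation starts.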

