Fundamentals of Dask Transcripts
Chapter: Dask Bag
Lecture: Reading from Python collections

0:00 It's time to jump into Dask Bags, and first we're going to learn how to read from a Python list,
0:07 and from other types of collections and sequences, into Dask Bags. Okay, before reading data into a Dask Bag, what we want to do, of course, as always,
0:16 is start by creating a cluster. So I'm going to do that here, and you can code along in your Jupyter Notebook,
0:21 spinning up a cluster with 4 workers, and then we're going to open a couple of dashboards once this cluster is created. Cool, look at that: 4 workers,
0:34 8 cores, 8 GB of memory. I'm going to open my 'Cluster Map' now, drag that over to the side, and I also want my 'Task Stream'.
0:48 Beautiful. I'm going to close this for our viewing pleasure, drag this down here, and we're ready to go.
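The cluster-creation code isn't shown in the transcript, so here is a minimal sketch, assuming `dask.distributed`'s `LocalCluster` and `Client` with 4 workers. Running the workers as threads (`processes=False`) is an assumption made to keep the sketch lightweight; the lecture's cluster may well use separate worker processes. `client.dashboard_link` gives the address of the dashboard where plots such as the Cluster Map and Task Stream are served.

```python
from dask.distributed import Client, LocalCluster

# Start a local cluster with 4 workers, as in the lecture.
# Assumption: processes=False runs the workers as threads in this process
# to keep the sketch lightweight; the lecture doesn't show this setting.
cluster = LocalCluster(n_workers=4, processes=False)

# Connect a client; it becomes the default scheduler for Dask collections.
client = Client(cluster)

# Block until all 4 workers have registered with the scheduler.
client.wait_for_workers(4)

# Address of the dashboard (Cluster Map, Task Stream, and other plots).
print(client.dashboard_link)
```

Call `client.close()` and `cluster.close()` when you're finished with the cluster.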
1:00 What we're going to do is create a Dask Bag from a Python list. But just to be clear,
1:04 you can create bags similarly from sets, dictionaries, and other general Python objects such
1:10 as collections and sequences. The data are partitioned into blocks.
1:15 In the following example, which is small for learning purposes, there are two partitions with five elements each.
1:23 Now you may say, well, I could do that in one partition. Of course you could, but we're doing this for teaching purposes, as we've said.
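As a hedged illustration of building bags from sets, dictionaries, and other sequences, here is a minimal sketch using `db.from_sequence`. The data (`colors`, `inventory`) are hypothetical. Two points are worth noting: a set has no defined order, so the result is sorted before display, and iterating a dict yields only its keys, so `.items()` is passed to keep the values. The threaded scheduler is an assumption so the sketch runs standalone; with a distributed `Client` active, as in the lecture, you can drop that line and the work runs on the cluster.

```python
import dask
import dask.bag as db

# Assumption: use the local threaded scheduler so this runs without a cluster.
dask.config.set(scheduler="threads")

# Hypothetical example data.
colors = {"red", "green", "blue"}
inventory = {"apples": 3, "pears": 5, "plums": 2}

# A set works directly; its iteration order is arbitrary, so sort the result.
set_result = sorted(db.from_sequence(colors, npartitions=2).compute())

# Iterating a dict yields only its keys, so pass .items() to keep the values.
dict_result = db.from_sequence(inventory.items(), npartitions=2).compute()

# Other sequences, such as a range, work the same way.
range_result = db.from_sequence(range(10), npartitions=2).compute()

print(set_result)    # ['blue', 'green', 'red']
print(dict_result)   # [('apples', 3), ('pears', 5), ('plums', 2)]
print(range_result)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```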
1:29 So first we import dask.bag as 'db', which is a convention, just like numpy as 'np' and pandas as 'pd'. Never import
1:39 pandas as 'mp'. So now we're going to execute this code using 'db.from_sequence( )',
1:44 passing it the list and the keyword argument 'npartitions', setting that equal
1:49 to two. And look, it's returned a Dask Bag object, as you may have
1:55 expected. No computation has occurred, because Dask evaluates things lazily. So we need to call 'compute( )' to get the result.
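The bag-creation step described above can be sketched as follows. The lecture only shows the first three elements (Alaska, Minnesota, Georgia), so the other seven states in the hypothetical `states` list are placeholders chosen to make ten elements. Note that nothing is computed here: `b` is a lazy `Bag` that only records how to produce its two partitions.

```python
import dask.bag as db

# Hypothetical data: the first three states appear in the lecture,
# the remaining seven are placeholders to make ten elements.
states = ["Alaska", "Minnesota", "Georgia", "Texas", "Ohio",
          "Oregon", "Maine", "Utah", "Iowa", "Nevada"]

# Split the list into two partitions (five elements each for ten items).
b = db.from_sequence(states, npartitions=2)

# No computation has happened yet: b is a lazy Bag, not a list.
print(type(b).__name__)  # Bag
print(b.npartitions)     # 2
```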
2:04 So let's do that, and let's be prepared to see some things happen in our 'Task Stream' and our 'Cluster Map'.
2:12 Fantastic. So we've seen a couple of tasks occur, and we saw things light up a bit. Okay. And we can see that it's returned the contents of the bag as a list, as expected.
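Here is a minimal sketch of the compute step, again using the hypothetical `states` list (only its first three entries come from the lecture). `compute()` returns the bag's elements as a plain Python list. The per-partition count uses `map_partitions`, wrapping each count in a list so the per-partition results can be concatenated; it confirms the two-partitions-of-five layout. The threaded scheduler is an assumption so this runs standalone; with a distributed `Client` active, you can drop that line.

```python
import dask
import dask.bag as db

# Assumption: use the local threaded scheduler so this runs without a cluster.
dask.config.set(scheduler="threads")

# Hypothetical data: only the first three states appear in the lecture.
states = ["Alaska", "Minnesota", "Georgia", "Texas", "Ohio",
          "Oregon", "Maine", "Utah", "Iowa", "Nevada"]
b = db.from_sequence(states, npartitions=2)

# compute() executes the task graph and returns the elements as a list.
result = b.compute()

# Count the elements in each partition; each count is wrapped in a list
# so the per-partition results concatenate back into one list.
sizes = b.map_partitions(lambda part: [len(list(part))]).compute()

print(result == states)  # True
print(sizes)             # [5, 5]
```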
2:22 The other thing we can do is use the 'take( )' method to display the first few values directly. So I'm going to apply the 'take( )' method to the Dask Bag 'b'
2:32 and give it the argument three. There we go: the first three elements of the Dask Bag, Alaska, Minnesota and Georgia. All right.
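A minimal sketch of `take()`, using the same hypothetical `states` list. By default `take(k)` only looks at the first partition and returns a tuple rather than a list, which makes it a cheap way to peek at a bag without computing all of it. The threaded scheduler is, again, an assumption so the sketch runs standalone.

```python
import dask
import dask.bag as db

# Assumption: use the local threaded scheduler so this runs without a cluster.
dask.config.set(scheduler="threads")

# Hypothetical data: only the first three states appear in the lecture.
states = ["Alaska", "Minnesota", "Georgia", "Texas", "Ohio",
          "Oregon", "Maine", "Utah", "Iowa", "Nevada"]
b = db.from_sequence(states, npartitions=2)

# take(3) reads from the first partition only (npartitions=1 by default)
# and returns the first three elements as a tuple.
first_three = b.take(3)
print(first_three)  # ('Alaska', 'Minnesota', 'Georgia')
```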
2:43 We'll be back in a minute to start reading some more messy unstructured data into Dask Bags.

