Fundamentals of Dask Transcripts
Chapter: Dask Bag
Lecture: Reading from JSON

0:00 Welcome back. Now it's time to see how Dask Bags can be used to read from JSON files. Now, if you've been working in data science for a hot minute,
0:09 let's say, you've probably had to work with all types of JSON files before, perhaps JSON files that are pretty large as well.
0:15 So hopefully Dask Bag can help you in this type of work as well. To begin with, we're simply going to create a Dask Bag from some
0:24 JSON files. To do that, we'll first create some random data and store it as JSON files.
0:31 So we perform our imports. Then we use a utility function to create some data, and then we send it to some JSON files.
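The transcript does not name the utility function used on screen, so the following is only a minimal sketch of the same idea under stated assumptions: the records, their field names, the temporary folder, and the threaded scheduler are all stand-ins rather than what the lecture uses (the lecture runs on a cluster with a dashboard).

```python
import json
import random
import tempfile
from pathlib import Path

import dask
import dask.bag as db

# Assumption: a local threaded scheduler keeps this sketch self-contained.
dask.config.set(scheduler="threads")

# Hypothetical stand-in for the lecture's utility function: random records.
random.seed(0)
records = [
    {"id": i, "name": f"user-{i}", "score": random.randint(0, 100)}
    for i in range(20)
]

# A temporary folder stands in for the lecture's data directory.
data_dir = Path(tempfile.mkdtemp()) / "data"
data_dir.mkdir()

# Turn each record into one JSON string, then write one file per partition.
# The "*" in the path is replaced by the partition number.
bag = db.from_sequence(records, npartitions=4)
written = bag.map(json.dumps).to_textfiles(str(data_dir / "*.json"))

print(sorted(p.name for p in data_dir.glob("*.json")))
```

Each output file holds one JSON document per line, which is the layout that 'read_text' expects later on.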
0:40 As we see here. Now we can see that they've been put in our data directory. What I want to do is just pop over to our data directory.
0:49 As you see, we're in the Dask fundamentals folder. Here we go into our data directory and we see the JSON files that we've just created
0:57 there. And also we saw some cool activity in the 'Task Stream' and in the 'Cluster Map' as well. So now what we're going to do is we're going to use
1:07 the 'read_text' function to read these JSON files in as a Dask Bag and assign it to the variable B.
1:14 Now I want to say 'read_text' is mainly used for '.txt' files. The items in the bag will be strings.
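As a rough illustration of what 'read_text' gives back, here is a small sketch. The file names, the record fields, and the threaded scheduler are assumptions made so the example runs on its own; they are not taken from the lecture.

```python
import json
import tempfile
from pathlib import Path

import dask
import dask.bag as db

# Assumption: a local threaded scheduler instead of the lecture's cluster.
dask.config.set(scheduler="threads")

# Write a few small line-delimited JSON files (hypothetical sample data).
data_dir = Path(tempfile.mkdtemp())
for n in range(3):
    lines = [json.dumps({"file": n, "row": r}) for r in range(4)]
    (data_dir / f"{n}.json").write_text("\n".join(lines) + "\n")

# read_text accepts a glob pattern and yields one bag item per line of text.
b = db.read_text(str(data_dir / "*.json"))
print(b)  # only a description of the bag; nothing has been read yet

# take(2) triggers the work and returns the first two items as a tuple.
first_two = b.take(2)
print(first_two)
```

The items are plain strings, trailing newline included, which is why a parsing step is still needed before the records can be used as dictionaries.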
1:22 It can also handle compressed files, as we see here. It can also be used for '.json' files. Okay, we're going to do this, and we see, of course, that we haven't
1:32 computed anything yet. Lazy evaluation for the win once again. So we're going to take the first two elements and look at those.
1:40 I would say beautiful, but reading JSON is rarely beautiful, and as we've written here,
1:47 the data comes out as lines of text, and we can make it more readable using 'json.loads'. Okay.
1:54 And what we need to do with 'json.loads' is map it across the Dask Bag. Okay.
2:01 So what we do here is map 'json.loads' across the Dask Bag and then take the first two elements of our new B.
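The mapping step described here can be sketched as follows. The sample files, field names, and threaded scheduler are assumptions for the sake of a runnable example, not the lecture's own data.

```python
import json
import tempfile
from pathlib import Path

import dask
import dask.bag as db

# Assumption: a local threaded scheduler instead of the lecture's cluster.
dask.config.set(scheduler="threads")

# Hypothetical sample data: two files with one JSON document per line.
data_dir = Path(tempfile.mkdtemp())
for n in range(2):
    lines = [json.dumps({"file": n, "row": r}) for r in range(3)]
    (data_dir / f"{n}.json").write_text("\n".join(lines) + "\n")

# Each item starts out as a line of text.
b = db.read_text(str(data_dir / "*.json"))

# map applies json.loads to every item, giving a new bag of dictionaries.
b = b.map(json.loads)

# Still lazy until we ask for results; take(2) returns a tuple of two dicts.
parsed_two = b.take(2)
print(parsed_two)
```

Reassigning the result to the same name mirrors the "new B" mentioned in the lecture: the original bag of strings is left unchanged, and 'map' returns a new bag.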
2:11 Fantastic. And now it's more human readable, which is fantastic because we're humans occasionally trying to read.
2:19 I will say that Dask Bag can also be created from binary files and delayed values. I'd love it if you went and checked it out in the API documentation, and we'll
2:27 be back soon to talk about manipulating data with Dask Bag.
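For the delayed values mentioned above, a minimal sketch using 'from_delayed' might look like this. The 'load_chunk' function and its contents are hypothetical, and the threaded scheduler is an assumption to keep the example local.

```python
import dask
import dask.bag as db
from dask import delayed

# Assumption: a local threaded scheduler instead of the lecture's cluster.
dask.config.set(scheduler="threads")


@delayed
def load_chunk(start):
    # Hypothetical loader: each delayed value must produce an iterable,
    # which becomes one partition of the bag.
    return [start, start + 1, start + 2]


chunks = [load_chunk(0), load_chunk(10)]
bag = db.from_delayed(chunks)

# Nothing runs until compute() is called.
total = bag.sum().compute()
print(total)
```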

