Fundamentals of Dask Transcripts
Chapter: Dask Bag
Lecture: Reading from JSON
0:00
Welcome back. Now it's time to see how Dask Bags can be used to read from JSON files. Now, if you've been working in data science for a hot minute,
0:09
let's say, you've probably had to work with all types of JSON files before, perhaps ones that are pretty large as well.
0:15
So hopefully Dask Bag can help you in this type of work also. To begin with, we're simply going to create a Dask Bag from some
0:24
JSON files. To do that, we'll first create some random data and store it as JSON files.
0:31
So we perform our imports. Then we use a utility function to create some data, and then we write it out to some JSON files.
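The step just described can be sketched roughly as follows. This is a minimal illustration, not the lecture's actual notebook code: `make_records` is a hypothetical stand-in for the utility function, a temporary directory stands in for the data directory, and the threaded scheduler is chosen only so the snippet also runs as a plain script.

```python
import json
import random
import tempfile
from pathlib import Path

import dask
import dask.bag as db


def make_records(n, seed=0):
    """Hypothetical stand-in for the lecture's utility function."""
    rng = random.Random(seed)
    names = ["Alice", "Bob", "Carol", "Dan"]
    return [
        {"id": i, "name": rng.choice(names), "age": rng.randint(18, 80)}
        for i in range(n)
    ]


# Assumption: a temporary directory stands in for the lecture's data directory.
data_dir = Path(tempfile.mkdtemp()) / "data"
data_dir.mkdir()

records = make_records(100)

# Serialize each record to a JSON string; to_textfiles then writes one file
# per partition, replacing "*" in the path with the partition number.
bag = db.from_sequence(records, npartitions=4).map(json.dumps)
with dask.config.set(scheduler="threads"):
    written = bag.to_textfiles(str(data_dir / "*.json"))

print(len(written), "files written to", data_dir)
```

Each output file holds one JSON document per line, which is the layout that line-oriented readers expect.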
0:40
As we see here. Now we can see that they've been put in our data directory. What I want to do is just pop over to our data directory. As
0:49
you see, we're in the Dask fundamentals folder. Here we go into our data directory and we see the JSON files that we've just created
0:57
there. And also we saw some cool activity in the 'Task Stream' and in the 'Cluster Map' as well. So now what we're gonna do is we're going to use
1:07
the 'read_text' function to read these JSON files in as a 'Dask Bag' and assign it to the variable B.
1:14
Now I want to say 'read_text' is mainly used for '.txt' files. The items in the bag will be strings.
1:22
It can also handle compressed files, as we see here. It can also be used for '.json' files. Okay, we're going to do this, and we see, of course, we haven't
1:32
computed yet: lazy evaluation for the win once again. So we're going to take the first two elements and look at those.
1:40
I would say beautiful, but reading JSON is rarely beautiful, and as we've written here,
1:47
the data comes out as lines of text, and we can make it more readable using 'json.loads'. Okay.
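As a rough sketch of what was just shown: the snippet below writes a couple of tiny JSON-lines files so it is self-contained, then reads them with `read_text` and takes the first two items. The file contents, the temporary directory, and the threaded scheduler are assumptions made for illustration, and the transcript's variable B is written as `b`.

```python
import json
import tempfile
from pathlib import Path

import dask
import dask.bag as db

# Assumption: two small JSON-lines files stand in for the lecture's data.
data_dir = Path(tempfile.mkdtemp())
for part in range(2):
    with open(data_dir / f"{part}.json", "w") as f:
        for i in range(3):
            record = {"id": part * 3 + i, "name": f"user{i}"}
            f.write(json.dumps(record) + "\n")

# read_text is lazy: it builds a task graph but loads no file contents yet.
b = db.read_text(str(data_dir / "*.json"))

# take(2) triggers computation and returns a tuple of the first two items.
with dask.config.set(scheduler="threads"):
    first_two = b.take(2)

# Each item is still a raw line of text (a str), not a parsed dictionary.
print(first_two)
```

For compressed inputs, `read_text` also has a `compression` parameter, which by default infers the format from the file extension.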
1:54
And what we need to do with 'json.loads' is map it across the Dask Bag. Okay.
2:01
So what we do here is we map 'json.loads' across the Dask Bag and then take the first two elements of our new B.
2:11
Fantastic. And now it's more human readable, which is fantastic because we're humans occasionally trying to read.
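The mapping step can be sketched like this. The sample lines are made up for illustration, the bag is built with `from_sequence` instead of being read from disk so the snippet stays self-contained, and the threaded scheduler is again just a choice for running it as a script.

```python
import json

import dask
import dask.bag as db

# Assumption: made-up raw lines, shaped like what read_text would yield.
raw_lines = [
    '{"id": 0, "name": "Alice"}\n',
    '{"id": 1, "name": "Bob"}\n',
    '{"id": 2, "name": "Carol"}\n',
]
b = db.from_sequence(raw_lines, npartitions=1)

# map is lazy as well: json.loads is applied to each line only on compute.
b = b.map(json.loads)

with dask.config.set(scheduler="threads"):
    first_two = b.take(2)

# The items are now Python dictionaries rather than strings.
print(first_two)
```

Once the items are dictionaries, later bag operations can work with fields by key instead of parsing strings each time.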
2:19
I will say that Dask Bag can also read binary files and delayed values. I'd love it if you went and checked it out in the API documentation and we'll
2:27
be back soon to talk about manipulating data with Dask Bag.