Fundamentals of Dask Transcripts
Chapter: Dask Bag
Lecture: Dask bag to Dask DataFrame
0:00
Congrats on making it through that checkpoint. We've worked a bunch with Dask Bags. Sometimes though,
0:07
we really want to be working with 'Dask DataFrames'. Okay, so the 'Dask Bag' can be used for simple analysis, but 'Dask DataFrame'
0:14
and 'Dask Arrays' are sometimes more useful for complex operations. One way to think about it is that they're faster than Dask Bags
0:23
for the same reason that 'Pandas' and 'NumPy' are faster than pure Python. They also have more functionality suited to data analysis.
0:31
How do we do it? Well, we have a wonderful "to_dataframe()" method. Once again, we recreate our bag from before from our JSON files.
0:39
Then we apply the 'to_dataframe' method, and then let's check out the head of the DataFrame. All right, and look at that. So having done all that,
0:49
remember, it's good to be tidy in your workspace. So let's close the cluster with "client.close()". We'll be back for one last video
0:57
on Dask Bag to talk about Dask Bag's limitations and to provide some further references.