Fundamentals of Dask Transcripts
Chapter: Dask Bag
Lecture: Introducing Dask Bag

0:00 Let's talk about another high-level Dask collection: Dask Bag. We've already seen Dask Array and Dask DataFrame.
0:09 These collections are great for structured data, but we don't always have organized data like that,
0:14 now do we? Sometimes we have huge XML or JSON files that come with inconsistencies and noise. In other words, the data is messy. Dask Bag
0:26 helps us work with this type of data. Typically, users start with Dask Bag for preprocessing data,
0:33 which means making the data suitable for further analysis. Then they move to other Dask collections to work with the data,
0:41 most often Dask DataFrames. Dask Bag is powerful because we can use it to work with general Python data structures as well,
0:49 like lists, dictionaries, and sets. Dask Bag implements operations like map, filter, fold, and more on these data structures by leveraging
1:00 parallel computing. If you've worked with itertools or PyToolz before, you can think of Dask Bag as a parallel version of these.
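
To make the map, filter, and fold operations mentioned above concrete, here is a minimal sketch. The sample records and variable names are made up for illustration and do not come from the course notebook; the thread-based scheduler is chosen here only to keep the example simple.

```python
import dask.bag as db

# Hypothetical sample data: a plain Python list of dictionaries.
records = [
    {"name": "Alice", "age": 34},
    {"name": "Bob", "age": 17},
    {"name": "Carol", "age": 52},
    {"name": "Dan", "age": 25},
]

# Split the list into two partitions that can be processed in parallel.
bag = db.from_sequence(records, npartitions=2)

# filter: keep only records of people aged 18 or older.
adults = bag.filter(lambda r: r["age"] >= 18)

# map: extract the age from each remaining record.
ages = adults.map(lambda r: r["age"])

# fold: reduce within each partition, then combine the partial results.
total_age = ages.fold(lambda x, y: x + y)

# Nothing has run yet; compute() triggers the actual work.
result = total_age.compute(scheduler="threads")
print(result)
```

Each step returns a new lazy object, so the chain reads much like itertools or PyToolz code until `compute()` is called.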
1:08 Now, we can condense Dask Bag's benefits to two key areas. Computing in parallel, which means we can use all the compute power your machine has,
1:19 and iterating: Dask Bag computes lazily, which allows us to work with large datasets comfortably, even on a single machine with a single core.
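
The lazy behavior described above can be observed with a small sketch. The `square` function and the `calls` list are hypothetical helpers added only to record when work actually happens; the synchronous scheduler is used so that everything runs in the current process and the record stays visible.

```python
import dask.bag as db

calls = []  # records every value the function is actually applied to


def square(x):
    calls.append(x)
    return x * x


bag = db.from_sequence(range(10), npartitions=2)

# Building the pipeline only describes the work; square() is not called yet.
squares = bag.map(square)
executed_before = len(calls)

# compute() runs the task graph, one partition at a time in this scheduler.
total = squares.sum().compute(scheduler="synchronous")
executed_after = len(calls)

print(executed_before, executed_after, total)
```

Because each partition is processed as a stream, only a portion of the data needs to be held in memory at once, which is what makes larger datasets manageable on a single machine.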
1:29 In this chapter, we will learn to read and manipulate different types of data using Dask Bag. We will also see how you can convert a Dask Bag to a
1:39 Dask DataFrame, a common workflow among data professionals. Again, we'll share the limitations you need to be aware of and leave you with
1:47 some references to explore further. Now, let's jump into the notebook.

