Getting started with Dask Transcripts
Chapter: Using the Dask DataFrame
Lecture: What will we cover?

0:01 Dask DataFrame is the high-level API that we use to scale Pandas code. Take a look at these two code snippets.
0:11 Do you see how the Dask code on the right is almost identical to the Pandas code on the left? That is not a coincidence;
0:17 it's a deliberate design choice. Dask's creators wanted to invent nothing; they wanted Dask to be as familiar as possible
0:25 to users of the PyData stack. In this chapter, we'll start by downloading the New
0:30 York City Yellow Taxicab Dataset. We'll perform some Pandas operations on it.
0:36 We'll then scale this same Pandas code using Dask DataFrame and use the Dask dashboards to watch the parallel computations as they happen.
0:45 And finally, we'll discuss some limitations of Dask DataFrame and share some resources where you
0:50 can learn more. Pandas is an incredibly popular library for analyzing tabular data. Data practitioners
0:58 use Pandas for preprocessing and exploratory analysis. Pandas is a very powerful library, but it has a limitation that is hard to overcome
1:08 when working with big data: it can only work on data that fits in your RAM. When your dataset exceeds that, Pandas throws a memory error like this one here.

