Getting started with Dask Transcripts
Chapter: Using the Dask DataFrame
Lecture: What will we cover?
0:01
Dask DataFrame is the high-level API that we use to scale Pandas code. Take a look at these two code snippets.
0:11
Do you see how the Dask code on the right is almost identical to the Pandas code on the left? That is not a coincidence;
0:17
it's a deliberate design choice. Dask's creators wanted to invent nothing; they wanted Dask to be as familiar as possible
0:25
to users of the PyData stack. In this chapter we'll start by downloading the New
0:30
York City Yellow Taxicab Dataset. We'll perform some Pandas operations on it.
0:36
We'll then scale this same Pandas code using Dask DataFrame and use the Dask dashboards to watch the parallel computations happen live.
0:45
And finally, we'll discuss some limitations of Dask DataFrame and share some resources where you
0:50
can learn more. Pandas is an incredibly popular library for analyzing tabular data. Data practitioners
0:58
use Pandas for pre-processing tasks and for exploratory analysis. Pandas is a very powerful library, but it has a limitation that is hard to overcome
1:08
when working with Big Data. It can only work on data that fits in your RAM. When your dataset exceeds that, Pandas raises a memory error like this one here.
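To get a feel for when that limit might bite, one rough approach is to measure how much RAM a small sample occupies and extrapolate. This is only a sketch: the frame below and the 100 million row target are hypothetical, and real memory use also depends on intermediate copies made during operations.

```python
import pandas as pd

# Hypothetical small frame; a real dataset would be far larger.
df = pd.DataFrame({
    "vendor": ["A", "B", "A", "B"] * 250,
    "fare": [10.5, 7.0, 22.25, 3.5] * 250,
})

# deep=True asks pandas to include the contents of object-dtype
# columns (such as strings) when they are present.
footprint_bytes = int(df.memory_usage(deep=True).sum())
print(f"{len(df)} rows occupy about {footprint_bytes / 1024:.1f} KiB in RAM")

# Rough linear extrapolation to a hypothetical 100 million rows.
bytes_per_row = footprint_bytes / len(df)
estimated_gib = bytes_per_row * 100_000_000 / 1024**3
print(f"~{estimated_gib:.1f} GiB for 100 million rows of the same shape")
```

If the estimate approaches or exceeds the RAM on your machine, that is the situation where Pandas alone tends to run into trouble.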