Getting started with Dask: High Performance Data Science Course

0.6 hours, 100% free
Take this course for FREE

Course Summary

While pandas is one of the true pillars of Python's data science tech stack, there will be times when you outgrow it: for example, when you need to leverage all 16 cores on your system rather than just 1 to get an answer much faster, or when you have more data than will fit into your RAM, or even onto your disk. That's where Dask comes to the rescue. Its API is broadly compatible with pandas', but it scales Python computation across cores and across computers to bring you blazing fast analysis of data that exceeds what any single computer can handle. That's just the beginning! Dask is a free and open source library that helps scale your data science workflows and provides a complete framework for distributed computing in Python.
This course will get you up to speed with Dask and show you how to easily convert pandas workloads to blazing fast Dask clusters (locally across cores or scaled out across cloud servers). Future courses will cover Dask and Pythonic distributed computation in other settings, such as machine learning.

What students are saying

A big THANK YOU for your podcast and training content. Your [course] feels so relevant and covers topics that are so applicable to everyday life that it has been easy to stay focused and enjoy the training. It feels like the learning/retention is just a by-product that comes naturally when you designed it so well.
-- Marc

Source code and course GitHub repository

github.com/coiled/talkpython-getting-started-with-dask

What's this course about and how is it different?

This free course is a quick and no-fluff introduction to Dask. It's authored by the folks over at Coiled who offer Dask as a Service, including Matthew Rocklin, one of the co-creators of Dask. So you know you're getting definitive information from people who use Dask in practice.

What topics are covered

In this course, you will:

  • Explore the problem solved by Dask: What is big data and how can you work with it?
  • Set up your computer to run Dask locally in a Jupyter notebook
  • Learn the Dask API and how to use it
  • Convert pandas code to Dask code
  • Analyze the NYC taxicab data set with Dask on a local cluster
  • Scale that same computation to the cloud at coiled.io
  • Connect to local and remote Dask cluster visualization and reporting dashboards
  • And lots more

View the full course outline.

Who is this course for?

This course is for anyone with basic Python language experience who would like to use Dask to process more data faster than pandas easily handles. You'll need to know things like variables, modules, and import statements. But the Python code used is not deep or advanced, so it should be broadly accessible to most.

Note: All software used during this course, including editors, the Python language, etc., is 100% free and open source. You won't have to buy anything to take the course.

Get hands-on for almost every chapter

While watching videos is great to give you that high-level overview of what you need to know about a technology, nothing makes that skill your own like writing actual code and scaling data science computations in your notebooks.

In this course, you'll have access to all the source code at github.com/coiled/talkpython-dask-course. You're encouraged to follow along and play with the notebook throughout this course.

This course is delivered in very high resolution

This course is delivered in 1440p (4x the pixels of 720p). When you're watching the videos for this course, it will feel like you're sitting next to the instructor looking at their screen.

Every little detail, menu item, and icon is clear and crisp. Watch the introductory video at the top of this page to see an example.

Follow along with subtitles and transcripts

Each course comes with subtitles and full transcripts. The transcripts are available as a separate searchable page for each lecture. They are also available in course-wide search results to help you find just the right lecture.

Free office hours keep you from getting stuck

One of the challenges of self-paced online learning is getting stuck. It can be hard to get the help you need to get unstuck.

That's why at Talk Python Training, we offer live, online office hours. You drop in and join a group of fellow students to chat about your course progress and see solutions via screen sharing.

Just visit your account page to see the upcoming office hour schedule.

The time to act is now

If you are working with data using pandas or other data science libraries, you owe it to yourself to see how to process significantly larger datasets and how to run Python computation outside the grips of the GIL: across cores, and all the way out to an entire cluster. This free, short course will get you up to speed in less than one hour!

Course Outline: Chapters and Lectures

Welcome to the course (3:39)
  • Intro to the course and to Matthew Rocklin (3:18)
  • Instructor - Michal Mucha (0:21)

How do I get help? (1:36)
  • Getting help (1:36)

What is big data? (2:32)
  • Big data? (2:05)
  • Big data: Check point (0:10)
  • Big data: Check point answer (0:17)

Setup to follow along (6:15)
  • Setting up your system (1:03)
  • Clone the notebook repository (0:51)
  • Running JupyterLab (4:21)

Dask under the hood (2:54)
  • Take a peek under the hood (2:13)
  • Under the hood checkpoint (0:41)

The Dask API (2:16)
  • Parts of Dask that we will cover (0:15)
  • Creating our first Dask cluster with the introduction notebook (2:01)

Using the Dask DataFrame (11:38)
  • What will we cover? (1:19)
  • pandas (2:42)
  • Download the dataset (0:32)
  • Reading and working with data in pandas (4:51)
  • Sharing intermediate results (1:05)
  • DataFrame checkpoint (0:19)
  • Limitations of Dask DataFrame (0:50)

Scaling Dask in the cloud (3:09)
  • Introducing Coiled cloud (0:48)
  • Create your Coiled account and sign in (0:19)
  • Find your Coiled token to use locally (0:27)
  • Connecting your Dask dashboard to the cloud cluster (1:15)
  • Running on 40 cores in the cloud with a few lines of code (0:20)

Conclusion (2:00)
  • Thanks and goodbye (2:00)