Getting started with Dask Transcripts
Chapter: Using the Dask DataFrame
Lecture: Sharing intermediate results

0:00 In some cases, you can benefit from sharing intermediate results. We've learned that Dask computations are created as task graphs.
0:08 In other words, they are just plans. When two computations are related, they can use the same intermediate steps.
0:15 We can use that phenomenon to our advantage for shared efficiency. For example, when computing minimum and maximum values.
0:22 Let's look at an example here. Just like in Pandas, you can use min and max to compute the minimum and
0:27 maximum values in Dask. "Without sharing" is how we have been computing so far, executing each
0:34 task graph separately. If we time it to examine the difference, we can see very clearly that running the two computations together,
0:42 as we do in the "with sharing" section, saves us nearly half the time. To share intermediate results, we can pass both the maximum and minimum
0:51 of tip_amount together to dask.compute. The shared computation is much faster because we save on loading the data from disk and on the groupby.
0:59 Similarly, if we had persisted the dataset into memory earlier, the entire process would be faster still.

