Fundamentals of Dask Transcripts
Chapter: Dask Bag
Lecture: Dask bag limitations
0:00
To wrap up on Dask Bag, I'd like to tell you about some of the limitations of Dask Bag.
0:06
So, firstly, Dask Bag doesn't always perform that well on computations that include inter-worker communication,
0:11
which is due to restrictions in the default multiprocessing scheduler, and we'll see this in the next chapter. On top of that, Bag
0:19
operations are slower than Array or DataFrame computations, as we saw in the previous video, because
0:26
Python, of course, is slower than NumPy or pandas for these types of operations. 'groupby' is slow, and you should use 'foldby' if possible,
0:34
as we've also already discussed. On top of this, note that Bags are immutable, so you cannot change individual elements.
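To make the last two points concrete, here is a minimal sketch, not taken from the course notebook. The records, the helper names add_amount and add_totals, and the choice of scheduler="synchronous" (used only to keep the example small and single-process) are all assumptions made for illustration. It computes the same per-name totals once with 'groupby' and once with 'foldby', and then shows that "changing" elements really means building a new Bag with 'map'.

```python
import dask.bag as db

# Hypothetical sample records, invented for this illustration.
records = [
    {"name": "Alice", "amount": 100},
    {"name": "Bob", "amount": 200},
    {"name": "Alice", "amount": 50},
    {"name": "Bob", "amount": 25},
]
bag = db.from_sequence(records, npartitions=2)


def add_amount(total, record):
    """Fold one record into the running total for its key."""
    return total + record["amount"]


def add_totals(left, right):
    """Merge two partial totals that came from different partitions."""
    return left + right


# groupby: collects every record of a group together before summing,
# which requires moving whole records between partitions.
grouped = bag.groupby(lambda r: r["name"]).map(
    lambda kv: (kv[0], sum(r["amount"] for r in kv[1]))
)

# foldby: reduces inside each partition first, then merges the small
# per-partition results, so far less data has to be moved.
folded = bag.foldby(
    key=lambda r: r["name"],
    binop=add_amount,
    initial=0,
    combine=add_totals,
    combine_initial=0,
)

groupby_totals = dict(grouped.compute(scheduler="synchronous"))
foldby_totals = dict(folded.compute(scheduler="synchronous"))

# Bags are immutable: instead of assigning to an element, map over the
# Bag to describe a new one. The original Bag is left untouched.
doubled = bag.map(lambda r: {**r, "amount": r["amount"] * 2})
doubled_records = doubled.compute(scheduler="synchronous")
original_records = bag.compute(scheduler="synchronous")
```

Both groupby_totals and foldby_totals end up holding a total of 150 for Alice and 225 for Bob, while original_records still matches the input after doubled has been computed.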
0:43
Now, if you're excited by Bags and want to use them for your work,
0:47
we've provided a list of references in the notebook and I'd also encourage you to check
0:52
out the wonderful Dask documentation that the open source community has built for us.