Async Techniques and Examples in Python Transcripts
Chapter: Threads
Lecture: Demo: Attempting to leverage multiple cores with threads
0:00
At the opening of this chapter, I told you that threads belong in the "do more at once" category, but not in the "do stuff faster" category
0:07
that comes from taking advantage of the multiple cores on our system. And recall, on this computer I have many, many cores: 12 of them up there.
0:19
So I should be able to take advantage of all of those cores to do computational things way, way faster. In a perfectly parallel universe
0:30
I should be getting an 11 to 12 times improvement. So here I have a little computational program. I have the original
0:40
and then this one, which starts out the same as the original, but we're going to evolve it as we go. And it has this function, do_math,
0:46
which is just going to do some multiplication and square roots and things like that. It doesn't really matter what math it's doing.
0:54
What matters is simply that it is doing math. So let's run this single, serial version. It takes about 7.5 seconds to run.
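The function body isn't shown in the transcript, but a do_math along these lines matches the description; the arguments, the exact arithmetic, and the timing code are illustrative assumptions, not the course's verbatim code.

    import math
    import time

    def do_math(start=0, num=30_000_000):
        # CPU-bound busywork: multiplications and square roots over a range.
        # The point is that the interpreter stays busy with pure computation
        # and never hits a blocking or waiting point.
        pos = start
        k_sq = 1000 * 1000
        while pos < num:
            pos += 1
            math.sqrt((pos - k_sq) * (pos - k_sq))

    if __name__ == '__main__':
        t0 = time.perf_counter()
        do_math()  # the single, serial call
        print(f"Done in {time.perf_counter() - t0:,.2f} sec.")  # about 7.5 s on this machine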
1:08
7.47, excuse my imprecision. So it's done; it's done its work. If I were able to leverage this in some sort of threaded way,
1:19
I would be really golden. I could make it run in half a second or something really great. So let me replace that line with an interesting little bit of code,
1:29
and the details are not super important, so I'm just going to paste it for time's sake. So what we're going to do is
1:33
we're going to create a bunch of threads, start the threads, and join on them. We've already seen that this pattern is super common.
1:39
And I'm going to import threading at the top. And instead of just doing math on, how many is that? Let's make that number more obvious:
1:52
30 million operations. We're going to partition that work across n different threads, one per processor, from 1 up to the processor count, alright?
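The pasted replacement isn't reproduced in the transcript, but the pattern it describes, one thread per processor, each handed a slice of the 30 million operations, then the familiar start/join dance, looks roughly like this. The variable names and the slicing are assumptions, and do_math is the CPU-bound function sketched earlier.

    import multiprocessing
    import threading

    processor_count = multiprocessing.cpu_count()

    # One thread per processor, each given its own slice of the 30,000,000 ops.
    threads = [
        threading.Thread(
            target=do_math,  # the CPU-bound function sketched above
            args=(30_000_000 * (n - 1) // processor_count,
                  30_000_000 * n // processor_count),
            daemon=True)
        for n in range(1, processor_count + 1)
    ]

    # The pattern we've already seen: start them all, then join on them all.
    for t in threads:
        t.start()
    for t in threads:
        t.join()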
2:08
And why are we using processor count and not just 5, 10, 100, whatever? It turns out when you're doing computational stuff
2:14
having more CPU-busy threads fighting for attention on the CPU itself actually slows it down. One thread runs for a little bit
2:24
works with some variables, then another thread gets scheduled by the operating system and kicks the first one off that processor.
2:29
The new thread needs different data, which evicts things from the cache, which means memory access is slower for a moment,
2:35
and then it speeds up again. Things like that. So having too much contention for the processors in a CPU-bound world is bad, bad news.
2:42
Ideally, if nothing else were happening on the computer, targeting the number of processors would be exactly the right amount.
2:49
Well, except that in Python it's not going to help, as we'll see. But how did we get this number? We're going to have to bring in a new
2:57
module here that we haven't seen yet. So we'll have to import multiprocessing. Now, multiprocessing is the process-based equivalent of threads in Python.
3:08
We're going to talk about it later. But it does have a cool little function: cpu_count. There we go. And let's just do a quick print.
3:17
Whatever your machine tells you is how many processors it thinks it has. Print that out. And if you want to be really ambitious,
3:25
we could put a comma separator in there for when that number goes over 1,000. Anyway, let's just run it. Doing math on 12 processors.
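In code, that core-count lookup and the ambitious comma formatting come out to something like this; a small sketch, with multiprocessing itself covered properly later.

    import multiprocessing

    processor_count = multiprocessing.cpu_count()

    # The :, format specifier adds a thousands separator, for the day a
    # machine reports more than 1,000 processors.
    print(f"Doing math on {processor_count:,} processors.")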
3:33
Should be half a second, or one second, something like that. Go, go, go, faster. Ah, that's a tiny bit faster. That's kind of cool, actually.
3:42
I don't know how it got to be a tiny bit faster. Maybe because there are more threads, and they actually fought against Camtasia, which is recording my screen,
3:49
to get a little more CPU time. Yeah, that does make it a tiny bit faster. But, you know, that ratio of 6.34 over 7.47 is only about a 15% speedup.
4:01
That is not super impressive, is it? Given that I have 12 cores, we should have been able to do much better than 15%.
4:10
And I honestly think if I were to turn off Camtasia maybe it would actually not make any difference at all. I don't know.
4:15
But here's an example of where the GIL is rearing its head and causing serious, serious problems. So all of this work is being done in Python,
4:27
in the Python interpreter, so at every step of the way there's no sort of blocking or waiting point; it's just trying to do work.
4:32
And the interpreter, because of the GIL, can only execute one instruction at a time, regardless of which thread or how many threads are running.
4:40
That means this basically runs serially, even though I gave it 12 threads to run. What's the fix? Well, this problem is not solved with threads.
4:50
This problem is solved some other way. I'll show you two ways in this course to make that number go down. I'll show you how to use multiprocessing
4:58
to get that number under a second and I'll show you how to use Cython to get that number incredibly small. But the way it is right now
5:07
it's not going to do a lot for us. We just can't make it go any faster, really, because the GIL is blocking any of the parallelism
5:15
that we're trying to take advantage of. Alright, so here's a concrete example of where the GIL is inhibiting threading.
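As a preview of the multiprocessing fix mentioned a moment ago (it's covered properly later in the course), the same partitioning idea with worker processes instead of threads sidesteps the GIL, because each process gets its own interpreter. This is only a rough, assumed sketch, reusing the do_math function from earlier.

    import multiprocessing

    if __name__ == '__main__':
        processor_count = multiprocessing.cpu_count()
        # Each worker is a separate process with its own GIL, so the chunks
        # genuinely run in parallel across the cores.
        with multiprocessing.Pool(processes=processor_count) as pool:
            tasks = [
                pool.apply_async(do_math, args=(
                    30_000_000 * (n - 1) // processor_count,
                    30_000_000 * n // processor_count))
                for n in range(1, processor_count + 1)
            ]
            for t in tasks:
                t.get()  # wait for every chunk to finish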
5:22
And of course, asyncio has an even worse problem, right? Remember, asyncio doesn't even have multiple threads. It's all one thread, so there's not even a hope
5:30
that you could have gotten anything better out of asyncio. But with threading in some languages, like C or C#, this would have gone much faster.
5:38
But because of the GIL, this particular case is exactly where Python hurts you, and you just don't get any extra parallelism, really.