Async Techniques and Examples in Python Transcripts
Chapter: Threads
Lecture: Demo: Attempting to leverage multiple cores with threads
0:00
At the opening of this chapter
0:01
I told you that threads belonged in
0:03
the "do more at once" category
0:04
but not in the "do stuff faster"
0:06
category, the one that takes advantage of the multiple cores on our system.
0:09
And recall, on this computer here
0:15
I have many, many cores, 12 cores up there.
0:18
So I should be able to take advantage of all of those cores
0:23
to do computational things way, way faster.
0:27
In a perfectly parallel universe
0:29
I should be getting an 11 to 12 times improvement.
0:34
So here I have a little computational program.
0:38
I have the original
0:39
and then this one, which starts as a copy of the original
0:41
but we're going to evolve it as we have been.
0:43
And it has this function, do_math
0:45
which is just going to do some multiplication
0:49
and square roots and things like that.
0:51
It doesn't really matter what math it's doing.
0:53
Just the fact that it's doing math is what's interesting.
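The lecture doesn't show the body of do_math, so here is a hypothetical stand-in with the same character: a loop of multiplications and square roots with no I/O, plus a timer around a serial run. The loop body, the constant k_sq, and the iteration count are assumptions, not the course's code.

```python
import math
import time


def do_math(start: int = 0, num: int = 10) -> float:
    """Hypothetical stand-in for do_math: pure CPU work, no waiting."""
    pos = start
    k_sq = 1000 * 1000  # assumed constant, just to make the numbers large
    result = 0.0
    while pos < num:
        pos += 1
        # Multiplication plus a square root: busy work for the interpreter.
        result = math.sqrt((pos - k_sq) * (pos - k_sq))
    return result


def main() -> None:
    t0 = time.perf_counter()
    # The demo runs 30 million operations; a smaller count keeps this quick.
    do_math(num=300_000)
    elapsed = time.perf_counter() - t0
    print(f"Done in {elapsed:,.2f} sec.")


if __name__ == "__main__":
    main()
```

Every iteration is pure interpreter work and nothing in the loop ever waits, which is the property the rest of the demo depends on.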
0:56
So let's run this single-threaded, serial version.
0:59
It takes about 7.5 seconds to run.
1:07
7.47. Excuse me for my inaccuracy.
1:09
So it's done. It's done its work.
1:13
If I were able to leverage this in some sort of threading way
1:18
I would be really golden.
1:19
I could make it go in half a second
1:22
or something really great.
1:23
So let me replace that line with an interesting little thing
1:28
and the details are not super important
1:29
so I'm just going to paste it for time's sake.
1:31
So what we're going to do is
1:32
we're going to create a bunch of threads.
1:33
We're going to start the threads and join on them.
1:36
We've already seen that this pattern is super common here.
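The create, start, join pattern looks roughly like this minimal sketch. The worker function and its results list are hypothetical placeholders; each thread writes only to its own slot, so no lock is needed.

```python
import threading


def worker(results: list, index: int) -> None:
    # Hypothetical work: each thread fills in only its own slot.
    results[index] = index * index


def run_all(count: int) -> list:
    results = [None] * count
    threads = [
        threading.Thread(target=worker, args=(results, i), daemon=True)
        for i in range(count)
    ]
    # Start every thread first, then wait on each one in turn.
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_all(4))  # [0, 1, 4, 9]
```

Starting all threads before joining any of them is what lets them run concurrently; joining inside the same loop would run them one after another.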
1:38
And I'm going to import threading at the top.
1:42
And instead of just doing math on... how many is that?
1:47
Let's make that number more obvious.
1:51
Instead of doing math on 30 million operations all at once
1:57
we're going to partition the work across n different processors,
2:02
with n running from 1 up to the processor count, alright?
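One way to compute those slices, as a sketch assuming integer boundaries are acceptable (the name partition is hypothetical): n runs from 1 to the processor count, and slice n covers total * (n - 1) // parts up to total * n // parts.

```python
from typing import List, Tuple


def partition(total: int, parts: int) -> List[Tuple[int, int]]:
    """Split range(total) into `parts` contiguous (start, end) slices."""
    ranges = []
    for n in range(1, parts + 1):
        # Integer division keeps boundaries whole; slice n ends
        # exactly where slice n + 1 begins, so nothing is skipped.
        start = total * (n - 1) // parts
        end = total * n // parts
        ranges.append((start, end))
    return ranges


print(partition(30_000_000, 12)[:2])  # [(0, 2500000), (2500000, 5000000)]
```

Each (start, end) pair would then become the args of one threading.Thread whose target is do_math, so 12 processors means 12 threads of 2.5 million operations each.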
2:07
And why are we using processor count
2:08
and not just 5, 10, 100, whatever?
2:11
It turns out when you're doing computational stuff
2:13
having more CPU-busy threads than processors fighting for attention
2:18
on the CPU itself actually slows it down.
2:21
One thread runs for a little bit
2:23
works with some variables
2:24
then the operating system runs another thread
2:26
which kicks the first thread off that processor.
2:28
The new thread needs different data, which evicts things from the cache
2:31
which means memory access is slower for a moment
2:34
and then it speeds up again. Things like that.
2:36
So having too much contention for the processors
2:38
for CPU-bound work is bad, bad news.
2:41
Ideally, if nothing else was happening on the computer
2:43
targeting the number of processors
2:45
would be the exact right amount.
2:48
Well, except that in Python
2:50
it's not going to help, as we'll see.
2:51
But how did we get this number?
2:52
We're going to have to bring in a new
2:56
module here that we haven't seen yet.
2:58
So we'll have to import multiprocessing.
3:02
Now, multiprocessing is the process-based
3:04
equivalent of threading in Python.
3:07
We're going to talk about it later.
3:08
But it does have a cool little function: cpu_count().
3:13
There we go. And let's just do a quick print.
3:16
Whatever your machine tells you is
3:18
how many processors it thinks it has.
3:21
Print that out.
3:22
And if you want to be really ambitious
3:24
we could put a comma in the format spec for when that goes over 1,000.
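A sketch of getting the processor count and printing it with a thousands separator. multiprocessing.cpu_count() returns the number of CPUs the system reports, and the , inside the format spec is what adds the comma.

```python
import multiprocessing

processor_count = multiprocessing.cpu_count()

# The ":," format spec inserts thousands separators, e.g. 1024 -> "1,024".
print(f"Doing math on {processor_count:,} processors.")
```

On a 12-core machine like the one in the demo this prints "Doing math on 12 processors."; the comma only shows up once the count reaches 1,000.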
3:28
Anyway, let's just run it.
3:30
Doing math on 12 processors.
3:32
That should take half a second, or one second, something like that.
3:36
Go, go, go, faster. Ah, that's a tiny bit faster.
3:40
That's kind of cool, actually.
3:41
I don't know how it got to be a tiny bit faster.
3:43
Maybe because there are more threads,
3:44
they actually fought harder against Camtasia,
3:47
which is recording my screen,
3:48
and got a little more CPU time.
3:49
Yeah, it does make it a tiny bit faster.
3:51
But, you know, that ratio is 6.34 seconds over 7.47,
3:57
so it takes only about 15% less time.
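The arithmetic behind that comparison, using the two timings from the demo and the 12 cores reported earlier:

```python
serial_seconds = 7.47    # single-threaded run from the demo
threaded_seconds = 6.34  # 12-thread run from the demo
cores = 12               # processor count reported by the machine

speedup = serial_seconds / threaded_seconds           # ratio of run times
time_saved = 1 - threaded_seconds / serial_seconds    # fraction of time cut
ideal_seconds = serial_seconds / cores                # perfectly parallel case

print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.0%}, "
      f"ideal: {ideal_seconds:.1f} sec")
# speedup: 1.18x, time saved: 15%, ideal: 0.6 sec
```

So the 15% figure is the reduction in wall-clock time; as a speedup ratio it's about 1.18x, against an ideal of roughly 12x.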
4:00
That is not super impressive, is it?
4:03
So given that I have 12 cores
4:05
we should have been able to do better than 15%.
4:09
And I honestly think if I were to turn off Camtasia
4:11
maybe the threads would actually make no difference at all.
4:13
I don't know.
4:14
But here's an example of where the GIL is raising its head
4:20
and causing serious, serious problems.
4:22
So this, all of this work, is being done in Python
4:26
in the Python interpreter, so every step of the way
4:28
there's no point where it blocks and waits;
4:30
it's just trying to do work.
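For contrast, here is a sketch of the case threads do handle well: threads that block. It uses time.sleep as a stand-in for waiting on a network or disk, which is an assumption for illustration; CPython releases the GIL while a thread sleeps, so the waits overlap.

```python
import threading
import time


def wait_on_io() -> None:
    # time.sleep stands in for a blocking wait (network, disk);
    # the GIL is released while this thread sleeps.
    time.sleep(0.2)


def run_waiting_threads(count: int) -> float:
    threads = [threading.Thread(target=wait_on_io) for _ in range(count)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - t0


elapsed = run_waiting_threads(5)
print(f"5 waits of 0.2 s took {elapsed:.2f} s")
```

Five 0.2-second waits should finish in about 0.2 seconds total rather than 1.0, because no thread needs the interpreter while it's waiting. Swap the sleep for a CPU-bound loop like do_math and, under the GIL, that overlap disappears.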
4:31
And the interpreter, because of the GIL,
4:34
can only execute Python bytecode from one thread at a time
4:37
regardless of which threads or how many threads are running.
4:39
That means this is basically serial
4:42
even though I gave it 12 threads to run.
4:44
What's the fix?
4:46
The fix is that this problem is not solved with threads.
4:49
This problem is solved some other way.
4:51
I'll show you two ways in this course
4:53
to make that number go down.
4:55
I'll show you how to use multiprocessing
4:57
to get that number under a second
4:59
and I'll show you how to use Cython
5:02
to get that number incredibly small.
5:05
But the way it is right now
5:06
it's not going to do a lot for us.
5:08
We just can't make it go really any faster
5:11
because the GIL is blocking any of the parallelism
5:14
that we're trying to take advantage of.
5:16
Alright, so here's a concrete example
5:17
of where the GIL is inhibiting threading.
5:21
And of course, asyncio has an even worse problem, right?
5:24
Remember, asyncio doesn't even have threads.
5:26
It's all one thread, so there's not even a hope
5:29
that you could have gotten
5:30
anything better out of asyncio.
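A quick way to see the "it's all one thread" point is to record threading.get_ident() from inside several asyncio tasks. This is a sketch; the function names are hypothetical.

```python
import asyncio
import threading


async def record_thread(ids: list) -> None:
    await asyncio.sleep(0)  # yield to the event loop once
    ids.append(threading.get_ident())


async def main() -> list:
    ids = []
    # Ten concurrent tasks, all scheduled by the same event loop.
    await asyncio.gather(*(record_thread(ids) for _ in range(10)))
    return ids


thread_ids = asyncio.run(main())
print(len(set(thread_ids)))  # 1: every task ran on the same thread
```

All ten tasks report the same thread ID, the one that called asyncio.run, so CPU-bound work spread across coroutines still runs on a single core.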
5:32
But with threads in other languages, like C or C#,
5:36
this would have gone much faster.
5:37
But because of the GIL, in this particular case
5:41
this is where the GIL hurts you
5:43
and you just don't get any extra parallelism, really.