Async Techniques and Examples in Python Transcripts
Chapter: Threads
Lecture: Demo: Attempting to leverage multiple cores with threads

0:00 At the opening of this chapter I told you that threads belong in the "do more at once" category, but not in the "do stuff faster"
0:07 category, the one where we take advantage of the multiple cores on our system. And recall, this computer here has many, many cores: 12 of them.
0:19 So I should be able to take advantage of all of those cores to do computational things way, way faster. In a perfectly parallel universe
0:30 I should be getting an 11 to 12 times improvement. So here I have a little computational program. I have the original
0:40 and then this one, which starts as a copy of the original, but we're going to evolve it as we have been. And it has this function, do_math,
0:46 which just does some multiplication, square roots, and things like that. It doesn't really matter what math it's doing.
0:54 The interesting part is just that it's doing math: pure CPU work. So let's run this single-threaded, serial version. It takes about 7.5 seconds to run.
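A minimal sketch of a serial baseline like this one. The course's actual `do_math` body isn't shown in the transcript, so the version below is a hypothetical stand-in that does one multiplication and one square root per step over the range `(start, num]`. `run_serial` is also a hypothetical name, and your timings will differ from the 7.47 s seen here.

```python
import math
import time


def do_math(start: int = 0, num: int = 10) -> float:
    """Hypothetical CPU-bound stand-in: one multiply and one sqrt per step."""
    total = 0.0
    k_sq = 1000 * 1000
    pos = start
    while pos < num:
        pos += 1
        total += math.sqrt((pos - k_sq) * (pos - k_sq))
    return total


def run_serial(total_steps: int = 30_000_000) -> float:
    """Run every step on the main thread and return the elapsed seconds."""
    t0 = time.perf_counter()
    do_math(num=total_steps)
    elapsed = time.perf_counter() - t0
    print(f"Done in {elapsed:,.2f} sec.")
    return elapsed
```

Calling `run_serial()` times all 30 million steps on one thread; something like `run_serial(100_000)` is handy for a quick check.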
1:08 7.47. Excuse me for my inaccuracy. So it's done. It's done its work. If I were able to spread this across threads somehow
1:19 I would be really golden. I could make it go in half a second or something really great. So let me replace that line with an interesting little bit of code,
1:29 and the details are not super important, so I'm just going to paste it in for time's sake. So what we're going to do is
1:33 create a bunch of threads, start them, and join on them. We've already seen that this pattern is super common here.
1:39 And I'm going to import threading at the top. And instead of just doing math on... how many is that? Let's make that number more obvious:
1:52 30 million operations. We're going to partition that work across one thread per processor, numbered n from 1 up to the processor count, alright?
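Here's a minimal sketch of what that pasted-in code might look like. It assumes a hypothetical `do_math(start, num)` stand-in for the course's function, plus hypothetical helpers `slice_bounds` and `run_threaded` that give each thread an equal, contiguous slice of the 30 million steps. The course's own code may divide the range differently.

```python
import math
import multiprocessing
import threading
import time
from typing import Tuple


def do_math(start: int = 0, num: int = 10) -> float:
    """Hypothetical CPU-bound stand-in: one multiply and one sqrt per step."""
    total = 0.0
    k_sq = 1000 * 1000
    pos = start
    while pos < num:
        pos += 1
        total += math.sqrt((pos - k_sq) * (pos - k_sq))
    return total


def slice_bounds(total: int, n: int, count: int) -> Tuple[int, int]:
    """(start, num) for slice n of count, with n running from 1 to count."""
    return total * (n - 1) // count, total * n // count


def run_threaded(total: int = 30_000_000) -> float:
    """Split the work across one thread per processor; return elapsed seconds."""
    processor_count = multiprocessing.cpu_count()
    threads = [
        threading.Thread(
            target=do_math,
            args=slice_bounds(total, n, processor_count),
            daemon=True,
        )
        for n in range(1, processor_count + 1)
    ]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # do_math's return value is discarded; only the timing matters here
    elapsed = time.perf_counter() - t0
    print(f"Done in {elapsed:,.2f} sec.")
    return elapsed
```

On a standard (GIL-enabled) CPython build, expect `run_threaded()` to land close to the serial time, for the GIL reasons covered later in this lecture.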
2:08 And why are we using the processor count and not just 5, 10, 100, whatever? It turns out that when you're doing computational work,
2:14 having more CPU-busy threads fighting for attention on the CPU actually slows things down. One thread runs for a little bit
2:24 and works with some variables, then the operating system kicks that thread off the processor and runs another one.
2:29 That one needs different data, which evicts what was in the cache, which means memory access is slower for a moment
2:35 and then it speeds up again. Things like that. So having too much contention for the processors with CPU-bound work is bad, bad news.
2:42 Ideally, if nothing else were happening on the computer, targeting the number of processors would be exactly the right amount.
2:49 Well, except that in Python it's not going to help, as we'll see. But how did we get this number? We're going to have to bring in a new
2:57 module here that we haven't seen yet, so we'll import multiprocessing. Now, multiprocessing is the process-based equivalent of threading in Python.
3:08 We're going to talk about it later, but it does have a cool little function: cpu_count(). There we go. And let's just do a quick print
3:17 of however many processors your machine says it has. Print that out. And if you want to be really ambitious,
3:25 we could add a comma separator for when that goes over 1,000. Anyway, let's just run it. "Doing math on 12 processors."
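A small sketch of that count-and-print step. `multiprocessing.cpu_count()` is the real standard-library call; the `,` in the format spec is what adds the thousands separator. `processors_message` is a hypothetical helper name, and the message wording just mirrors the demo's output.

```python
import multiprocessing


def processors_message(count: int) -> str:
    # The "," format spec inserts a thousands separator: 1024 -> "1,024".
    return f"Doing math on {count:,} processors."


print(processors_message(multiprocessing.cpu_count()))
```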
3:33 Should be half a second, or one second, something like that. Go, go, go, faster. Ah, that's a tiny bit faster. That's kind of cool, actually.
3:42 I don't know how it got to be a tiny bit faster. Maybe, because there are more threads, it actually fought against Camtasia, which is recording my screen,
3:49 and got a little more CPU time. Yeah, it does make it a tiny bit faster. But, you know, 6.34 over 7.47 is about 0.85, so the run takes roughly 15% less time.
4:01 That is not super impressive, is it? Given that I have 12 cores, we should have been able to do better than 15%.
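The arithmetic behind that disappointment, using only the two timings from this run (7.47 s serial, 6.34 s threaded) and the 12 reported cores. The "ideal" figure assumes perfect linear scaling, which real programs rarely reach even without a GIL.

```python
serial_seconds = 7.47    # single-threaded run from the demo
threaded_seconds = 6.34  # 12-thread run from the demo
cores = 12

time_ratio = threaded_seconds / serial_seconds  # about 0.85
time_saved = 1 - time_ratio                     # about 0.15, the "15%"
speedup = serial_seconds / threaded_seconds     # about 1.18x
ideal_seconds = serial_seconds / cores          # about 0.62 s with perfect scaling

print(f"ratio {time_ratio:.2f}, time saved {time_saved:.0%}, "
      f"speedup {speedup:.2f}x, ideal {ideal_seconds:.2f} s")
```

So the threaded run is only about 1.18x faster, against an ideal of well under a second.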
4:10 And I honestly think that if I were to turn off Camtasia, the threads might not make any difference at all. I don't know.
4:15 But here's an example of where the GIL is raising its head and causing serious, serious problems. All of this work is being done in Python,
4:27 in the Python interpreter, so every step of the way there's no blocking or waiting point; it's just trying to do work.
4:32 And because of the GIL, the interpreter can only run Python bytecode on one thread at a time, regardless of which threads or how many threads are running.
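A minimal sketch that makes the GIL visible by contrasting two kinds of thread work, assuming a standard (GIL-enabled) CPython build. `time.sleep` releases the GIL the way a blocking I/O call does, so sleeping threads overlap; the pure-Python loop in `cpu_work` needs the GIL to run, so CPU-bound threads just take turns. The function names are hypothetical.

```python
import threading
import time
from typing import Callable


def cpu_work(n: int) -> int:
    """Pure-Python loop: under the GIL, only one thread runs this bytecode at a time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def io_work(seconds: float) -> None:
    """time.sleep releases the GIL, standing in for a blocking I/O wait."""
    time.sleep(seconds)


def time_threads(target: Callable, arg: object, count: int) -> float:
    """Run target(arg) on `count` threads at once; return elapsed seconds."""
    threads = [threading.Thread(target=target, args=(arg,)) for _ in range(count)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - t0
```

On such a build, `time_threads(io_work, 0.5, 4)` finishes in roughly 0.5 s because the sleeps overlap, while `time_threads(cpu_work, 5_000_000, 4)` takes roughly as long as four back-to-back `cpu_work(5_000_000)` calls.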
4:40 That means this is basically serial, even though I gave it 12 threads to run. What's the fix? The fix is that this problem is not solved with threads.
4:50 It's solved some other way. I'll show you two ways in this course to make that number go down. I'll show you how to use multiprocessing
4:58 to get that number under a second, and I'll show you how to use Cython to get that number incredibly small. But the way it is right now,
5:07 it's not going to do a lot for us. We just can't make it go any faster, because the GIL is blocking any of the parallelism
5:15 that we're trying to take advantage of. Alright, so here's a concrete example of where the GIL is inhibiting threading.
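For contrast, a minimal sketch of the multiprocessing direction mentioned above; it is not the version the course builds later. It assumes a hypothetical `do_math(start, num)` stand-in and hypothetical `slice_bounds`/`run_in_processes` helpers, and uses the standard library's `multiprocessing.Pool.starmap` to hand one slice of the 30 million steps to each worker process. Each process has its own interpreter and its own GIL, so the slices can genuinely run in parallel.

```python
import math
import multiprocessing
import time
from typing import Tuple


def do_math(start: int = 0, num: int = 10) -> float:
    """Hypothetical CPU-bound stand-in: one multiply and one sqrt per step."""
    total = 0.0
    k_sq = 1000 * 1000
    pos = start
    while pos < num:
        pos += 1
        total += math.sqrt((pos - k_sq) * (pos - k_sq))
    return total


def slice_bounds(total: int, n: int, count: int) -> Tuple[int, int]:
    """(start, num) for slice n of count, with n running from 1 to count."""
    return total * (n - 1) // count, total * n // count


def run_in_processes(total: int = 30_000_000) -> float:
    """Split the work across one worker process per CPU; return elapsed seconds."""
    count = multiprocessing.cpu_count()
    slices = [slice_bounds(total, n, count) for n in range(1, count + 1)]
    t0 = time.perf_counter()
    with multiprocessing.Pool(processes=count) as pool:
        # Each worker process runs do_math(start, num) on its own slice.
        results = pool.starmap(do_math, slices)
    elapsed = time.perf_counter() - t0
    print(f"Done in {elapsed:,.2f} sec. (checksum {sum(results):,.0f})")
    return elapsed


if __name__ == "__main__":
    # The guard matters: with the "spawn" start method, workers re-import this module.
    run_in_processes()
```

Starting processes and pickling arguments and results adds overhead, so this approach pays off only when each slice is substantial, as it is with millions of steps.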
5:22 And of course, asyncio has an even worse problem, right? Remember, asyncio doesn't even have threads. It's all one thread, so there's not even a hope
5:30 that you could have gotten anything better out of asyncio. With threads in some other languages, like C or C#, this would have gone much faster.
5:38 But in Python, this particular case is where the GIL hurts you, and you just don't get any extra parallelism, really.
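One way to see the "it's all one thread" point about asyncio, as a sketch: run several CPU-bound coroutines with `asyncio.gather` and have each report `threading.get_ident()`. Because `cpu_task` (a hypothetical name) never awaits inside its loop, the coroutines simply run one after another on the event loop's single thread.

```python
import asyncio
import threading


async def cpu_task(n: int) -> int:
    """CPU-bound coroutine: no await in the loop, so it never yields mid-work."""
    total = 0
    for i in range(n):
        total += i * i
    # Report which OS thread ran this coroutine (total itself isn't needed here).
    return threading.get_ident()


async def main(task_count: int = 4) -> set:
    thread_ids = await asyncio.gather(*(cpu_task(100_000) for _ in range(task_count)))
    return set(thread_ids)


if __name__ == "__main__":
    # Prints a set with a single thread id: every coroutine ran on the same thread.
    print(asyncio.run(main()))
```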
