Async Techniques and Examples in Python Transcripts
Chapter: Threads
Lecture: Demo: Attempting to leverage multiple cores with threads

0:00 At the opening of this chapter
0:01 I told you that threads belong in
0:03 the "do more at once" category
0:04 but not in the "do stuff faster
0:06 by taking advantage of the multiple cores on our system" category.
0:09 And recall, on this computer here
0:15 I have many, many cores, 12 cores up there.
0:18 So I should be able to take advantage of all of those cores
0:23 to do computational things way, way faster.
0:27 In a perfectly parallel universe
0:29 I should be getting 11 to 12 times improvement.
0:34 So here I have a little computational program.
0:38 I have the original
0:39 and then this one, which starts as original
0:41 but we're going to evolve it as we have been.
0:43 And it has this function, do_math
0:45 which is just going to do some multiplication
0:49 and square roots and things like that.
0:51 It doesn't really matter what math it's doing.
0:53 What's interesting is just that it's doing CPU-bound math.
0:56 So let's run this singular serial version.
0:59 It takes about 7.5 seconds to run.
1:07 7.47. Excuse me for my inaccuracy.
1:09 So it's done. It's done its work.
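(The on-screen code isn't reproduced in this transcript, but a minimal sketch of the serial version might look like the following; the do_math name and the 30,000,000 operation count come from the demo, while the specific arithmetic is an assumption.)

    import datetime
    import math

    def do_math(start=0, num=10):
        # Arbitrary CPU-bound work: multiplication and square roots.
        # The exact arithmetic doesn't matter, only that it keeps the CPU busy.
        pos = start
        k_sq = 1000 * 1000
        while pos < num:
            pos += 1
            math.sqrt((pos - k_sq) * (pos - k_sq))

    def main():
        t0 = datetime.datetime.now()
        do_math(num=30_000_000)
        dt = datetime.datetime.now() - t0
        print(f"Done in {dt.total_seconds():,.2f} sec.")

    if __name__ == '__main__':
        main()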
1:13 If I were able to leverage this with some sort of threading
1:18 I would be really golden.
1:19 I could make it run in half a second
1:22 or something really great.
1:23 So let me replace that line with an interesting little thing
1:28 and the details are not super important
1:29 so I'm just going to paste it in for time's sake.
1:31 So what we're going to do is
1:32 we're going to create a bunch of threads.
1:33 We're going to start the threads and join on them.
1:36 We've already seen that this pattern is super common here.
1:38 And I'm going to import threading at the top.
1:42 And instead of just doing math on, how many is that?
1:47 Let's make that number more obvious:
1:51 30 million operations.
1:57 We're going to partition this across n different processors
2:02 from 1 up to processor count, alright?
2:07 And why are we using processor count
2:08 and not just 5, 10, 100, whatever?
2:11 It turns out when you're doing computational stuff
2:13 having more CPU-busy threads fighting for attention
2:18 on the CPU itself actually slows it down.
2:21 One thread runs for a little bit
2:23 works with some variables
2:26 and then another gets scheduled by the operating system
2:26 and kicks that thread off the processor.
2:28 It needs different data, which evicts stuff from the cache
2:31 which means memory access is slower for a second
2:34 and then it speeds up again. Things like that.
2:36 So having too much contention for the processors
2:38 in a CPU-bound world is bad, bad news.
2:41 Ideally, if nothing else were happening on the computer
2:43 targeting the number of processors
2:45 would be exactly the right amount.
2:48 Well, except that in Python
2:50 it's not going to help, as we'll see.
2:51 But how did we get this number?
2:52 We're going to have to bring in a new
2:56 module here that we haven't seen yet.
2:58 So we'll have to import multiprocessing.
3:02 Now, multiprocessing is the process-based
3:04 equivalent of threads in Python.
3:07 We're going to talk about it later.
3:08 But it does have a cool little function: cpu_count.
3:13 There we go. And let's just do a quick print.
3:16 Whatever your machine tells you is
3:18 how many processors it thinks it has.
3:21 Print that out.
3:22 And if you want to be really ambitious
3:24 we could put a comma there for when that goes over 1000.
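(Again, the pasted code isn't shown in the transcript; here is a hedged sketch of what the threaded version plausibly looks like, reusing the do_math sketch from above and partitioning the 30,000,000 operations across one thread per core via multiprocessing.cpu_count().)

    import datetime
    import math
    import multiprocessing
    import threading

    def do_math(start=0, num=10):
        # Same CPU-bound math as the serial version, over the range [start, num).
        pos = start
        k_sq = 1000 * 1000
        while pos < num:
            pos += 1
            math.sqrt((pos - k_sq) * (pos - k_sq))

    def main():
        t0 = datetime.datetime.now()

        processor_count = multiprocessing.cpu_count()
        print(f"Doing math on {processor_count:,} processors.")

        # Partition the 30,000,000 operations into one chunk per processor.
        chunk = 30_000_000 // processor_count
        threads = [
            threading.Thread(target=do_math,
                             args=(chunk * n, chunk * (n + 1)),
                             daemon=True)
            for n in range(processor_count)
        ]

        # The familiar pattern: start them all, then join on them all.
        [t.start() for t in threads]
        [t.join() for t in threads]

        dt = datetime.datetime.now() - t0
        print(f"Done in {dt.total_seconds():,.2f} sec.")

    if __name__ == '__main__':
        main()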
3:28 Anyway, let's just run it.
3:30 Doing math on 12 processors.
3:32 Should be half a second, or one second, something like that.
3:36 Go, go, go, faster. Ah, that's a tiny bit faster.
3:40 That's kind of cool, actually.
3:41 I don't know how it got to be a tiny bit faster.
3:43 Maybe because having more threads
3:44 actually fought against Camtasia
3:47 which is recording my screen
3:48 and got a little more CPU time.
3:49 Yeah, it does make it a tiny bit faster.
3:51 But, you know, that ratio of 6.34 over 7.47
3:57 is only about a 15% speedup.
4:00 That is not super impressive, is it?
4:03 So given that I have 12 cores
4:05 we should have been able to do better than 15%.
4:09 And I honestly think if I were to turn off Camtasia
4:11 maybe it would actually not make any difference at all.
4:13 I don't know.
4:14 But here's an example of where the GIL is rearing its head
4:20 and causing serious, serious problems.
4:22 So this, all of this work, is being done in Python
4:26 in the Python interpreter, so every step of the way
4:28 there's no sort of blocking waiting point
4:30 it's just trying to do work.
4:31 And the interpreter, 'cause of the GIL
4:34 can only operate on one instruction at a time
4:37 regardless of which threads or how many threads are running.
4:39 That means this basically runs serially
4:42 even though I gave it 12 threads to run.
4:44 What's the fix?
4:46 The fix is that this problem is not solved with threads.
4:49 It's solved some other way.
4:51 I'll show you two ways in this course
4:53 to make that number go down.
4:55 I'll show you how to use multiprocessing
4:57 to get that number under a second
4:59 and I'll show you how to use Cython
5:02 to get that number incredibly small.
5:05 But the way it is right now
5:06 it's not going to do a lot for us.
5:08 We just can't make it go really any faster
5:11 because the GIL is blocking any of the parallelism
5:14 that we're trying to take advantage of.
5:16 Alright, so here's a concrete example
5:17 of where the GIL is inhibiting threading.
5:21 And of course, asyncio has an even worse problem, right?
5:24 Remember, asyncio doesn't even have threads.
5:26 It's all one thread, so there's not even a hope
5:29 that you could have gotten
5:30 anything better out of asyncio.
5:32 But with threading in some languages, like C or C#
5:36 this would have gone much faster.
5:37 But because of the GIL, in this particular case
5:41 this is where the GIL hurts you
5:43 and you just don't get any extra parallelism, really.
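(As a preview of the multiprocessing fix covered later in the course, here is a hedged sketch, not the course's actual code, of how the same partitioned work could be handed to a process pool; each worker process has its own interpreter and its own GIL, so the chunks really can run on separate cores.)

    import datetime
    import math
    import multiprocessing

    def do_math(start=0, num=10):
        # Same CPU-bound math as before.
        pos = start
        k_sq = 1000 * 1000
        while pos < num:
            pos += 1
            math.sqrt((pos - k_sq) * (pos - k_sq))

    def main():
        t0 = datetime.datetime.now()

        processor_count = multiprocessing.cpu_count()
        chunk = 30_000_000 // processor_count

        # One process per core; each chunk executes in a separate interpreter.
        with multiprocessing.Pool(processes=processor_count) as pool:
            tasks = [
                pool.apply_async(do_math, args=(chunk * n, chunk * (n + 1)))
                for n in range(processor_count)
            ]
            for t in tasks:
                t.get()  # wait for each chunk to finish

        dt = datetime.datetime.now() - t0
        print(f"Done in {dt.total_seconds():,.2f} sec.")

    if __name__ == '__main__':
        main()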