Async Techniques and Examples in Python Transcripts
Chapter: Parallelism in C with Cython
Lecture: Demo: Fast threading with cython (app review)

0:00 Now that we know a little bit about Cython and how to use it, it's time to take the Cython power and apply it to concurrent, parallel programming,
0:09 apply it to threads, and you're going to see how Cython first just speeds up the code, period, because it's C and not Python, and then breaks us free
0:17 from the GIL so we can go even faster. There's just a handful of concepts that we got to learn to convert the code we've already written
0:24 to extremely high performance C code. Now let's go back to this computed useless math thing that we're doing.
0:31 Right here is our do_math, and remember, we have... let me again put the little digit-grouping things in here so you can tell more easily.
0:39 We're asking this little math function to increment an integer and to do some square roots and multiplication and subtraction 30 million times.
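
For readers following along without the video, here is a minimal sketch of what a function like do_math might look like. Only the name do_math and the 30 million figure come from the lecture; the exact expression, the parameter names, the return value, and the timed_run helper are assumptions made for illustration. The small count keeps the demo quick; pass 30_000_000 to approximate the run described here.

```python
import math
import time


def do_math(start=0, num=10):
    # Assumed loop body: increment an integer, then do a subtraction,
    # a multiplication, and a square root on every pass.
    pos = start
    k_sq = 1000 * 1000
    while pos < num:
        pos += 1
        math.sqrt((pos - k_sq) * (pos - k_sq))
    return pos  # returned only so the sketch can be checked


def timed_run(count):
    # Hypothetical helper: time one serial call to do_math.
    t0 = time.perf_counter()
    do_math(0, count)
    return time.perf_counter() - t0


# Digit grouping (300_000) makes large literals easier to read.
elapsed = timed_run(300_000)
print(f"Done in {elapsed:,.2f} sec.")
```

Timings will differ from machine to machine, so the 7.6 seconds quoted in the lecture should not be expected here.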
0:49 Now, it turns out that for 30 million times this is actually pretty fast. Let's run it. 7.6 seconds. Yeah, that's good. I mean, we did 30 million things,
1:06 actually way more than that, so impressive, but we can do it way faster. So this you've already seen, and we talked about this in the threading section,
1:14 in the multiprocessing section, and so on. So we said, well, if we'd run this in a threaded mode
1:20 on 12 processors, remember how awesome this computer is? It has 12 cores. Maybe cores is a better word than processors.
1:25 Doesn't really matter. It should get faster. Oh, about the same. I threw in a little factor for these so you could tell based
1:33 on some standard, running this a few times how much faster or slower. So the threaded version's about the same.
1:39 This one, the multiprocessing one, does way better. Yes, it's 1.79 seconds, 4.7 times faster. That is a big improvement. Can it get better still?
1:50 So that's our goal. Our goal is to take this code. Actually, we're going to take the threaded version here
1:58 the threaded version, and that's using 12 threads 'cause there's 12 cores, and it's trying to run it
2:05 and we saw that this one actually was almost exactly the same speed as just the serial one. We saw with multiprocessing
2:11 that we can get 4.7 times faster, but why again is this threading one not helping us? It's not helping because all of the execution,
2:21 all the math work that we're doing, is a series of interpreter instructions in CPython. The GIL tells us that only one instruction at a time can run,
2:29 period, regardless of how many threads or things like that are running. So this is clogged up. It can't go any faster with threading,
2:36 but that's because of the GIL. What if we could use Cython to make this happen in C and break free from the GIL?
2:43 Well, it'd definitely be on, wouldn't it? And it turns out we are going to do that next.
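
The GIL effect described above can be reproduced with a small sketch that splits the same work across one thread per core and compares it with a serial run. This is an illustration under assumptions, not the course's code: the do_math body and the slices, run_serial, and run_threaded helpers are hypothetical, and the count is kept small so it finishes quickly.

```python
import math
import os
import threading
import time


def do_math(start, num):
    # Assumed CPU-bound loop, pure Python, so it holds the GIL while running.
    pos = start
    k_sq = 1000 * 1000
    while pos < num:
        pos += 1
        math.sqrt((pos - k_sq) * (pos - k_sq))


def slices(count, workers):
    # Split [0, count) into one contiguous range per worker.
    chunk = count // workers
    bounds = [(n * chunk, (n + 1) * chunk) for n in range(workers)]
    bounds[-1] = (bounds[-1][0], count)  # last slice absorbs the remainder
    return bounds


def run_serial(count):
    t0 = time.perf_counter()
    do_math(0, count)
    return time.perf_counter() - t0


def run_threaded(count, workers):
    threads = [
        threading.Thread(target=do_math, args=(start, end), daemon=True)
        for start, end in slices(count, workers)
    ]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - t0


count = 600_000
workers = os.cpu_count() or 4  # the lecture's machine has 12 cores
serial = run_serial(count)
threaded = run_threaded(count, workers)
print(f"serial:   {serial:.3f} sec")
print(f"threaded: {threaded:.3f} sec with {workers} threads")
```

On a standard CPython build with the GIL, the two timings usually come out close to each other, because only one thread executes Python bytecode at any moment. Exact numbers depend on the machine and the Python version.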
