Python Memory Management and Tips Transcripts
Chapter: Recovering memory in Python
Lecture: Do you need the GC?
0:00 If we go over to the CPython documentation for almost the latest version,
0:04 we're doing 3.8.5 but here's 3.8.4 and you look up right at the top of the
0:09 garbage collector module, you're going to see some very interesting stuff that I've highlighted here.
0:14 It starts out telling you about it.
0:16 It provides an interface to the optional garbage collector,
0:20 and it provides the ability to disable it, to tune the frequency,
0:23 which is what we just spoke about, add debugging options and so on.
0:28 It also lets you ask which things are unreachable but cannot be freed for various reasons
0:33 and so on. But check out this underlined thing.
0:36 You can disable the garbage collector if you're sure your program does not create reference cycles.
0:41 Automatic collection can be disabled simply by calling "gc.disable()".
0:46 How crazy is that? So if you have a program that you're sure doesn't create
0:50 cycles or honestly, if it doesn't create too many cycles,
0:54 right? If this is like a command line script and it runs for two seconds,
0:58 it doesn't create a lot of cycles
1:00 even if it leaks a little memory,
1:02 who cares? If it's a web server
1:04 that runs forever, maybe that's a problem.
1:06 Maybe not. We will see.
1:08 But to me, it is super interesting that this garbage collector is considered optional and that
1:13 right at the top of the documentation,
1:16 it's like "you know what?
1:17 If you're feeling confident, turn it off,
1:19 you might not need it". So that's pretty cool.
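Here is a minimal sketch of what turning the collector off could look like, assuming a short-lived script. The Node class and the make_cycle helper are hypothetical names used only to manufacture reference cycles; gc.disable(), gc.isenabled(), gc.enable() and gc.collect() are the standard library calls.

```python
import gc


class Node:
    """Hypothetical class, used only to build a reference cycle."""

    def __init__(self):
        self.other = None


def make_cycle():
    """Create two objects that point at each other, then drop them."""
    a, b = Node(), Node()
    a.other, b.other = b, a  # reference counting alone cannot free these


was_enabled = gc.isenabled()

gc.disable()                            # turn off automatic cycle collection
enabled_after_disable = gc.isenabled()  # now False

for _ in range(1000):
    make_cycle()                        # the cycles pile up, nothing collects them

# A manual collection still works while automatic collection is off.
freed = gc.collect()

if was_enabled:
    gc.enable()                         # put things back the way we found them
```

Note that reference counting keeps working with the collector disabled, so objects that are not part of a cycle are still freed as soon as their last reference goes away.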
1:23 It also says here, "To debug a leaky program, call
1:25 gc.set_debug(gc.DEBUG_LEAK)", and notes that this includes
1:30 gc.DEBUG_SAVEALL, causing garbage-collected objects to be saved in gc.garbage
1:37 for inspection rather than being cleaned up.
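A small sketch of that debugging mode, under a couple of assumptions: the Leaky class is a hypothetical stand-in for whatever is leaking in a real program, and the sketch sets only gc.DEBUG_SAVEALL rather than the full gc.DEBUG_LEAK, because DEBUG_LEAK also prints details about every collected object to stderr.

```python
import gc


class Leaky:
    """Hypothetical class whose instances keep themselves alive via a cycle."""

    def __init__(self):
        self.me = self  # self-reference: only the cycle collector can free this


gc.collect()                    # clear out any unrelated garbage first
gc.set_debug(gc.DEBUG_SAVEALL)  # save unreachable objects instead of freeing them

Leaky()                         # created and immediately dropped
gc.collect()

# Everything the collector found unreachable is now parked in gc.garbage.
saved = [obj for obj in gc.garbage if isinstance(obj, Leaky)]

gc.set_debug(0)                 # back to normal behavior
gc.garbage.clear()              # release the saved objects
```

Inspecting the objects in saved (their types, their attributes, what they refer to) is one way to work out where the cycles in a program are coming from.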
1:39 So that's also interesting, but this ability to disable it,
1:43 this is intriguing. I wonder if you could do it.
1:46 If anybody did, what would the outcome be?
1:49 Well, you might have heard of this place called "Instagram".
1:51 I think they do something with photos.
1:52 They actually do an insane amount of work with Python.
1:56 All of Instagram runs on Django,
2:00 at least their back-end APIs and their website and so on.
2:03 And I think they're one of the largest deployments of Django in the world.
2:06 They've got a massive set of servers and so on.
2:09 They wrote this article over on their engineering blog,
2:11 where they have a lot of cool Python stuff
2:13 they talk about, called "Dismissing Python Garbage Collection at Instagram".
2:18 And it's pretty intriguing, it says,
2:20 "by dismissing the Python garbage collection mechanism,
2:24 which reclaims memory by collecting and freeing unused data,
2:27 Instagram can run 10% more efficiently".
2:30 Yes, you heard it, by disabling GC entirely,
2:33 we can reduce the memory footprint,
2:35 not increase it, reduce it by,
2:38 I think they said, maybe 25% or something like that. Quite a bit, and improve
2:44 the runtime performance by improving the CPU LLC
2:47 cache hit ratio. If you want to know more about why, you can check out this article
2:52 here at the bottom. It's probably better to just Google it.
2:55 It's one of these yucky Medium URLs,
2:57 but nonetheless, quite, quite interesting. So here's the
3:02 TLDR version. So they were able to determine that the way web servers
3:07 work is they'll create not just one version of the server for running your Python code
3:14 but they'll make many of them. So
3:16 for example, at Talk Python,
3:17 we use uWSGI, and when we run it for the training site,
3:22 we actually have eight copies of that process running. Eight independent,
3:27 separate copies of the Python web app that you very likely are using in some form
3:31 or another right now. There's a lot of memory that's shared between those things
3:36 and the operating system is pretty good at saying "we're only going to consume more
3:42 memory for all these different processes if they're going to start changing it,
3:45 but if it's actually just the same,
3:47 let's just point them all at the same bit of memory"
3:49 okay? When that's the case,
3:52 you actually get a lower memory usage because even though we have eight processes, instead of
3:57 having eight times the memory, we might have,
3:59 you know, 10% - 20% extra memory that has to be created,
4:03 and 80% could be shared. I don't know if that's the actual ratio,
4:06 but you know, that's the general idea that a
4:08 lot of the core startup runtime bits are all the same,
4:12 and then there's what that process has done this particular time since it started.
4:17 Okay, so what they found was that the GC
4:20 was actually mucking with the memory in a way that would not allow that
4:26 memory to be shared by the operating system.
4:28 So even though they may have had a few cycles that created,
4:31 you know, some issues for them,
4:33 what they found was they got,
4:35 I think, they said 25% reduction in memory usage.
4:38 So they saved 8GB per server
4:41 by turning off the GC. And also because the memory is more similar across these
4:46 different processes, it's more likely that as different processes process requests,
4:52 that data is going to be in the cache, the CPU cache, and CPU
4:57 cache access is much faster. So a rule of thumb
5:02 might be "if I'm going to read something from disk versus from memory, it could be 200 to
5:07 400 times faster to read it from RAM than it is from disk".
5:11 So obviously you think something in RAM is blazing and something on disk,
5:16 even a fast disk, is relatively slow.
5:18 Same for the cache, though.
5:20 That cache is like 400 times faster than RAM.
5:23 So if you can get more of these cache
5:25 hits, you can get your code to run much faster.
5:28 So they came up with this 10% number,
5:30 sort of like a CPU performance boost plus memory reduction so we can run more
5:35 things on the same server and so on.
5:37 The number is not really important; the general idea is,
5:39 and the fact that they were able to apply this is pretty interesting.
5:42 But if you want to do this,
5:44 you should read the article because it's not straightforward how they did it or whether
5:49 that worked well for them. So, depending on what you're trying to do,
5:52 it might be as simple as calling "gc.disable()",
5:55 but read the article and you'll see there's actually more to what they had to do
5:58 in their fairly complicated setup.
6:01 All that said, I'd probably leave the garbage collector on, maybe turn that first
6:05 number, that 700, up much higher.
6:08 But that's kind of my first impression as I'm thinking through these problems.
6:11 You know, this is one of those types of things where maybe you
6:15 let it just work the way it is.
6:16 But if you feel like you could use this boost,
6:18 these are some of the knobs and ideas that you can play with to improve it
6:22 pretty easily around garbage collection and memory management.
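As a rough sketch of the "leave it on but turn that first number up" idea: gc.get_threshold() and gc.set_threshold() are the standard library calls, while the value 50_000 is an arbitrary number picked for illustration, not a recommendation from the lecture or from the Instagram article.

```python
import gc

# Read the current thresholds. The first number is the one discussed above;
# the exact defaults depend on the Python version.
original = gc.get_threshold()

# Raise only the first threshold so generation 0 collections run far less often.
# 50_000 is an assumed, illustrative value.
gc.set_threshold(50_000, original[1], original[2])
raised = gc.get_threshold()

# ... run the allocation-heavy part of the program here ...

# Restore the original settings when done.
gc.set_threshold(*original)
restored = gc.get_threshold()
```

Whether this helps depends on the workload, so it is worth measuring memory and run time before and after changing it.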