Python Memory Management and Tips Transcripts
Chapter: Recovering memory in Python
Lecture: Do you need the GC?
0:00
If we go over to the CPython documentation for almost the latest version (we're using 3.8.5, but here's 3.8.4) and you look right at the top of the
0:10
garbage collector module, you're going to see some very interesting stuff that I've highlighted here. It starts out telling you about it.
0:17
It provides an interface to the optional garbage collector, and it provides the ability to disable it, to tune the frequency,
0:24
which is what we just spoke about, add debugging options and so on.
0:29
It also lets you ask which things are unreachable but cannot be freed for various reasons and so on. But check out this underlined thing.
0:37
You can disable the garbage collector if you're sure your program does not create reference cycles.
0:42
Automatic collection can be disabled simply by calling "gc.disable()". How crazy is that? So if you have a program that you're sure doesn't create
0:51
cycles or honestly, if it doesn't create too many cycles, right? If this is like a command line script and it runs for two seconds,
0:59
and it doesn't create a lot of cycles, then even if it leaks a little memory, who cares? If it's a web server that runs forever, maybe that's a problem.
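To make that trade-off concrete, here is a minimal sketch of what happens to a reference cycle while automatic collection is off. The `Node` class and `make_cycle` helper are hypothetical names made up for this illustration; only the `gc` and `weakref` calls are real standard-library APIs.

```python
import gc
import weakref


class Node:
    """Hypothetical class, used only to build a reference cycle."""

    def __init__(self):
        self.partner = None


def make_cycle():
    # Two objects that point at each other; their refcounts never drop to zero.
    a, b = Node(), Node()
    a.partner, b.partner = b, a
    return weakref.ref(a)  # a weak reference lets us observe whether `a` still exists


was_enabled = gc.isenabled()
gc.disable()  # turn off automatic cycle collection
try:
    probe = make_cycle()
    alive_without_gc = probe() is not None   # the cycle is still sitting in memory
    gc.collect()                             # a manual collection still works
    alive_after_collect = probe() is not None
finally:
    if was_enabled:
        gc.enable()  # restore the previous state

print(alive_without_gc, alive_after_collect)
```

Objects that are not part of a cycle are still freed right away by reference counting, so for a short-lived script the leaked cycles may never matter.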
1:07
Maybe not. We will see. But to me, it is super interesting that this garbage collector is considered optional and that
1:14
right at the top of the documentation, it's like "you know what? If you're feeling confident, turn it off,
1:20
you might not need it". So that's pretty cool. It also says here, "To debug a leaking program, call gc.set_debug(gc.DEBUG_LEAK)", and notes that this includes
1:31
gc.DEBUG_SAVEALL, causing garbage-collected objects to be saved in gc.garbage for inspection rather than being freed.
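As a rough sketch of that debugging workflow: the snippet below sets only `gc.DEBUG_SAVEALL` (rather than the full `gc.DEBUG_LEAK`, which also prints collection details to stderr) and then looks at what ended up in `gc.garbage`. The `Leaky` class is a hypothetical stand-in for whatever your program is leaking.

```python
import gc


class Leaky:
    """Hypothetical class standing in for objects caught in a cycle."""


gc.collect()          # start from a clean slate
gc.garbage.clear()
gc.set_debug(gc.DEBUG_SAVEALL)  # keep unreachable objects in gc.garbage instead of freeing them
try:
    a, b = Leaky(), Leaky()
    a.other, b.other = b, a     # build a reference cycle
    del a, b                    # now unreachable, but refcounts are non-zero
    gc.collect()
    saved = [obj for obj in gc.garbage if isinstance(obj, Leaky)]
finally:
    gc.set_debug(0)             # switch debugging back off
    gc.garbage.clear()

print(len(saved), "Leaky objects were saved for inspection")
```

From there you could inspect each saved object (for example with `gc.get_referrers`) to work out what is holding the cycle together.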
1:40
So that's also interesting, but this ability to disable it, this is intriguing. I wonder if you could do it.
1:47
If anybody would, what would the outcome be? Well, you might have heard of this place called "Instagram". I think they do something with photos.
1:53
They actually do an insane amount of work with Python. All of Instagram runs on Django, at least their backend APIs and their website and so on.
2:04
And I think they're one of the largest deployments of Django in the world. They've got a massive set of servers and so on.
2:10
They wrote this article over on their engineering blog, where they have a lot of cool Python stuff
2:14
they talk about, called "Dismissing Python Garbage Collection at Instagram". And it's pretty intriguing. It says,
2:21
"by dismissing the Python garbage collection mechanism, which reclaims memory by collecting and freeing unused data,
2:28
Instagram can run 10% more efficiently". Yes, you heard it, by disabling GC entirely, we can reduce the memory footprint,
2:36
not increase it, reduce it by, I think they said, maybe 25% or something like that. Quite a bit, and improve
2:45
the runtime performance by improving the CPU LLC cache hit ratio. If you want to know more about why, you can check out this article
2:53
here at the bottom. It's probably better to just Google it; it's one of these yucky Medium URLs. But nonetheless, quite, quite interesting. So here's the
3:03
TL;DR version. So they were able to determine that the way web servers
3:08
work is that they'll create not just one instance of the server process for running your Python code, but many of them. So for example, at Talk Python,
3:18
we use uWSGI, and when we run it for the training site, we actually have eight copies of that process running. Eight independent,
3:28
separate copies of the Python web app that you very likely are using in some form
3:32
or another right now. There's a lot of memory that's shared between those things
3:37
and the operating system is pretty good at saying "we're only going to consume more
3:43
memory for all these different processes if they're going to start changing it, but if it's actually just the same,
3:48
let's just point them all at the same bit of memory" okay? When that's the case,
3:53
you actually get a lower memory usage because even though we have eight processes, instead of having eight times the memory, we might have,
4:00
you know, 10% - 20% extra memory that has to be created, and 80% could be shared. I don't know if that's the actual ratio,
4:07
but you know, that's the general idea that a lot of the core startup runtime bits are all the same,
4:13
and then there's what that process has done this particular time since it started. Okay, so what they found is that the GC
4:21
was actually mucking with the memory in a way that would not allow that memory to be shared by the operating system.
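For a feel of what keeping the collector away from shared memory can look like in code, here is a minimal sketch of the disable, freeze, fork pattern described in the `gc.freeze()` documentation (Python 3.7+). This is not Instagram's actual setup; it assumes a Unix system where `os.fork()` exists, and `fork_one_worker` and `shared_state` are hypothetical names for illustration.

```python
import gc
import os


def fork_one_worker():
    """Fork a single child using the disable/freeze/enable pattern (Unix only)."""
    was_enabled = gc.isenabled()
    gc.disable()                          # parent: no automatic collections before the fork
    shared_state = list(range(100_000))   # hypothetical stand-in for preloaded app data
    gc.freeze()                           # park all tracked objects in the permanent generation
    frozen_count = gc.get_freeze_count()

    pid = os.fork()
    if pid == 0:
        # Child: collections here ignore the frozen objects,
        # so the collector itself does not write to their memory pages.
        gc.enable()
        os._exit(0 if len(shared_state) == 100_000 else 1)

    _, status = os.waitpid(pid, 0)        # parent: wait for the child to finish
    gc.unfreeze()                         # undo the freeze for the rest of this demo
    if was_enabled:
        gc.enable()
    return frozen_count, os.WEXITSTATUS(status)


frozen_count, child_exit_code = fork_one_worker()
print(frozen_count > 0, child_exit_code)
```

Reference counting still writes to object headers, so sharing is never perfect; the sketch only removes the collector's own writes from the picture.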
4:29
So even though they may have had a few cycles that created, you know, some issues for them, what they found was they got,
4:36
I think, they said 25% reduction in memory usage. So they saved 8GB per server
4:42
by turning off the GC. And also because the memory is more similar across these
4:47
different processes, it's more likely that as different processes process requests, that data is going to be in the cache, the CPU cache, and CPU
4:58
cache access is much faster. So a rule of thumb might be, "if I'm going to read something from disk versus from memory, it could be
5:08
200-400 times faster to read it from RAM than it is from disk". So obviously you think something in RAM is blazing and something on disk,
5:17
even a fast disk, is relatively slow. Same for the cache, though. That cache is like 400 times faster than RAM. So if you can get more of these cache
5:26
hits, you can get your code to run much faster. So they came up with this 10% number,
5:31
sort of like a CPU performance boost plus memory reduction so we can run more things on the same server and so on.
5:38
The number is not really important, the general idea is, and the fact that they were able to apply this is pretty interesting.
5:43
But if you want to do this, you should read the article, because it's not straightforward how they did it or whether
5:50
that worked well for them. So, depending on what you're trying to do, it might be as simple as calling "gc.disable()",
5:56
but read the article and you'll see there's actually more to what they had to do in their fairly complicated set up.
6:02
All that said, I'd probably leave the garbage collector on, maybe turn that first number, that 700, up much higher.
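A minimal sketch of turning that first number up, using `gc.get_threshold()` and `gc.set_threshold()`. The value 50,000 is an arbitrary number picked for illustration, not a recommendation; the right value depends on your workload.

```python
import gc

original = gc.get_threshold()     # three numbers; the first has long defaulted to 700 in CPython
gen0, gen1, gen2 = original

# Raise only the first threshold so generation 0 collections run far less often.
gc.set_threshold(50_000, gen1, gen2)   # 50_000 is an illustrative assumption
raised = gc.get_threshold()
print("before:", original, "after:", raised)

gc.set_threshold(*original)       # put the original settings back
```

In a real application you would set this once at startup and leave it, then measure memory and latency before and after.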
6:09
But that's kind of my first impression as I'm thinking through these problems. You know, this is one of the types of things where maybe you
6:16
let it just work the way it is. But if you feel like you could use this boost,
6:19
these are some of the knobs and ideas that you can play with to improve it pretty easily around garbage collection and memory management.