Python Memory Management and Tips Transcripts
Chapter: Recovering memory in Python
Lecture: When does the GC run?
0:00 Now we've seen what this garbage collector is and how it restricts what it pays attention
0:06 to, both the types of objects that it pays attention to and the frequency with which it checks different ages or generations of objects.
0:16 But when exactly does it run? Is it some odd heuristic that we don't really know about? Or is there something more concrete?
0:23 Well, Python, unlike a lot of different systems, is pretty straightforward in what it does. So we can actually call this function
0:31 "gc.get_threshold()". You'll get a tuple of numbers back. You're gonna get 700, 10, and 10 in the current system.
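As a quick check of the call just mentioned, here is a minimal sketch that reads the three thresholds. The variable names are only illustrative, and the actual numbers depend on the Python version and on whether anything has already changed them.

```python
import gc

# gc.get_threshold() returns a tuple of three integers.
threshold0, threshold1, threshold2 = gc.get_threshold()

print("generation zero threshold:", threshold0)
print("generation one threshold:", threshold1)
print("generation two threshold:", threshold2)
```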
0:39 Now what's confusing about this, what's unclear about this, unless you go look at the documentation,
0:44 this is not how often Generation zero runs versus generation one versus generation two, the units on these things are different, okay?
0:53 So the first one is different than the other two. The first number, the 700, this is a generation zero collection,
1:01 so one of the cheaper, easier collections. This is triggered when the number of allocations surviving reference counting minus the ones that have
1:11 been cleaned up exceeds 700. That's what that 700 on the left means, it means
1:17 we've allocated 700 more things that have lived even when you take into account the ones that have been deleted. Now remember,
1:24 this is not all allocations. If that were the case, this would run like crazy all the time.
1:28 This is going to run only when you've got container objects: classes or dictionaries and so on. If you create, let's say,
1:39 800 classes and only 50 of them get cleaned up, well, that could trigger a garbage collection. Okay, so that's this first number.
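To see that only container-style allocations move that first counter, here is a small sketch using "gc.get_count()", whose first value is the current generation zero count. The "Record" class and the "gen0_growth" helper are hypothetical names made up for this illustration, and the exact numbers printed can vary between Python versions and builds.

```python
import gc


class Record:
    """Hypothetical container-style class, used only for this illustration."""

    def __init__(self):
        self.value = 0


def gen0_growth(factory, n=500):
    """Return how much the generation zero counter grows while n objects are kept alive."""
    was_enabled = gc.isenabled()
    gc.disable()  # avoid an automatic collection resetting the counter mid-measurement
    gc.collect()  # start from a freshly reset counter
    try:
        before = gc.get_count()[0]
        kept = [factory() for _ in range(n)]  # hold references so nothing is freed
        after = gc.get_count()[0]
    finally:
        if was_enabled:
            gc.enable()
    del kept
    return after - before


container_growth = gen0_growth(Record)  # instances of a class are tracked by the GC
float_growth = gen0_growth(float)       # floats cannot form cycles and are not tracked

print("growth while creating Record instances:", container_growth)
print("growth while creating floats:", float_growth)
```

The growth for the class instances should be roughly the number of objects created, while the growth for the floats should stay close to zero.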
1:46 That's the 700. The second number says we're going to trigger a generation one collection.
1:52 This is a little bit broader search of the memory space and can be more expensive because we're looking at more objects potentially.
2:01 Now, this number 10 here means we're gonna do a generation one collection for every 10 generation zeros. So, right, the generation zeros,
2:10 we do that every 700 extra containers, and then one in 10 of those we're gonna actually look a little broader, and we're gonna do a generation one.
2:20 So that's what this 10 means. From generation zero to generation one, the ratio of those is 10 to 1. If we look at the final 10,
2:28 this is when a generation two collection is triggered, and that's when the number of Generation one collections is greater than this number.
2:35 So for every 10 generation one collections, there's a gen two collection. So that might sound a little bit complicated,
2:43 but let's break it down. For every time that we have this exceeding of some fixed number of allocations of container objects,
2:50 700, that's going to generate a gen zero, and then for every 10 generation zeros, let's say, we have one generation
2:58 one collection. And then every 10 generation one collections we're gonna have a generation two.
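The cascade just described can be sketched as a toy model. This is not CPython's actual code, only a simplified simulation under the assumption that each trigger collects the oldest generation whose counter has exceeded its threshold; real interpreters may apply extra conditions. The function name "simulate_collections" is made up for this example.

```python
def simulate_collections(net_allocations, thresholds=(700, 10, 10)):
    """Toy model: count how many gen zero, gen one and gen two collections would run."""
    counts = [0, 0, 0]  # per-generation counters
    runs = [0, 0, 0]    # collections performed per generation
    for _ in range(net_allocations):
        counts[0] += 1  # one more surviving container allocation
        if counts[0] <= thresholds[0]:
            continue
        # Pick the oldest generation whose counter exceeds its threshold.
        gen = 0
        for i in (2, 1):
            if counts[i] > thresholds[i]:
                gen = i
                break
        runs[gen] += 1
        if gen < 2:
            counts[gen + 1] += 1  # the next older generation notes this collection
        for i in range(gen + 1):
            counts[i] = 0  # the collected generation and all younger ones reset
    return runs


gen0_runs, gen1_runs, gen2_runs = simulate_collections(1_000_000)
print("gen zero:", gen0_runs, "gen one:", gen1_runs, "gen two:", gen2_runs)
```

Because the model uses "greater than" comparisons, the ratios come out close to, but not exactly, 100 to 10 to 1.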
3:03 So it's a 100 to 10 to 1 from gen zero to gen one to gen two number of collections, and it just happens to be the thing that starts it all
3:13 off is the number of allocations of surviving container objects like classes and dictionaries and so on. All right,
3:21 hopefully, even though that's a lot to keep in your mind, it gives you a sense of what's going on here.
3:28 Like, when is this whole process going to get started? It's pretty easy to see we've got 100 to a 10 to a 1 ratio for all
3:36 of these in the generations for the GC. It's the allocation thing that kicks off the base of that whole process.
3:43 That's a little bit unclear. It's also worth pointing out that there's a "gc.set_threshold()". So if you want to change these,
3:51 you can. Personally, I haven't really tried that, but it seems like there might be some pretty interesting performance benefits you could get.
3:58 You know, I'm thinking of things like I go to a SQLAlchemy model, and I do a query against the database and that is going to return 1000 records,
4:08 that's gonna trigger a garbage collection. But, you know, it's very unlikely those things are gonna have a cycle. They were just created, right?
4:16 So, you could do things to say, maybe in those situations where cycles are very rare, you could kick up that base number.
4:22 It seems like something you might be able to play with. We're gonna look at some examples of people doing way more insane stuff than that,
4:28 but, it seems like playing with that base number, you could probably get some pretty interesting performance benefits or maybe drawbacks.
4:37 It depends, but you could definitely make a big impact by changing that first number, that 700, to something bigger or smaller.
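One way to experiment with that first number is a small context manager that raises it for a block of code and then puts the old values back. This is only a sketch: the "gen0_threshold" helper, the value 50,000, and the list of dictionaries standing in for query results are all assumptions for illustration, not a recommendation or measured tuning.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gen0_threshold(value):
    """Temporarily replace the first threshold, restoring the old settings on exit."""
    old = gc.get_threshold()
    gc.set_threshold(value, old[1], old[2])
    try:
        yield
    finally:
        gc.set_threshold(*old)


original = gc.get_threshold()

with gen0_threshold(50_000):
    inside = gc.get_threshold()
    # Stand-in for 1000 freshly loaded records that are unlikely to contain cycles.
    rows = [{"id": i} for i in range(1000)]

restored = gc.get_threshold()
print("inside the block:", inside)
print("after the block:", restored)
```

Whether this helps or hurts depends on the workload, so it is worth timing both settings before keeping a change like this.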