Python Memory Management and Tips Transcripts
Chapter: Recovering memory in Python
Lecture: When does the GC run?
0:00 Now we've seen what this garbage collector is and how it restricts what it pays attention
0:05 to, both the types of objects that it pays attention to and the frequency with which
0:09 it checks different ages or generations of objects.
0:15 But when exactly does it run?
0:17 Is it some odd heuristic that we don't really know about?
0:20 Or is there something more concrete?
0:22 Well, Python, unlike a lot of other systems,
0:25 is pretty straightforward in what it does. So we can actually call this function
0:30 "gc.get_threshold()". You'll get a tuple of numbers back.
0:34 You're gonna get 700, 10, and 10 on the Python version I'm using here.
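To see these numbers yourself, here's a minimal sketch that uses only the standard `gc` module. `gc.get_threshold()` returns the three thresholds, and `gc.get_count()` returns the three counters that the collector compares against them. The exact values depend on your Python version and on what the interpreter has allocated so far, so treat the comment below as what this lecture's version reports, not a guarantee.

```python
import gc

# The three thresholds: (threshold0, threshold1, threshold2).
# On the version used in this lecture that's (700, 10, 10).
thresholds = gc.get_threshold()

# The current values of the counters that get compared against those thresholds.
counts = gc.get_count()

print("thresholds:", thresholds)
print("counts:    ", counts)
```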
0:38 Now what's confusing about this, what's unclear about this
0:41 unless you go look at the documentation,
0:43 is that these are not simply how often generation zero runs versus generation one versus generation two. The units
0:50 on these things are different, okay?
0:52 The first one means something different from the other two.
0:55 The first number, the 700,
0:57 is for a generation zero collection,
1:00 one of the cheaper, easier collections.
1:03 This is triggered when the number of allocations minus the number of deallocations
1:10 exceeds 700. That's what that 700 on the left means: it means
1:16 we've allocated 700 more things that are still alive, even when you take into account the ones
1:21 that have been deleted. Now remember,
1:23 this is not all allocations. If it were,
1:25 this would run like crazy all the time.
1:27 It only counts container objects:
1:33 class instances or dictionaries and so on.
1:37 If you create, let's say,
1:38 800 class instances and only 50 of them get cleaned up, well,
1:42 that's 750 net new containers, which is over 700, so that would trigger a garbage collection.
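Here's a small experiment showing that only container objects move the generation zero counter. `gen0_count_delta` is a hypothetical helper written for this illustration: it pauses automatic collection so a collection can't reset the counter mid-measurement, then reports how far `gc.get_count()[0]` moved. On a standard CPython build you'd expect the first number to be roughly 1,000 and the second to be close to zero, but other builds may batch or count allocations differently, so no exact output is claimed here.

```python
import gc


def gen0_count_delta(make_objects):
    """Return how far the generation 0 counter moves while make_objects() runs.

    Automatic collection is paused during the measurement so that a
    collection can't reset the counter halfway through.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        before = gc.get_count()[0]
        objects = make_objects()  # keep a reference so nothing gets freed early
        after = gc.get_count()[0]
    finally:
        if was_enabled:
            gc.enable()
    return after - before


# 1,000 lists: each one is a container object that the GC tracks.
containers = gen0_count_delta(lambda: [[] for _ in range(1000)])

# 1,000 large ints: not containers, so only the outer list can count.
scalars = gen0_count_delta(lambda: [n * 1000 + 7 for n in range(1000)])

print(containers, scalars)
```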
1:43 Okay, so that's this first number.
1:45 That's the 700. The second number controls when we trigger a generation one collection.
1:51 This is a little bit broader search of the memory space and can be more
1:56 expensive because we're looking at more objects potentially.
2:00 Now, this number 10 here means we're gonna do a generation one collection for every
2:06 10 generation zeros. Remember, the generation zeros
2:09 happen every 700 extra containers,
2:13 and then on one in 10 of those we're gonna actually look a little broader
2:17 and do a generation one.
2:19 So that's what this 10 means. From generation zero to generation one,
2:22 the ratio of those is 10 to 1.
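You can watch these collections happen by registering a hook in `gc.callbacks`. The standard library calls each hook with a phase (`"start"` or `"stop"`) and an info dict whose `"generation"` key is the oldest generation being collected. The helper name `collections_by_generation` and the 200,000 allocation count are hypothetical choices for this sketch. On versions that use the 700/10/10 scheme you'd expect mostly generation 0 runs, fewer generation 1 runs, and only a handful of generation 2 runs, but the exact counts vary by Python version and by what else the interpreter allocates.

```python
import gc
from collections import Counter


def collections_by_generation(n_containers=200_000):
    """Count automatic collections per generation while allocating containers."""
    seen = Counter()

    def on_gc(phase, info):
        # info["generation"] is the oldest generation this run collects.
        if phase == "start":
            seen[info["generation"]] += 1

    was_enabled = gc.isenabled()
    gc.enable()  # automatic collection must be on for anything to trigger
    gc.callbacks.append(on_gc)
    try:
        # Every inner list survives (the outer list holds it), so the
        # generation 0 counter keeps climbing past its threshold.
        survivors = [[] for _ in range(n_containers)]
    finally:
        gc.callbacks.remove(on_gc)
        if not was_enabled:
            gc.disable()
    return dict(sorted(seen.items()))


print(collections_by_generation())
```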
2:25 If we look at the final 10,
2:27 this is when a generation two collection is triggered,
2:31 and that's when the number of generation one collections since the last generation two exceeds this number.
2:34 So for every 10 generation one collections,
2:38 there's a gen two collection. So that might sound a little bit complicated,
2:42 but let's break it down. Every time the net number of container objects allocated exceeds that fixed
2:48 number,
2:49 700, that's going to trigger a gen zero collection. Then for every 10 generation
2:55 zeros, let's say, we'll have one generation
2:57 one collection. And for every 10 generation one collections we're gonna have a generation two.
3:02 So counted in collections, it's a 1 to 10 to 100 spacing from gen zero to gen one to gen two:
3:08 gen one runs on every 10th gen zero pass and gen two on every 100th. And the thing that starts it all
3:12 off is the number of allocations of surviving container objects like class instances and dictionaries and so
3:17 on. All right,
3:20 hopefully, even though that's a lot to keep in your head,
3:23 it gives you a sense of what's going on here,
3:27 like when this whole process is going to get started.
3:30 It's pretty easy to see we've got a 100 to 10 to 1 ratio of collections across
3:35 the three generations of the GC.
3:38 It's the allocation count that kicks off that whole process.
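To make the interplay of the three numbers concrete, here's a toy simulation of the trigger rules. It is a simplified model, not CPython's actual collector code: `simulate_collections` is a hypothetical helper, every allocation is assumed to survive, deallocations are ignored, and CPython adds a further check before running a full generation two pass that this sketch leaves out. Each run is counted under the oldest generation it collects.

```python
def simulate_collections(surviving_allocations, thresholds=(700, 10, 10)):
    """Return [gen0_runs, gen1_runs, gen2_runs] under a simplified trigger model."""
    counters = [0, 0, 0]
    runs = [0, 0, 0]
    for _ in range(surviving_allocations):
        counters[0] += 1
        if counters[0] <= thresholds[0]:
            continue  # generation 0 hasn't exceeded its threshold yet
        # Collect the oldest generation whose counter exceeds its threshold.
        # Generation 0 always qualifies at this point, so a generation is always chosen.
        for gen in (2, 1, 0):
            if counters[gen] > thresholds[gen]:
                break
        runs[gen] += 1
        # A collection resets the counters of that generation and all younger ones...
        for younger in range(gen + 1):
            counters[younger] = 0
        # ...and counts as one collection toward the next older generation.
        if gen < 2:
            counters[gen + 1] += 1
    return runs


# 133 trips of the generation 0 threshold (each trip takes 701 allocations).
print(simulate_collections(701 * 133))  # → [121, 11, 1]
```

Because the rule is "exceeds" rather than "reaches", this model does 11 generation zero passes between generation one passes, so the 10-to-1 ratio is a close rule of thumb rather than an exact count.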
3:42 That allocation trigger is the less obvious part. It's also worth pointing out that there's a
3:47 "gc.set_threshold()". So if you want to change these,
3:50 you can. Personally, I haven't really tried that,
3:53 but it seems like there might be some pretty interesting performance benefits you could get.
3:57 You know, I'm thinking of things like: I go to a SQLAlchemy model,
4:02 I do a query against the database, and it returns 1,000 records.
4:07 That's gonna trigger a garbage collection.
4:11 But, you know, it's very unlikely those things are gonna have a cycle.
4:13 They were just created, right?
4:15 So, in situations like that, where cycles are very
4:19 rare, you could kick up that base number.
4:21 It seems like something you might be able to play with.
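One way to "kick up that base number" only around a bulk load is a small context manager built on `gc.get_threshold()` and `gc.set_threshold()`. This is a sketch, not a recommendation: the helper name `gen0_threshold`, the 50,000 value, and the `records` list (a stand-in for the hypothetical query results) are all illustrative assumptions. The `finally` clause restores the previous thresholds even if the block raises.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gen0_threshold(threshold0):
    """Temporarily change only the first GC threshold, restoring all three on exit."""
    old = gc.get_threshold()
    gc.set_threshold(threshold0, *old[1:])
    try:
        yield
    finally:
        gc.set_threshold(*old)  # put the old thresholds back, even on error


# Stand-in for a hypothetical bulk query that returns many freshly created objects.
with gen0_threshold(50_000):
    records = [{"id": i, "tags": []} for i in range(1_000)]
```

Scoping the change this way keeps the rest of the program on the default thresholds, which matters if other parts of the code do create cycles.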
4:23 We're gonna look at some examples of people doing way more insane stuff than that,
4:27 but it seems like by playing with that base number
4:30 you could probably get some pretty interesting performance benefits, or maybe drawbacks.
4:36 It depends, but you could definitely make a big impact by changing that first number,
4:39 that 700, to something bigger or smaller.
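A rough way to see the trade-off on your own machine is to build the same batch of objects under different first thresholds and compare how many collections ran and how long it took. The helper `build_with_threshold`, the row shape, and the threshold values are hypothetical choices for this sketch. Timings are noisy and machine-specific, so treat the output as a rough signal rather than a benchmark; a higher threshold also means cyclic garbage can pile up longer before it's reclaimed.

```python
import gc
import time


def build_with_threshold(threshold0, n_rows=200_000):
    """Build n_rows small records under a given first GC threshold.

    Returns (automatic GC runs seen, elapsed seconds) and restores the
    previous thresholds and enabled state afterwards.
    """
    runs = 0

    def count_runs(phase, info):
        nonlocal runs
        if phase == "start":
            runs += 1

    old = gc.get_threshold()
    was_enabled = gc.isenabled()
    gc.enable()
    gc.set_threshold(threshold0, *old[1:])
    gc.callbacks.append(count_runs)
    try:
        start = time.perf_counter()
        rows = [{"id": i, "tags": []} for i in range(n_rows)]
        elapsed = time.perf_counter() - start
    finally:
        gc.callbacks.remove(count_runs)
        gc.set_threshold(*old)
        if not was_enabled:
            gc.disable()
    return runs, elapsed


for threshold0 in (700, 10_000, 100_000):
    runs, elapsed = build_with_threshold(threshold0)
    print(f"threshold0={threshold0:>7,}: {runs:5d} GC runs, {elapsed:.3f} s")
```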