Python Memory Management and Tips Transcripts
Chapter: Recovering memory in Python
Lecture: When does the GC run?
0:00
Now we've seen what this garbage collector is and how it restricts what it pays attention
0:05
to, both the types of objects that it pays attention to and the frequency with which
0:09
it checks different ages or generations of objects.
0:15
But when exactly does it run?
0:17
Is it some odd heuristic that we don't really know about?
0:20
Or is there something more concrete?
0:22
Well, Python, unlike a lot of different systems,
0:25
is pretty straightforward in what it does. So we can actually call this function
0:30
"gc.get_threshold()". You'll get a tuple of numbers back.
0:34
You're gonna get 700, 10, and 10 in the current system.
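A quick way to check these numbers yourself is the minimal sketch below. The exact values depend on your CPython version, so they may not match the 700, 10, 10 shown here. It also calls `gc.get_count()`, the companion function that reports the live counters the thresholds are compared against. The counter descriptions in the comments follow the documented behavior of versions like the one in this lecture.

```python
import gc

# (threshold0, threshold1, threshold2); the lecture's system reports (700, 10, 10)
thresholds = gc.get_threshold()
print("thresholds:", thresholds)

# The live counters the thresholds are compared against:
# gen 0 container allocations minus deallocations, then gen 0 and gen 1
# collections since the next-older generation was last collected
counts = gc.get_count()
print("counts:", counts)
```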
0:38
Now what's confusing about this, what's unclear about this,
0:41
unless you go look at the documentation,
0:43
is that these aren't simply how often generation zero runs versus generation one versus generation two. The units
0:50
on these things are different, okay?
0:52
So the first one is different from the other two.
0:55
The first number, the 700,
0:57
is for a generation zero collection,
1:00
so one of the cheaper, easier collections.
1:03
This is triggered when the number of allocations surviving reference counting minus the ones that have
1:10
been cleaned up exceeds 700. That's what that 700 on the left means: it means
1:16
we've allocated 700 more things that have lived even when you take into account the ones
1:21
that have been deleted. Now remember,
1:23
this is not all allocations. If that were the case,
1:25
this would run like crazy all the time.
1:27
This only counts allocations of container objects, the kind that can hold references to other objects:
1:33
instances of classes, lists, dictionaries, and so on.
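You can check which objects the collector is watching with `gc.is_tracked()`. In this small sketch, `Record` is a hypothetical stand-in class. Atomic values like ints and strings aren't tracked, while lists and ordinary class instances are. CPython's documentation also notes that some dicts and tuples holding only atomic values can be left untracked, so even dictionaries don't always count.

```python
import gc


class Record:
    """Hypothetical stand-in for any ordinary Python class."""

    def __init__(self, name):
        self.name = name


# Atomic objects can't refer to other objects, so they can't form cycles
print(gc.is_tracked(42))           # False
print(gc.is_tracked("hello"))      # False

# Containers, including instances of regular classes, are tracked by the GC
print(gc.is_tracked([1, 2, 3]))    # True
print(gc.is_tracked(Record("a")))  # True
```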
1:37
If you create, let's say,
1:38
800 class instances and only 50 of them get cleaned up, well,
1:42
those 750 survivors exceed 700, so that could trigger a garbage collection.
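To watch that first counter move, here's a sketch that mirrors the 800-created, 50-freed example using `gc.get_count()`. `Record` is again a hypothetical class. The sketch temporarily disables automatic collection so that a triggered collection doesn't reset the counter mid-demo. On a typical CPython build, the first number should grow by roughly 800 and then drop by about 50. Treat the exact values as version-dependent.

```python
import gc


class Record:
    """Hypothetical stand-in class; any ordinary class instance is GC-tracked."""

    def __init__(self, n):
        self.n = n


was_enabled = gc.isenabled()
gc.disable()  # keep an automatic collection from resetting the counter mid-demo
try:
    gc.collect()  # start from a freshly collected state
    start = gc.get_count()[0]

    records = [Record(i) for i in range(800)]  # 800 new tracked containers
    after_create = gc.get_count()[0]

    del records[:50]  # reference counting frees 50 of them right away
    after_delete = gc.get_count()[0]
finally:
    if was_enabled:
        gc.enable()

print("net change after creating 800:", after_create - start)
print("net change after freeing 50:  ", after_delete - start)
```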
1:43
Okay, so that's this first number.
1:45
That's the 700. The second number controls when we trigger a generation one collection.
1:51
This is a little bit broader search of the memory space and can be more
1:56
expensive because we're looking at more objects potentially.
2:00
Now, this number 10 here means we're gonna do a generation one collection once there have been more than
2:06
10 generation zero collections since the last one. The generation zero ones,
2:09
we do every 700 extra containers,
2:13
and then one in 10 of those we're gonna actually look a little broader,
2:17
and we're gonna get a generation one.
2:19
So that's what this 10 means. From generation zero to generation one,
2:22
the ratio of those is roughly 10 to 1.
2:25
If we look at the final 10,
2:27
this is when a generation two collection is triggered,
2:31
and that's when the number of generation one collections since the last generation two collection is greater than this number.
2:34
So for roughly every 10 generation one collections,
2:38
there's a gen two collection. So that might sound a little bit complicated,
2:42
but let's break it down. Every time we exceed some fixed
2:48
number of surviving container object allocations,
2:49
700, that's going to trigger a gen zero collection, and then for every 10 generation
2:55
zero collections, let's say, we have one generation
2:57
one collection. And then for every 10 generation one collections we're gonna have a generation two.
3:02
So the number of collections is roughly 100 to 10 to 1 from gen zero to gen one to gen two,
3:08
and it just happens that the thing that starts it all
3:12
off is the number of allocations of surviving container objects like class instances and dictionaries and so
3:17
on. All right,
3:20
hopefully, even though that's a lot to keep in your mind,
3:23
it gives you a sense of what's going on here.
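The cascade can also be sketched as a small simulation. This is a simplified, hypothetical model of the counters described here, not CPython's actual implementation. It follows the documented rule that a generation is examined once its counter exceeds its threshold. It ignores an extra heuristic that CPython uses to postpone full (generation two) collections. Because "exceeds 10" really means 11, the ratios come out near 100 to 10 to 1 rather than exactly there.

```python
class GenerationalTriggerModel:
    """Hypothetical, simplified model of when CPython's cyclic GC runs.

    Tracks only the three counters from the lecture and ignores CPython's
    extra heuristic for postponing full (generation 2) collections.
    """

    def __init__(self, thresholds=(700, 10, 10)):
        self.thresholds = thresholds
        # counts[0]: container allocations minus deallocations since last collection
        # counts[1]: gen 0 collections since the last gen 1 collection
        # counts[2]: gen 1 collections since the last gen 2 collection
        self.counts = [0, 0, 0]
        # How many collections of each generation have happened so far
        self.collections = [0, 0, 0]

    def allocate(self):
        """Record one new container object; maybe trigger a collection."""
        self.counts[0] += 1
        if self.counts[0] > self.thresholds[0]:
            self._collect(self._oldest_generation_due())

    def deallocate(self):
        """Record one container freed by reference counting (floor of zero)."""
        if self.counts[0] > 0:
            self.counts[0] -= 1

    def _oldest_generation_due(self):
        # Pick the oldest generation whose counter exceeds its threshold;
        # generation 0 always qualifies when this is called
        for generation in (2, 1, 0):
            if self.counts[generation] > self.thresholds[generation]:
                return generation
        return 0

    def _collect(self, generation):
        self.collections[generation] += 1
        # Collecting a generation also collects all younger ones, so reset them
        for younger in range(generation + 1):
            self.counts[younger] = 0
        # Tell the next-older generation that one more collection happened
        if generation + 1 < len(self.counts):
            self.counts[generation + 1] += 1


model = GenerationalTriggerModel()
for _ in range(701 * 133):  # 93,233 surviving allocations, nothing freed
    model.allocate()
print(model.collections)  # [121, 11, 1]
```

In this model, a gen zero collection fires every 701 allocations because the counter has to exceed 700. Gen one takes over on every 12th trigger, and the first full collection lands on the 133rd trigger. That gives 121, 11, and 1 collections, close to the 100 to 10 to 1 pattern.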
3:27
Like, when is this whole process going to get started?
3:30
It's pretty easy to see we've got a roughly 100 to 10 to 1 ratio across all
3:30
of the GC's generations.
3:38
It's the container allocation count that kicks off that whole process.
3:42
That part is a little less obvious. It's also worth pointing out that there's a
3:47
"gc.set_threshold()". So if you want to change these,
3:50
you can. Personally, I haven't really tried that,
3:53
but it seems like there might be some pretty interesting performance benefits you could get.
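If you do want to experiment, one cautious pattern is to change only the first threshold and always put the old values back afterward. This is a minimal sketch. `gen0_threshold` is a hypothetical helper name, while `gc.get_threshold()` and `gc.set_threshold()` are the real functions from the `gc` module.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gen0_threshold(threshold0):
    """Hypothetical helper: temporarily change only the first GC threshold."""
    original = gc.get_threshold()
    # Leave the generation 1 and generation 2 thresholds as they were
    gc.set_threshold(threshold0, *original[1:])
    try:
        yield original
    finally:
        # Always restore all three values, even if the body raises
        gc.set_threshold(*original)


with gen0_threshold(10_000) as original:
    inside = gc.get_threshold()
print(inside[0])  # 10000
restored = gc.get_threshold()
```

Scoping the change to a `with` block keeps an experiment from quietly changing GC behavior for the rest of the process.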
3:57
You know, I'm thinking of things like I go to a SQLAlchemy model,
4:02
and I do a query against the database and that is going to return 1000 records,
4:07
that's gonna trigger a garbage collection.
4:11
But, you know, it's very unlikely those things are gonna have a cycle.
4:13
They were just created, right?
4:15
So maybe in situations like that, where cycles are very
4:19
rare, you could kick up that base number.
4:21
It seems like something you might be able to play with.
4:23
We're gonna look at some examples of people doing way more insane stuff than that,
4:27
but it seems like by playing with that base number,
4:30
you could probably get some pretty interesting performance benefits or maybe drawbacks.
4:36
It depends, but you could potentially make a big impact by changing that first number,
4:39
that 700, to something bigger or smaller.
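To get a feel for the SQLAlchemy-style scenario without a database, here's a rough experiment. `Row` and `count_collections` are hypothetical names. `Row` stands in for ORM objects, and the callback uses the real `gc.callbacks` hook, which is called with a phase ("start" or "stop") and an info dict. The experiment only counts how many collections start. Whether fewer collections make your code faster is something to measure on your own workload, since the result could go either way.

```python
import gc


class Row:
    """Hypothetical stand-in for an ORM object returned by a query."""

    def __init__(self, pk):
        self.pk = pk


def count_collections(threshold0, n_rows=100_000):
    """Hypothetical helper: build n_rows objects under threshold0, count GC runs."""
    started = []

    def on_gc(phase, info):
        # gc.callbacks hooks receive phase "start" or "stop" plus an info dict
        if phase == "start":
            started.append(info["generation"])

    original = gc.get_threshold()
    gc.collect()  # start from a freshly collected heap
    gc.set_threshold(threshold0, *original[1:])
    gc.callbacks.append(on_gc)
    try:
        # Keep every row alive, like a query result held in a list; no cycles here
        rows = [Row(pk) for pk in range(n_rows)]
    finally:
        gc.callbacks.remove(on_gc)
        gc.set_threshold(*original)
    return len(started)


default_runs = count_collections(700)
raised_runs = count_collections(50_000)
print("collections with threshold0=700:   ", default_runs)
print("collections with threshold0=50_000:", raised_runs)
```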