Python for .NET Developers Transcripts
Chapter: Memory management in Python
Lecture: Memory management in Python
Let's compare this story we just told about .NET's memory management with Python. Again, just like .NET there's no explicit memory management.
Now, we can do the C API extension thing for Python and down there you can allocate stuff and obviously that's just plain C
so, you know, there's a lot of memory management down on the C side, but most folks don't actually write C extensions for Python. So, outside of that
there's not really any memory management that you do as a Python developer. It's all down in the runtime. In .NET, we saw that there were value types
and reference types. That's not a thing in Python. Everything is a reference type. Yes, we have maybe a string which is a pointer
that points off to some string in memory. We might have a customer variable which points off to a customer object. But also, the number 7,
that is a pointer off to some object in memory. Okay, so everything is a reference type. Technically, some things, like the first few integers,
use something like the flyweight pattern, where they're preallocated and shared because they're so common. But in general, the right conception is
that everything is a reference type and that reference types are always allocated on the heap. There's nothing allocated on the stack
other than potentially, like pointers to your local stuff, right? But the actual memory that you think you're working with
that is allocated somewhere else. Memory management is simpler in Python. In .NET, we have generations and we have finding the live objects
and all of the roots of all the live objects and asynchronous collection of that when memory pressure is high, and all of that. Forget that for Python.
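To make the earlier point about everything being a reference type a bit more concrete, here is a minimal sketch. It assumes CPython; the small-integer sharing it shows is an implementation detail rather than a language guarantee, and the variable names are just for illustration.

```python
import sys

# In CPython, even a plain number is a full object with a type and a size.
seven = 7
is_object = isinstance(seven, object)
size_in_bytes = sys.getsizeof(seven)  # size of the int object itself, in bytes

# Two names, one object: assignment copies the reference, not the value.
name = "Ada Lovelace"
alias = name
same_object = alias is name

# CPython implementation detail: small integers are preallocated and shared.
small_shared = int("7") is int("7")
# Larger integers are created on demand, so these are two distinct objects.
large_shared = int("100000") is int("100000")

print(is_object, size_in_bytes, same_object, small_shared, large_shared)
```

The `is` operator compares identity (same object in memory), which is why it can tell a shared, preallocated integer apart from two freshly created ones.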
We have reference counting. Super, super simple. It's kind of like smart pointers in C++. I've got an object. There's three variables pointing at it.
One of the variables goes away because its stack frame is now gone. Now I've got two. One of those is assigned a new value.
So now I've got one, and then the last one, maybe it's set to None, or that function returns. When that count gets to zero
immediately that object is erased. It's cleaned up. So memory management is just as simple as: every time a new reference to a thing is added
that count goes up, and every time a reference goes away it goes down. If that count hits zero, boom. It's erased. That's faster and more deterministic than garbage collection.
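The counting story above can be sketched roughly like this. It assumes CPython; `Customer` is a hypothetical class made up for the example. The absolute numbers from `sys.getrefcount` vary between versions (the call itself holds a temporary reference), so the sketch only looks at how the count changes.

```python
import sys
import weakref


class Customer:
    """Hypothetical class, used only for this illustration."""

    def __init__(self, name):
        self.name = name


a = Customer("Ada")
b = a
c = a
probe = weakref.ref(a)  # a weak reference does not add to the count

with_three = sys.getrefcount(a)
del c                   # one variable goes away
with_two = sys.getrefcount(a)
b = None                # another is assigned a new value
with_one = sys.getrefcount(a)

alive_before = probe() is not None
del a                   # the last reference is dropped
alive_after = probe() is not None  # in CPython the object is already gone

print(with_three - with_two, with_two - with_one, alive_before, alive_after)
```

The weak reference is only there so we can ask whether the object still exists without keeping it alive ourselves.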
But there's a big problem. And what is the problem with reference counting? Cycles. If I have one object, and it originally has a variable pointing at it,
then it creates another, but then they refer back to each other, well, even if I stop referring to
either of those objects from all of the variables that I have, they're still always going to have at least one reference.
Item 1 referring to 2, and 2 referring to 1. And even if those get lost, in terms of my references, they're never going to be deleted
by reference counting alone. Most of the time reference counting works like a charm and cleans up everything fast.
But there's a fallback generational garbage collector to work on these cycles and catch that extra memory that slips by. So, you can think of this
as much more like .NET's memory management for the cycles, but the first line of defense to clean up memory is reference counting.
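Here is a small sketch of the cycle problem and the fallback collector, assuming CPython. `Node` is a hypothetical class for the example, and automatic collection is switched off briefly only so the demo is not affected by a collection that happens to run on its own.

```python
import gc
import weakref


class Node:
    """Hypothetical class, used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.other = None


gc.disable()  # keep the automatic cycle collector out of the way for the demo

one = Node("one")
two = Node("two")
one.other = two  # item 1 refers to 2
two.other = one  # and 2 refers back to 1

probe = weakref.ref(one)
del one, two     # our variables are gone, but the cycle keeps both counts above zero

alive_after_del = probe() is not None
found = gc.collect()  # run the cycle collector by hand
alive_after_collect = probe() is not None

gc.enable()
print(alive_after_del, found >= 2, alive_after_collect)
```

`gc.collect()` returns the number of unreachable objects it found, which here includes at least the two nodes.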
Again, that makes this deterministic. However, memory is not compacted after reference counting happens or this generational garbage collector
that catches the cycles runs. No, memory is never compacted in Python. That's interesting. It probably makes it a little bit faster most of the time.
But you can also get fragmented memory, which can make caches less efficient. Though there are some techniques
in CPython's small-object allocator that lessen the fragmentation, called blocks, pools and arenas.
And the quick summary is, each object of a given size, so say I've got an object and it takes up, I don't know, 24 bytes,
goes into a block, and there's a bunch of memory segments, called pools, that each hold blocks of one size. So let's say this one holds the things of size 24.
It'll allocate a certain amount and it'll keep allocating the things of that size into that pool until it's full,
and then it'll create another one of these pools, and then pools are grouped into arenas, and so on.
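If you want to peek at this yourself, a rough sketch follows. It assumes CPython: `sys.getsizeof` reports how many bytes a given object takes, and `sys._debugmallocstats()` is a CPython-only helper that dumps the allocator's block, pool and arena statistics to stderr. The sample objects are arbitrary.

```python
import sys

# A few small objects and the sizes CPython reports for them, in bytes.
samples = [7, 3.14, "hi", (1, 2), []]
sizes = {type(obj).__name__: sys.getsizeof(obj) for obj in samples}
print(sizes)

# CPython-only: print low-level allocator statistics (size classes, pools,
# arenas) to stderr. Guarded because other interpreters may not have it.
if hasattr(sys, "_debugmallocstats"):
    sys._debugmallocstats()
```

The exact sizes and the layout of the dump differ between Python versions and platforms, so treat the output as something to explore rather than rely on.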
You probably don't care too much, but the takeaway here is that there are mechanisms in place to help cut down on the fragmentation
that you might run into. If you want to dive deeper into some of these ideas, I recommend you read two articles, both by the same person:
"Memory Management in Python" by over here at this link here. And then "Garbage collection in Python: things you need to know"
So they talk about a lot of this stuff with pictures and graphs and actually the C data structures that hold this stuff
and so on. We're going to play around with this in some demos but if you really want to see, like what is the structure that defines these things?
Or what are the exact rules by which one gets allocated or cleaned up? Check out those articles. It's too low-level a detail for us
to really dive into here.