6:52 |
1:04 |
Welcome, welcome to Python Memory Management and Tips.
This is a pretty unique course.
We're going to dive deep into the internals of Python to understand how it works to make your code more efficient and make it faster.
I feel like Python memory management is a bit of a black art.
We use Python all the time, but most people don't really understand how Python memory works.
How are Python objects actually structured in memory in the runtime?
What does the reference counting part do?
What does the garbage collection part do?
We've heard maybe even that you can avoid using the garbage collector altogether.
It's a bit of a black art.
Well, maybe not this kind of black art.
Maybe this kind of wizard.
Maybe this is the kind of black art, the programmer kind.
But if you really want to know how Python Memory works, this is a course for you.
We're gonna have a balance of high level conversations about algorithms, we're gonna dig into the CPython source code and we're going to write a ton of code to explore these ideas and see how they work in practice.
0:57 |
So why should you care about memory management in Python, anyway?
Isn't it just gonna work the way it's gonna work, and we just have to live with it?
Well, you'll see there's actually a ton that we can do to work with it or even change how it works.
Here's a simple little program.
It's Python running one of our applications, and it was fine, but then a little bit later, it wasn't fine.
Now it's using 719 megabytes.
That seems bad.
Maybe we should do something different.
We're gonna learn a whole bunch of amazing techniques that will allow us to keep this memory usage much lower. We're going to dig in so you understand exactly the trade-offs you might be making, where these costs come from, how Python is doing its best to get the most out of the memory that we're using, and so on.
Not only will you come away with an understanding, but we're going to dig in and get some design patterns and some tools and techniques that you can proactively use to make your program look more like the one on the left, and less like the one on the right.
0:46 |
Now, when you think about memory management and making our code more memory efficient or understanding how the garbage collector and reference counting works, you might think this is to make our code use less memory.
To make the memory required smaller.
But it also will have a side effect.
It will make our code faster.
Python will have to do fewer garbage collection events; it's potentially using less memory, so less swap space and maybe better cache behavior. But also, some of the design patterns and some of the less frequently used aspects of Python that we're gonna bring in will actually make our code quite a bit faster for some really interesting use cases. So this is sort of a general performance thing with a concentrated focus on memory.
2:57 |
Let's talk about what we're gonna cover in this course.
We're certainly going to talk about Python memory management.
But there's actually three distinct and useful things we're gonna cover here.
So, three separate chapters.
We're going to talk about how Python works with variables and look behind the scenes in the CPython source code to see what's really going on when we work with things like strings or dictionaries or whatever.
We'll get a much better understanding of the structures that Python itself has to work with which will be important for the rest of the discussion around memory management.
We're also going to talk about allocation.
People often think memory management means cleanup: you know, garbage collection, reference counting, and so on.
And yes, we'll talk about those things, but Python actually does really interesting things around allocation and a bunch of cool design patterns and data structures and techniques to make allocating memory much more efficient, which then lead into making reference counting more efficient as well as garbage collection.
So Python has two ways to clean up memory.
We're gonna talk about them both, and when either of them comes into action.
We'll also write a lot of code to explore that.
The data structures that you choose to represent your data and your application can dramatically vary in how much memory they use.
We're gonna take the same data and look at it through the lens of storing it in a bunch of different types of data structures: arrays, lists, dictionaries, classes, even pandas data frames and NumPy arrays and see what the various trade-offs are around these different data structures.
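As a tiny, hedged preview of that kind of comparison (the course digs much deeper), Python's built-in sys.getsizeof reports the shallow size of a container object, the container itself, not the objects it refers to:

```python
import sys

data = list(range(1_000))

# Shallow sizes only: the container's header and pointer table,
# not the referenced integer objects themselves.
print(sys.getsizeof(data))                  # list: header + pointer array
print(sys.getsizeof(tuple(data)))           # tuple: a bit leaner, fixed size
print(sys.getsizeof(set(data)))             # set: hash table, memory for speed
print(sys.getsizeof({i: i for i in data}))  # dict: keys + values tables
```

Exact byte counts vary by Python version and platform; the trade-offs between the structures, not the specific numbers, are the point.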
Once we get to talking about functions, you'll see that there are some really powerful and simple design patterns that can dramatically make our code faster and more memory efficient.
So we're gonna look at some really cool ways to make our functions use a little less or a lot less of the memory that we're using.
We're also gonna look at classes, because storing data in classes is super important in Python; maybe you're gonna create a list of class instances, a whole bunch of them, and you can have a lot of data there. So we're going to talk about different techniques we can use when we're designing classes in Python to make them much more efficient in memory, with, it turns out, a nice little consequence in speed as well.
Finally, we're going to do some detective work.
Once we understand all of these things we're going to take a script and we're just going to run it through some cool tools.
Try to understand from the outside what exactly is happening in our application.
How Python is using memory.
We're gonna create some interactive web views of this data.
We're actually gonna create some graphs, all kinds of stuff.
So we're gonna use some really neat tools to do the detective work, to understand how our program is working and where we should apply some of these techniques that we've learned previously that make it even better and faster.
That's what we're gonna cover.
It's gonna be a lot of fun.
It's gonna be hands on, but also some high level conversations.
I think you'll get a lot out of it, and I'm looking forward to sharing it with you.
0:42 |
What are the prerequisites to take this course as a student, what do you need to know?
Honestly, the requirements are really, really simple.
You just need to be familiar with the Python programming language.
We obviously don't start from scratch.
We're digging into the internals of CPython.
We're talking about the trade-offs of different data structures that you should already know from Python.
So, you should be familiar with dictionaries, lists, the ability to create classes, functions, and then the core stuff like list comprehension and loops and whatnot.
If those things are all in place for you, then you're absolutely ready to take this course.
If not, check out our Python for Absolute Beginners course as a great foundation, and then you'll be ready to jump in here.
0:26 |
Finally, let me introduce myself.
Hey there, I'm Michael.
You can find me over on Twitter, where I'm @mkennedy.
You may know me from the podcasts "Talk Python To Me" and "Python Bytes".
I have done both of those.
And I'm also the founder and one of the principal authors here at Talk Python Training, and I'm super excited that you're in my class.
Thank you so much for taking it and I hope you really get a lot out of it, so, let's get started right now.
5:55 |
2:51 |
In this short chapter, we're going to talk about what you need to follow along with the course, to write code, and to play with the same tools that we're going to use. Now, right off the bat, would it surprise you to know that you're gonna need Python for a Python memory management course? Of course you will. But I do want to point out that you're gonna need at least Python 3.6 if you want to run the code we're using, and if you want exactly what we're using in this course, that's 3.8, specifically 3.8.5.
These memory behaviors can change slightly from version to version, but they should be pretty stable in Python 3 at this point.
But the syntax, things like F-strings and whatnot, we are using those, and those require Python 3.6 and beyond.
So the code will not run without 3.6 so make sure you have that right version.
Now, how do you know if you have the right version?
Well, you go to your command prompt or terminal and type python3 -V on macOS and on Linux, and hopefully you get something like 3.8.5 or higher.
The screenshot when I took it was a little bit older, but you know, you get the idea.
On Windows, you should probably type python -V: python, space, dash, capital V.
Now, one of various things can happen here, especially on Windows. On Windows, it might say Python 3, or it might say Python 2. You might also have Python 3, but your path might not have it first. If nothing at all comes out, or the Windows Store opens, that's because newer versions of Windows 10 will try to go to the Windows Store and install Python for you when you type python. That will usually happen if you just type python or python3 by itself. But with this -V, if it says nothing, that means you don't really have Python installed; it's just the Store shim, which really should say "you don't have Python installed", even when you pass an argument. Nonetheless, that's what happens. So make sure that you get Python 3.6 or above, ideally 3.8 or above, on your Windows machine or on Linux or macOS.
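For reference, here's the version check just described as a terminal command (your version string will differ; the Windows variant is shown as a comment):

```shell
# macOS / Linux: check the Python 3 on your path
python3 -V    # e.g. "Python 3.8.5"

# Windows: the command is usually just "python" (note the capital V)
# python -V
```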
Finally, if you don't have Python installed, there are all these varying recommendations for how to install it, how that changes over time, and what the trade-offs are. Personally, I installed Python 3 on my Mac using Homebrew, and I installed Python 3 on Windows using Chocolatey. Those are both package management systems for the operating system, and they let you automatically upgrade to the next version just by saying brew upgrade python or choco upgrade python, things like that, which is really, really cool. But there are a bunch of different ways and trade-offs, so I recommend you check out this Real Python article, realpython.com/installing-python/, and see what they say; they say they're going to keep it up to date with the options for installing Python.
So if you need to, have a look here.
2:05 |
If you want to follow along exactly with what I'm doing, I'm going to be using PyCharm, so you should use PyCharm too.
It's a fantastic tool for writing Python code and exploring it.
However, if you don't want to, you can use other editors as well.
That's fine.
So, if you're going to use PyCharm, I recommend you just get the latest version and use that.
They offer two versions.
They offer the PyCharm Community Edition and The PyCharm Pro Edition.
For what we're doing in this course, the Community Edition, the free open source edition, is totally fine.
There's one step where I say "let's see the CPU profile graph in PyCharm" and we spend about one minute on that.
That requires the Pro Edition, which you can just look at the picture, you don't have to be able to run it yourself.
Other than that, you're gonna be able to do everything with the free, open source PyCharm Community Edition.
So, there's no real reason to not use it other than you just want to use another editor.
I think this is a great one and you'll see me using it throughout the course.
The best way to get PyCharm is to go to the JetBrains toolbox and get that installed and then use that to install either PyCharm Pro or PyCharm Community or both, and this will keep it up to date and let you know when there's new versions and let you roll back to different versions.
It's a much nicer way to work with PyCharm and the JetBrains applications rather than just installing it directly.
If you don't want to use PyCharm, which is fine, I recommend that you use Visual Studio Code.
That's the other really good editor these days.
Those two have like 85%, 90% of the market share at this point, from what I can tell from recent surveys and so on. This is a great one.
Just be sure that you install the Python extension or it's not going to do much for you.
And you get that by clicking on this little button here and then the Python extension is the most popular one.
Looks like it's been downloaded 55 million times when I took this screenshot, and I know it's much higher than that right now.
So, two editor options, PyCharm, Visual Studio Code, PyCharm community free version will totally work for this course.
0:59 |
You'll want to get the materials, the source code, and the data files for this course. You could pretty much type them in as you go, but if you want to play with it and tweak it (there are a lot of little ideas where you just make a small change and see what happens in terms of memory usage or performance), I definitely recommend that you get the source material. So go over here to github.com/talkpython/python-memory-management-course.
If you're a friend of Git and you like to work with it, just go clone this repo.
If you don't really know about Git and you're uncomfortable doing it, you don't have to clone it. You can just click the green button where it says "Code" and download it as a zip file and roll with that, right? There are no major changes we expect, other than bug fixes along the way.
Make sure you get the source code and take it with you.
In fact, I recommend you fork and star it, just so you have it for sure.
Alright, that's it.
Go through these steps.
You should be all set up and ready to take this course.
I'm excited to dive into Python memory management with you.
40:00 |
1:40 |
It's time to get into Python memory management properly and start writing some code and looking at how things work behind the scenes.
We're going to start by focusing on variables.
Now, you might think the place to start would be garbage collection or creating new memory or whatever, but to understand how all those things work, we're gonna start by talking about variables and how Python manages them, passes them around and so on.
We're gonna start in a place that might surprise you for a Python course.
We're gonna talk about C.
Now, just to be clear, it's only a little bit of C.
If you don't know the C programming language, don't worry, it's a few lines of code, I'll talk you through it.
The reason we're gonna talk about C is that when you talk about Python, people say "oh, you just type python and you run that". What they almost always mean is CPython, right? Python, the language you run, is actually implemented in C. And we'll talk about some places where you can go see the source code and how things work in the actual implementation of the Python runtime.
Some people call the thing that executes Python, a "Python interpreter".
That's true for CPython. But there are many other things that run Python that do so without interpreting it: we've got PyPy, which is a JIT compiler, not an interpreter; you've got Cython, and so on. So I prefer "runtime". But if you want to say interpreter, it's the same basic thing here.
So what we're gonna do is we're going to talk a little bit about C and how specifically pointers and things like that in C surface when we actually get to the Python language, because even though there's other ways to run Python, the vast majority of people do so on CPython.
2:57 |
Now let's look at an example of some very, very simple code in C, or C-style C++; even simple code can get pretty complicated there. But here it is. We're gonna look at this function. This is not object-oriented programming like C++; it's more on the C side. We're gonna have two Person structures (they could be classes, whatever), but we're passing two pointers. Notice the Person*: not Person, but Person*. That means we're passing the address of where that thing is in memory rather than making a copy of its data and passing it along. We're gonna take these two Person objects that we're referring to by address, and then we're gonna ask the question: are they the same? The way we do this in C is you go through the pointer, which holds some place in memory. In this particular example, it was something like 0x7ff-whatever; it doesn't matter. The value of this thing is just a place out in memory. And there, at that memory location, on what's called the heap, we've got two pieces of information, right? The two pointers point to those places out in memory. That's why we use the arrow operator: p1->id, the dash-greater-than arrow, then the id. Follow the pointer out, find at that memory location what the id is, and see if they're the same.
So this is how it works in C, and it raises a lot of questions that are maybe not even knowable here.
For example, who owns these objects?
Who is responsible for creating them and ensuring that they get cleaned up correctly?
That answer could vary.
It could be allocated in the stack of the thing that called this function.
It could be that some other part of the program created it, and when it shuts down or that part cleans itself up, it's supposed to delete those out of memory.
How long will they stick around?
I don't know.
Until the other part, whoever owns it, decides that they're going to go away.
What happens if we forget?
What happens if somehow the programmer lost track of that piece of memory?
Well, that's a memory leak, and it's just going to stay out there forever.
There's nothing that's gonna fix it.
That's bad.
But what's worse, usually, is if it actually gets cleaned up too soon. The reason is you'll still have that p1 pointing to that memory address, but what is living out there is either gone, or it's some other piece, or some partial piece of some other data, and that can cause all sorts of problems. So what I want you to take away from this is just that there's a lot of actual bookkeeping or accounting around things that were created: how long they live, whose job it is to clean them up, and when they can do it; they've got to make sure that they don't forget to do it, but also don't do it too soon.
Okay?
So all of these things are really tricky around C and pointers, and we'll see in Python that generally that's not what we have to do.
A lot of things are done for us, but we want to understand what is done for us.
How is it done for us, and so on.
1:33 |
Let's look at one more example before we move on to talk about how Python does the equivalent thing.
So let's look at a slight variation.
We've got the same SamePerson function.
It's going to take two people and determine if they're the same.
However, this time, instead of passing pointers, which are basically numbers to memory addresses where the real data lives, we're just gonna pass the id's.
Remember, before we were saying p1->id. We might as well just pass the id; that makes it a little easier to do.
So we have our two pieces of data which are integers p1_id and p2_id.
They don't point anywhere.
They literally just have the value, right?
This is the same thing that was the id before.
So in C, we can pass the value of a thing or we can pass a pointer like we saw before and there's good reason for both.
If you have a large data structure and you want to move it around without making copies, because that would be slow, you would pass by reference, pass a pointer. Or, even more importantly, if you want to make changes to it and have those changes reflected in other parts of the program, you need to pass the pointer, change the shared location, and then everyone will see those changes. As opposed to here: if we change the id, only this function would see it. But C and many languages (C#, others) have this distinction between passing sometimes just the value, like the integers, and sometimes, like previously, the address of a thing that you've got to follow as a pointer out there.
So, I want to put these two up and contrast them for you, and then we're gonna dive into Python.
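To foreshadow how this plays out in Python (a sketch, using plain built-ins rather than the Person example): every argument is passed as a reference to an object, so mutating a shared object is visible to the caller, while rebinding a local name is not.

```python
def bump(counts):
    # Mutate the shared dict; the caller sees this change.
    counts["n"] += 1

def rebind(x):
    # Rebind the local name only; the caller's variable is untouched.
    x = x + 1
    return x

data = {"n": 1}
bump(data)
print(data["n"])   # 2: mutation through the shared reference

n = 1
rebind(n)
print(n)           # 1: the caller's binding is unchanged
```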
3:33 |
Alright.
Time for some Python code.
And the big question: does Python have pointers?
Well, let's look at a function.
The same, SamePerson function but written in Python.
So here we have a function: def same_person, and we're passing a p1 and a p2, and we're using type annotations to indicate this is a person class.
So, p1 is a person, p2 is a person indicating also the function returns a bool, true or false, on whether they're actually the same person.
But notice p1.id == p2.id That's not that arrow thing, right?
We don't have to treat it differently.
And if you think about Python, you've probably never seen the star in the context of meaning "this is a reference or a pointer to a thing"; you've never allocated memory, and you've probably never cleaned it up. There's a del keyword, but that means something totally different that doesn't really directly affect memory.
Are there pointers here?
I don't see any pointers.
Here's the thing.
Let's try to print out p1 and p2 and see what we get.
Well, the interpreter would just say something like <__main__.Person object at 0x...>. So Person is the class, and we've created one of them, an object at this memory location. Hmm, "at this memory location" sounds a little bit like we have pointers, doesn't it? You could also use this cool built-in function in Python called id and ask: where does this thing live, basically? And if those numbers are the same, they're sharing a location. If they're not, it's not a shared thing.
And we can talk about that.
We will talk about that as we go.
But if you go out here and actually look in memory, we're gonna have two things out on the heap, dynamically allocated, and these variables are going to be pointing at them. Well, p1 and p2 are pointing at them, and their ids actually correspond to the addresses. This is the same situation as we had with the C++ pointers.
The language is hiding it from us.
We don't have to worry about it, right?
That's cool.
But then you think about, you know, what is the lifetime of p1 and p2? Who is in charge of them? All those same questions I asked come up.
Who owns these objects?
Well, in Python, the answer is better because you don't have to worry about it.
Like I said, you probably have never really thought about cleaning up memory by, like, going free or delete or whatever on some thing you've created.
Because you can't really do that. But somebody has to, right, if these get created? So the question of who owns them is interesting: it's kind of the community of all the things in the program, all the things that share that piece of data.
Once they all stop paying attention to it, it goes away.
It goes away for one reason or another; there are a couple of ways in which it could go away, and we'll talk about them.
But the runtime itself kind of owns these objects.
You don't have to worry about that.
How long will they stick around?
Until everyone is done with them, maybe a little bit longer, depending on how they're linked together.
But generally speaking, just until everyone is done with it; the runtime knows who's paying attention to it, so you don't have to worry about the lifetime. It is really nice.
So in a sense we have pointers in Python, yes, but we don't have the syntax of pointers. Lovely. Nor do we have all the work of manual memory management and the accounting of who owns what, when, and when it should be cleaned up.
All those things are gone, which is beautiful, but we also need to understand how and when Python does those things on our behalf right?
Does Python have pointers?
I'm going to say yes, Python has pointers, but you don't syntactically have to worry about it.
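Here's a runnable sketch of that function. The Person class shape is an assumption on my part; the slide only shows the annotations on same_person.

```python
class Person:
    def __init__(self, id):
        self.id = id

def same_person(p1: Person, p2: Person) -> bool:
    # No arrow operator needed: p1 and p2 are references the runtime follows.
    return p1.id == p2.id

a = Person(1)
b = a             # a second reference to the very same object
c = Person(1)     # a different object with an equal id field

print(id(a) == id(b))     # True: one object, two names
print(a is c)             # False: two distinct objects in memory
print(same_person(a, c))  # True: logically the same person
```

Notice the distinction: identity (id, is) asks "same object in memory?", while same_person asks "same logical person?".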
1:19 |
Let's look at that alternative example that we saw in C++, but now in Python. Remember what we did: we passed over the ids, said these are integers, and we're just gonna compare them directly. And here's the same code in Python. Really, the only difference is where the return type goes, and whether it's written int p1_id or p1_id: int. Right, this is the same basic code. And if you print out p1_id or p2_id, it's gonna look just like what we had in C++, but it's not the same thing. In Python, these are different from what we had in C++. In C++ they're just allocated as the program runs, technically on the stack; they're passed by value, all sorts of stuff like that.
So we're going to see that, actually, these also live out in memory, they're dynamically allocated, they have to be cleaned up just like our person object was.
So these are pointing out here in just the same way.
In some sense, does Python have pointers? Python has more pointers than almost any other language, right? Even things like numbers and integers, or functions themselves, or modules and source code, all of those things are objects referred to by pointers. Even when they operate as basic numbers, like we have here, these are allocated dynamically out in memory.
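You can see this from Python itself. A quick hedged sketch: the is results below are CPython implementation details, not language guarantees, and the ints are built at runtime so the compiler can't fold or share the constants.

```python
# Two equal ints built at runtime: equal values, but separate heap objects.
a = int("1042")
b = int("1042")
print(a == b)        # True: equal values
print(a is b)        # typically False in CPython: two distinct objects
print(id(a), id(b))  # two (usually different) addresses

# Small ints are pre-allocated and shared by CPython.
x = int("42")
y = int("42")
print(x is y)        # True in CPython: both names refer to the cached 42
```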
4:05 |
I'm going to reveal the hidden truth about Python variables to you now.
So, if you've seen the movie The Matrix, one of the best science fiction movies of all time, Morpheus spoke to Neo when he was first realizing, or being told, that he was living in a simulation. Morpheus told him a little bit about that and said, "Look, you have a choice between two things.
One.
You can take the blue pill.
Forget this happened and just go back to being blissfully unaware that you're actually in this weird, dystopian world.
Or you can take the red pill and see the truth.
You can see inside the simulation what's actually happening".
And I feel a lot of this course is a little bit like that.
When we work with Python, it generally just works, everything's smooth.
Sometimes it uses more memory than we want.
Well, that's the way it is.
There's not a whole lot we can do about it, right?
What I'm gonna show you in this course, is there are a lot of little techniques that combine together that will either help you build better data structures, better algorithms and how you use those data structures, or at least just understand why your program is using a lot of memory or it's a little bit slow or whatever.
And in order to do that, we have to look beyond the Python syntax down into the CPython runtime.
So in that regard, this course is much like taking the red pill.
By the way, if you wanna watch this little segment, the link at the bottom is like a five minute video.
It's great.
So let's take the red pill.
Here's some Python code.
age = 42, so age is an integer.
Yes, its value is 42.
You can add it.
You can divide it.
You can treat it like, you know, numbers in a programming language.
It couldn't be simpler, right?
Well, not exactly.
So if we look at what is actually happening inside of the CPython runtime when we work with numbers: these are called integer types, or int, in Python, but they're actually PyLong objects at the C level.
You'll see there's a whole lot of stuff going on here.
Now, this is just a very small part of a single function in a very large piece of code; you can actually click through and go to bit.ly/cPythonlong.
This will take you to this line in the CPython source code, which I don't remember exactly how long it is, but it's hundreds of lines long.
So the ability to create one of these things is not super simple.
Now, for the number 42 it gets treated special.
Small numbers get treated special, as we'll see later.
If this were 1042, it would be closer to what's actually happening. The important thing is not which one of these functions around Python integers runs; just take this one as an example. So, PyLong_FromLong, which takes a C long and converts it to this PyLong, what does it return?
It returns a PyObject pointer.
Everything in Python is a PyObject pointer.
Strings, numbers, functions, source code, classes you create, everything.
This is a common base class for everything in Python, okay?
But specifically what it's creating is a PyLong object that's then being sort of downcast to this base version.
So you can look through.
There's a bunch of stuff even within this function I removed just so it would fit on the screen.
But it's allocating here. See this line that says _PyLong_New(1)? That returns a freshly allocated pointer; that's the dynamic memory allocation out on the heap. It then does a bunch of work to set the value, and returns it as a pointer, which the Python runtime presents as something that feels like a nice little clean integer, like that one line we have above.
But there's actually a ton going on.
So this is the red pill world that we're going to explore what's happening behind the scenes, the algorithms that are running, the reasons that they're happening throughout this course.
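One visible consequence of ints being full PyObjects: even tiny numbers carry object-header overhead, and larger values grow digit by digit. A hedged sketch (exact byte counts vary by CPython version and platform):

```python
import sys

# Every int is a PyLongObject: refcount + type pointer + size + digit array.
print(sys.getsizeof(42))        # e.g. 28 bytes on 64-bit CPython 3.8
print(sys.getsizeof(2 ** 30))   # one more 30-bit digit
print(sys.getsizeof(2 ** 100))  # arbitrary precision: more digits, more bytes
```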
1:57 |
Alright, we've taken the red pill, let's go explore.
Over here, I'll enter that link, bit.ly/cPythonlong, and it'll redirect us over to this horrible URL; you can see now why I was using Bitly to shorten it. This is, in the Python organization, the CPython project, in the longobject.c file, and this is PyObject *PyLong_FromLong. You can just go through and see all the stuff that's going on here. There's all the stuff I deleted; scrolling down, we're still in the same function, and still in the same function. When are we done? It goes on for a ways.
There's quite a bit going on here.
Let's go to the top.
You can see that this is the long, arbitrary precision integer object implementation.
And in this objects section, you'll see all different sorts of things, like Booleans, and then you've got dictionaries, here's part of the dictionary implementation.
There's the allocation stuff.
obmalloc.
Here's the base PyObject.
Yup, PyObject pointers all over the place, right? This thing. So this is the implementation for that, and one thing that's interesting is, you would think the base class would be pretty simple, but it's 2,000 lines long. And the header file, if you go up, is already, I think, like 600 lines long.
So there's a lot of stuff going on with every one of these objects, and throughout various places in the course, I'll be giving you the short your url's, which will take you to different locations in the source code.
You don't have to go look at it, especially if all the code here makes your eyes water.
That's fine.
You don't have to look at it.
But this is where you can go and see exactly what is happening and when or what kind of data is being stored and what structure and how the runtime is using that to manage the memory or allocate things and so on.
1:29 |
Let's real quickly look at the documentation from Python's docs around this id built-in function, and it says "id is a function that takes a single object".
And if you look at the description, it says "it will return the identity of the object.
This is gonna be a number which is guaranteed to be unique and constant during the objects Lifetime, however, two objects may with non-overlapping lifetimes may have the same id value".
That's more likely than just random places in memory.
As you'll see, there's patterns that have a tendency to reuse memory rather than allocate new memory.
Nonetheless, this while the objects are around, what this number come back means this is the unique identity of the object.
And if you get two objects and you ask from them and they're the same, they're literally the same object in the runtime.
If those numbers are different, even if they would test for equality like two strings might be equal equal to each other but they're not the same location of memory, potentially, then the id would come back to be different.
And, little special detail here at the end, if you happen to be in CPython, The CPython implementation detail is that this is the memory address, the address of the object in memory.
So this literally is the base 10 version of "Where's This Thing in Memory?" Id is simple, but we're gonna be using it a couple of times throughout this course to ask questions like "are these actually the same thing?" or "where do they live in memory?" and stuff like that.
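To make that concrete, here's a tiny runnable sketch of those two cases (the variable names are just for illustration):

```python
# Two names bound to the same object share an id; an equal but
# separately created object gets its own id.
a = [1, 2, 3]
b = a              # same object, just a second name
c = [1, 2, 3]      # equal value, but a brand-new list

print(a == c)           # True: equal values
print(a is c)           # False: different objects, different ids
print(id(a) == id(b))   # True: literally the same object
```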
So, here's the deal.
|
|
show
|
3:15 |
Now finally, let's write some code, as we're going to do for much of the rest of this course and explore these ideas.
But before we actually start writing, I want to just show you how you can work with the code from GitHub and just get it started if that's somewhat new to you.
So over here, we've got our code structure.
Now, right now, these are all empty.
I just have little placeholders, so Git will create the folders.
Git only tracks files, not folders.
So this is the trick to make it create them for me.
What I want to do is just open up this entire project here in PyCharm.
Now, the first thing I want to do in order to do that is create a virtual environment.
We're not actually going to need the virtual environment for quite a while, but let's go ahead and just start out that way.
So I'm gonna open a terminal here in this folder.
I checked it out as mem-course, not its full name, but I'm gonna create a virtual environment.
So, python3 -m venv venv, and then we want to activate it.
On Windows, this would be nearly the same: sometimes you drop the 3, the folder is Scripts rather than bin, and you wouldn't do the dot.
It seems like whenever you create a virtual environment, pip is almost always out of date.
That's super annoying, so we just want to make sure that we upgrade pip.
Alright, now we're ready.
I'm gonna take this directory mem-course and drop it into PyCharm.
And because of the way I put it over here, it's kind of annoying.
You can't get back to it super easily.
So if you right click up there, you can get a hold of the course.
I'm gonna drop this on PyCharm.
Now, on Windows and Linux, you can't do that trick.
Just go file, open directory, and select it.
Now, for some reason, it found the wrong virtual Environment, so I click here, add interpreter, pick an existing one, go to my project directory.
Sometimes it gets it right.
Sometimes it doesn't get it right.
Perfect.
So you can see it's running down here, and it looks like it's probably the right thing.
So let's go down here, and we're gonna just add a quick little file that we can run, just to make sure everything is working.
I'm going to call all the things that are here that are meant to be executed directly "app_something" and then there's gonna be a bunch of libraries that they use but maybe don't get executed directly, and those will just be whatever their names are.
So that's the convention.
I'm gonna try to make it clear what you can run in here and what you can't.
So I'm gonna call this "app_size".
And then, as good standard practice, we want to create a main method and use the "__name__ == '__main__'" convention.
I've added an extra personalized little live template to PyCharm so I don't have to type it.
So you'll see me type "fmain", and if I hit Tab, it writes this for us.
You could write that yourself, but it's super annoying to write it all the time. So I'm gonna write this, and we'll just print something out.
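For reference, the boilerplate that the template expands to looks something like this (the printed message is just a placeholder):

```python
# The standard "run this file directly" convention.
def main():
    print("Everything is working!")


if __name__ == '__main__':
    main()
```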
Right click, Run.
Looks like we're using the right Python to run our little project.
Cool.
So now we've got our projects open.
Once we get stuff in here, I'm gonna be deleting these, but that's how you open the code.
Of course, you'll have the code already here, but then you could just go right click and run the various things that you want to explore.
|
|
show
|
7:47 |
Alright, let's answer some questions, and you can see I've already thrown in the questions up here, just so you don't have to watch me type the print statements.
How big is this?
How big is that?
And the thing I want to explore is how much memory is used by certain things.
So we get a sense for a number versus a string with one character versus a string with 20 characters.
Like how much is it?
The character space versus all the underlying runtime infrastructure.
That's gonna contribute to the memory use.
So we're in luck.
Python has a good way to answer this.
We're gonna import sys.
And over here we can print out things like sys.getsizeof a thing like the number four.
So let's do that real quick and then run.
How big is the number four?
Well, in a language like C++ or C# or something like that where these are just allocated locally, you always have to talk about the size and the number is like a short or a long or something like that.
But typically this would be 2, 4, or 8 bytes long.
In Python, a number, a small number, is 28.
If we had a little bit bigger number, it's still 28.
Let me make it a little bigger so it stays on the screen. But if it's a lot bigger, we use a tiny bit more memory.
So the size matters, but not so much.
But there is some overhead.
Remember, this is the PyObject pointer and all the things to know how many people are keeping track of it, where it was allocated, what type it is.
All of these things are happening behind the scenes, and we just see the simple number 4, but Python is doing a bunch of work through that infrastructure that we talked about.
Remember the red pill stuff?
That's what's happening, and that's why this is a little bit bigger.
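Here's that experiment as a runnable snippet; the exact byte counts vary a little across CPython versions and platforms, so take the numbers as ballpark:

```python
import sys

# Even a tiny int pays for the PyObject header (refcount, type pointer, etc.).
print(sys.getsizeof(4))         # around 28 bytes on a 64-bit CPython
print(sys.getsizeof(2 ** 100))  # a much bigger value needs a few more bytes
```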
Alright, what about this one?
Let's print sys.getsizeof the letter "a".
Well, you'd feel like these might be similar, right?
I mean, in most programming languages a character is 1, 2, or 4 bytes and a number is 2, 4, or 8, so maybe the string is even smaller.
Let's see.
Nope.
50.
It's bigger.
So it turns out strings have a lot going on behind the scenes.
And let's see if we have a big string like this.
How much larger is it?
25.
Well, 25 larger, right?
It's 75 now, and that's because we have 26 of these characters rather than 1.
Now multiply that string by 100, so we get our 26 characters repeated 100 times.
Look, it's about 100 times bigger.
Okay, that's what's going on here, right?
So basically, there's this infrastructure to keep track of the string and do all the string things and then the extra data.
And if we had Unicode characters, they might take up more than one byte per character.
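For compact ASCII strings in CPython, the string experiment looks like this:

```python
import sys

# A string has a bigger header than an int, then roughly one byte
# per additional ASCII character.
print(sys.getsizeof("a"))
print(sys.getsizeof("a" * 26))
print(sys.getsizeof("a" * 26) - sys.getsizeof("a"))  # 25: one byte per extra char
```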
How about a list?
Simple little empty list.
How big is that?
56.
Okay, that's not too super large.
Now let's do something with like, 10 items in it.
There we go.
I put 10 in there.
Let's see how much bigger it is.
136.
Well, that's quite a bit bigger.
Let's think about that for a second.
Well, what does the list actually contain?
The list doesn't contain the values; the list contains what every variable in Python basically is: a pointer to an object out in memory.
So somewhere out in memory there's a 1, and in here in the list there's a "go find the 1 over there" pointer, and then here's another pointer out to the 2, wherever it is in memory.
For these smaller numbers, where that actually is turns out to be interesting.
But we've got a list, and it's got 10 of those in there.
The lists generally don't allocate one slot at a time.
They kind of grow in a doubling type of way: you've got 10, then you add a few more, so we're gonna allocate 20, and then 40, and so on, that kind of pattern, so that you're not constantly allocating and copying every time you add something, because that's super slow.
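You can actually watch that over-allocation happen with getsizeof; this little probe (my own sketch, not from the course repo) prints only the appends where capacity jumped:

```python
import sys

# Append 20 items and record every point where the list's reported
# size changes: capacity grows in chunks, not on every append.
lst = []
jumps = []
last = sys.getsizeof(lst)
for i in range(20):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last:
        jumps.append((len(lst), size))
        last = size

print(jumps)  # only a handful of growth events for 20 appends
```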
All right, well, that's how big that is.
Sort of.
We're going to see that this is going to get interesting, so let's actually do something else.
So how much memory did this take? Like, how much does that line contribute?
Well, it contributes, we saw that each number is 28 and then it's gonna allocate whatever the list needs to be.
The list by itself is 56.
So those 10 numbers would take 280 bytes, probably, because they're 28 each, and then we've got the 56 for the list.
That's like 336 or so.
And then there are also the pointers in the list that have to point out there, and maybe the over-allocation.
So something's going on; this number is not big enough, right?
Just the numbers alone should take 280 bytes.
We're going to see what's going on in a minute but this is how much room this thing is taking.
But let's try to force the issue by saying "What if there's a large piece of data right there and a large piece of data right here?" So I'm gonna do a little bit of work here.
I'm going to come up with some data that we're gonna ask about, and I'm going to go from 1 to 11.
I'm going to add in some stuff 10 times.
The number of elements in the list should be the same.
I'm going to come up with an item, and the item is going to be a list containing whatever the loop number is.
So the first time through this will be 1, the second time it'll be 2, and so on. And Python has this funky little trick that we can do here.
So if n is 7 and we come over here with a list containing, say, a 3, and we multiply it by n, what we get is a list with that element copied that many times.
So here we get a single list with seven 3s instead of just one 3.
We're going to do that here.
i times i, so first it will be 1, then 4, and so on, times 10.
And by the time i gets to be 10, that's gonna be 1,000.
So it'll be a list with 10 in it 1,000 times.
The last one out here is gonna be bigger than just, you know, the number 10.
Absolutely.
You're gonna put that item in there and let's just really quickly print out data just so you see what we got here.
Notice there's a whole bunch of tens and a bunch of nines fewer and so on, right?
It kind of grows geometrically.
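Reconstructed from the narration (the exact multiplier is my reading of "i times i ... times 10"), the data construction is roughly:

```python
# Each item is the number i repeated i * i * 10 times, so the lists
# grow geometrically: ten 1s, forty 2s, ..., one thousand 10s.
data = []
for i in range(1, 11):
    item = [i] * (i * i * 10)
    data.append(item)

print(len(data), len(data[0]), len(data[-1]))  # 10 10 1000
```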
All right, so that looks like that did kind of like what I said it did.
And let's just print sys.getsizeof(data).
184.
What do you think?
I'm gonna think no.
No, that's not right.
But, this does give us a sense of what the base size is.
So what are we answering or what information are we getting when we say getsizeof?
What we're getting is it goes through and it says, "I'm gonna look at the actual data structure, the list".
So, this one right here and let's see how much it's internally allocated, what are its fields?
And if it's got a big buffer to store the items put into it, how long is that buffer?
How long is that buffer?
But what it doesn't do is it doesn't traverse the object graph.
It doesn't go, "Okay, well, there's 10 things in here.
Let me follow the reference from each one of those 10 and see how big each is.
And if it has references, follow its references." It doesn't do that traversal, which is actually what you need to know how much memory is used.
But this getsizeof, it's a start.
You'll see that there's a better way we can actually answer this question more accurately.
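A quick demonstration of that shallow behavior:

```python
import sys

# getsizeof reports only the outer list's own slots; the big inner
# list it points to isn't counted.
nested = [[0] * 1000]
print(sys.getsizeof(nested))     # tiny: one pointer slot plus the header
print(sys.getsizeof(nested[0]))  # thousands of bytes
```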
|
|
show
|
3:46 |
To answer the question of how much memory do these things actually use in the way that I describe you traversing the object graph, We're going to use a cool foundational library: "psutil" and I said we didn't need the virtual environment right away.
Well, I forgot about this one, so we need it right away.
We're going to pip install that one.
You can see in PyCharm, if I just click, it will do that for me.
I added this requirements file and put it in there.
Alright, that's all happy, but what we're gonna do is we're going to write, or take this code over here that basically will do that traversal, so give me the size, and it'll just recursively go looking and say, "is that a dictionary?
Does it have a __dict__?
As in it's a class.
Is it a thing that can be iterated"?
and so on.
And it's just going to recursively dive into those things.
Okay?
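The repo's size_util does the real work, but a minimal sketch of that recursive traversal might look like this (the exact helper in the course code may differ in its details):

```python
import sys

def get_full_size(obj, seen=None):
    """Recursively total sys.getsizeof over an object graph,
    counting each distinct object only once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # already counted (shared/cyclic reference)
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(get_full_size(k, seen) + get_full_size(v, seen)
                    for k, v in obj.items())
    elif hasattr(obj, '__dict__'):                      # class instances
        size += get_full_size(vars(obj), seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(get_full_size(item, seen) for item in obj)
    return size

# The deep size of a nested list is far bigger than its shallow size.
print(get_full_size([[0] * 100]) > sys.getsizeof([[0] * 100]))  # True
```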
So we can use this to traverse that object graph over here, if we just import it.
Looks like it will import right?
But no, it doesn't.
Even though, you can be assured, that is what I called it: size_util.
And the reason is PyCharm is looking for something up in this top folder called size_util.
So what I'm going to do is go over here and tell PyCharm to also look in this directory as, like, a top-level directory for imports, and the way you do that is to say Mark Directory as Sources Root. We're gonna need that for the others, potentially, as well.
Here we go.
Oh look, now it works.
So, what we can do is we can do this and say "get full size of that", and let's do it for the "a" and for the 26 a's.
Now, those things don't have stuff they contain, so these numbers are likely to be the same.
And they are "28 28 50 50 75 75", but where it gets interesting is where you have container objects.
So if we just put the empty list in, because the list is empty, it's probably the same size.
How big is the list?
It's still 56.
Alright, so those are the same size, but here's where it's going to get interesting, where we look at the size of something that points to or contains other things that are not empty, basically.
Look at this one: the 10 items, instead of being the 136, if you take into account all the stuff out there, the other 280 bytes you're gonna get from those numbers, then that's how big it is, okay?
And then here's the one that was really out of whack.
Where it was really crazy in terms of, we know we allocated tons of memory, and yet it said 184.
Not likely.
So let's go get the full size of this data list that we've created.
There we go.
It probably would be better if we format it with a little comma digit grouping there: 31, basically 32, kilobytes for this beast that I created here, this geometrically growing list of lists.
So if we're going to understand the amount of memory of these objects take, we gotta look at all the container objects, the things that they contain and recursively do that, right?
If there's container objects that are contained, look at their things that are contained and so on.
And that's what this little utility does, and we're gonna be using this throughout much of the course.
Do not use sys.getsizeof for this, which is great for the relative size of, like, the core essence of the thing, but not the object graph it relates to; that's what this size utility is all about.
|
|
show
|
1:07 |
Let's talk about a cool little optimization that Python uses for some of the objects that are common and reused.
Now you've heard me talk about the numbers, and seeing small numbers behave differently than others.
And that's because Python applies what's called the flyweight design pattern to them.
So let me just read what Wikipedia had to say about it: "in computer programming, flyweight is a software design pattern.
A flyweight is an object that minimizes memory usage by sharing as much data as possible with other similar objects".
So, for example, numbers that have the same value could literally be the same place in memory.
Those are immutable.
It is a way to use objects in large numbers when a simple, repeated representation would use an unacceptable amount of memory.
So, like, if you have the number 3 appear 1,000 times, that would be, say, 28 KB, when really it could just be 28 bytes.
So often some parts of the object state can be shared.
It's a common practice to hold them in external data structures and pass them to objects temporarily when they're used.
Okay, so that's the flyweight design pattern, and you'll see that Python uses that some of the time, especially around numbers.
|
|
show
|
4:47 |
Alright, let's do one more demo.
Let's explore this idea of flyweight numbers.
So, I'm gonna create a new little thing we can run here, app-something; remember, that prefix means you should run it directly, alright.
And we'll do our fmain to just boom pop us into the right pattern.
And I copied some text here that's just gonna help us a little bit.
So this is gonna be an example showing which numbers are precomputed and reused.
That is the flyweight pattern.
And what I want to do is come up with a range of numbers from -1000 to 1000, and then we're just going to say for this 872 and that 872 are they the same?
So a really simple way to do that is to have just two lists, each with -1000 to 1000 numbers in them, like -1000, -999 etcetera.
We'll say this is gonna be a list of range from -1000 to 1001.
Annoying, but range doesn't include the end, so we go one past. And we'll do that for list 2.
We're going to do it twice.
What we want to see is by doing it twice, even though the exact same values will be in there, will they be the same memory address?
Will they literally be the same PyObject pointer thing in memory?
Or will they just be the equivalent values?
All right, and a way to do that, we'll just keep a track of reused, make a little set or something like that.
Actually, let's make it a list so we can sort it.
for n1, n2 in... we're gonna use this cool thing called "zip". If you give it two lists, it will take an item from one and an item from the other and put them together as a tuple.
So if I zip list 1 and list 2, I'll get (-1000, -1000), then (-999, -999), and so on each time through.
But they're going to be the values out of the two lists, and we're gonna store them into n1 and n2.
And just so you know what we're doing here we'll print n1 and n2 and that's not what I want to run, let's run it here.
Here you go.
You can see, It's just like lining them up side by side, but this one comes from list 1, and this one comes from list 2.
So what we want to do is we want to ask, Are they equal?
Of course they're gonna be equal.
More importantly, are they the same place in memory?
So we'll say if the id of n1 is equal to the id of n2, then we're gonna go to our reused list and append one of them (doesn't matter which) to say "this number is reused".
Then we're gonna print and print out "reused", something like this: "Found Reused, n1".
All right, let's go and just print that and run that real quick and see what happens.
Ok, this is pretty interesting.
Look, it goes up to 256 and it starts at -5.
It's weird, right?
So the numbers -1000 up to -6, Those are not reused.
Those were different allocations treated as totally different, unrelated things that just happen to have the same value.
But from -5 up to 256, these are literally the same thing in memory.
So there will only ever be one PyLong object for, say, 244 in the runtime, because it's always reused over and over: this flyweight pattern.
So we can also just be a little more clear to make sure we know what's happening.
So we have a lowest equals the min of reused, and the highest is the max, and we just print out "flyweight pattern from", with a little "f" on the front to make it an f-string.
There we go, lowest to highest.
Alright, one more time.
Just like we saw flyweight Pattern is from the numbers -5 up to 256.
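Put together, the demo we just built looks roughly like this (reconstructed from the lesson; the -5 to 256 range is a CPython implementation detail, so other Python runtimes may behave differently):

```python
# Build the same values twice and see which ints come back as the
# very same object: CPython's small-int flyweight cache.
list1 = list(range(-1000, 1001))
list2 = list(range(-1000, 1001))

reused = [n1 for n1, n2 in zip(list1, list2) if id(n1) == id(n2)]
print(f"Flyweight pattern from {min(reused)} to {max(reused)}")
# On CPython: Flyweight pattern from -5 to 256
```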
This is actually a super cool pattern that you can use in your own code.
You saw, like, Python objects and data structures are fairly expensive, memory wise.
So if you have the idea of like we've got immutable data that's reused in lots of places, you could come up with your own concept of a flyweight pattern here and optimize that.
But, you know, that's not really the point here.
The point is more to talk about the internals of what Python is doing around the numbers to prevent them from going totally crazy in terms of how much memory they consume because -5 to 256 those numbers appear all the time.
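If you wanted to roll your own flyweight for immutable data, a minimal interning cache could be as simple as this sketch (the names here are made up for the demo):

```python
# Keep one canonical instance per distinct immutable value.
_cache = {}

def intern_value(value):
    # setdefault stores the first instance seen and hands it back
    # for every later equal value.
    return _cache.setdefault(value, value)

t1 = intern_value(tuple("Michael"))
t2 = intern_value(tuple("Michael"))
print(t1 is t2)  # True: the second equal tuple was dropped, the cached one reused
```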
|
|
show
|
0:45 |
If you're a fan of this red pill type of thinking, you should definitely check out Anthony Shaw's CPython internals.
I've had him on the podcast a couple times to talk about it, and he's finally written it up as a book.
So this is, like, 350 pages talking about the CPython source code: how you can navigate where different things are in the whole structure, like the object headers versus the objects themselves, and digging into a whole bunch of interesting things, the parsers and whatnot.
So if you want to dive into not just the memory side of CPython, which is what we're gonna still spend a lot of time on, if you want to look at the whole thing, the parsers, the execution and everything, you know, check out Anthony's book.
It's definitely good, recommend it.
|
|
|
19:53 |
|
show
|
1:02 |
When you think about Python memory management, you're very likely thinking about reference counting, garbage collection, how things get cleaned up.
But memory doesn't start there.
It starts by getting allocated, and it turns out that Python has a whole bunch of techniques and patterns and ideas around making allocation fast, efficient, doing things so there's not memory fragmentation, all of those kinds of things.
Allocation in Python is actually way more interesting than you would think.
So we're going to spend some time talking about the core algorithm, using pools and blocks and arenas, which maybe you've never heard of, but all of these things are really important in working with objects to prevent memory fragmentation and prevent them from blowing up in terms of taking too much memory and so on.
So this will be really a fun and important prerequisite before we get to what most people think when they think of Python memory, and that's the cleaning up side of all that.
|
|
show
|
4:00 |
Let's do a little thought experiment.
Imagine we have this one line of Python code, which we know whole tons of stuff is happening down in the runtime.
But on the Python side, it's simple.
We have a person object.
We want to create them, past some initial data over to them, their name is Michael, and so on.
Now let's imagine that to accomplish this operation we need 78 bytes of RAM from the operating system.
What happens?
How does that get into Python?
Like, what part of memory do we get?
How is it organized and so on?
So a very simplistic and naive way to think about this would be: all right, we're gonna go down to the C layer, and the C layer is going to use the underlying operating system mechanism for getting itself some memory.
It'll malloc it, right?
So malloc is the C allocation side and then free, we would call free on the pointer to make that memory go away, Okay?
So you might just think that the C runtime just goes to the operating system and says "give me 78 bytes", and the operating system says "super, we're gonna provide 78 bytes of virtual memory that we've allocated to that process", which then, boom into that we put the thing that we need, some object that has an id and the id wasn't explicitly set.
But, let's say it's generated.
The id is generated, and the name is Michael.
Well, that seems straightforward enough.
I mean, you have these layers, right?
Python is running, and Python is actually implemented in C, and C is running on top of the operating system, and the operating system is running on real RAM on top of hardware.
So this seems like a reasonable thought process.
But no, no, no.
This is not what happens.
There's a whole lot more going on.
In fact, that's what this whole chapter is about, is talking about what happens along these steps.
And it's not what we've described here.
Let's try again.
So at the base we, of course, still have RAM.
We have hardware.
That's where memory lives.
We still have an operating system.
Operating systems provide virtual memory to the various processes that are running so that one process can't reach in and grab the memory of another.
For example, we saw that there's ways in the operating system to allocate stuff.
So at C, there's an API called malloc that's gonna talk to the underlying operating system.
This is what we had sort of envisioned the world to be before.
But there's additional layers of really advanced stuff happening on top here.
Above this, we have what's called Python's allocator, the PyMem API and PyMalloc.
So the C runtime doesn't just call malloc, it calls PyMalloc, which runs through a whole bunch of different strategies to do things more efficiently.
We saw that in Python and CPython in particular, that every tiny little piece of data that you work with, everything, numbers, characters, strings, all the way up to lists and dictionaries and whatnot, these are all objects, and every one of them requires a special separate allocation, often very small bits of data, and that's why Python has this Pymalloc.
But wait, there's more.
If you're allocating something small, and by small I mean in the sys.getsizeof sense, not my fancy traversal thing.
So if you're allocating something that is in its own essence small, then Python is going to use something called the "Small Object Allocator", which goes through a whole bunch of patterns and techniques to optimize this further, and we're going to dig into that a bunch.
So if you want to see all this happening, you can go to "bit.ly/pyobjectallocators", the link at the bottom, and actually the source code is ridiculously well documented.
There's, like, paragraphs of stuff talking about all these things in there, and there's also a picture, an ASCII-art-like picture, that looks very much like this diagram that I drew for you, with some details that I left out, but they're in the source code.
|
|
show
|
1:52 |
As you saw in just the last video, Python has this Small Object Allocator.
Well, if small objects are treated differently than other objects, you want to know what objects are small and more of them are small than you would think, because most big objects are actually just lots of small ones.
So let's take a look at one example.
There's many we could talk about.
So here we've got data, a list that contains 50 integers, 1 to 50. In a lot of languages, the way this would work is you would have just one giant block of memory: say, 8 times 50 bytes long for 8-byte numbers, or 4 times 50, right, 200 bytes, if they're regular integers, and so on.
And it's just one big thing.
But that's not how it works in Python.
When you put stuff together like this, you're gonna end up with a list that has a bunch of pointers, as many pointers as there are items in the actual list, plus probably a few extra for that buffering so you don't reallocate, like I talked about.
But what you're actually gonna have is a bunch of little objects that are being pointed to by the slots of the list.
And if you have a class, you're gonna have fields in the class.
The things that are in there, like strings and numbers and maybe other lists, are not part of that object in terms of how big it is; those live outside of it, pointed to by the variables within that data structure.
So these objects that feel like they're big are often many small ones, and if that's the case, all of these little things get handled by the Small Object Allocator, not treated as one big object.
Even though taken as a whole, they might use tons of memory.
Most of the parts will probably still effectively be, as far as Python's concerned, small objects.
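You can see that pointer-per-slot structure from Python itself; this is my own probe, with sizes typical of a 64-bit CPython build:

```python
import sys

# The list's own memory is its header plus one 8-byte pointer per
# element; the 50 integer objects live elsewhere.
data = list(range(1, 51))
per_slot = (sys.getsizeof(data) - sys.getsizeof([])) / len(data)
print(per_slot)  # about 8 bytes per slot on a 64-bit build
print(sys.getsizeof(data) < 50 * sys.getsizeof(1))  # True: the ints aren't counted
```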
|
|
show
|
2:13 |
Let's look at one of the red pills in CPython around object allocation.
So this is "obmalloc.c", and if you look here, you can see here's the ASCII art part that I was telling you about before.
This is when the second take that we did on what allocation looks like.
We have physical RAM, we have virtual memory, then we have malloc, then we have PyMem, API, the allocator, but there's something really interesting at the bottom that we didn't talk about then and check that out.
It says "a fast special purpose memory allocator for small blocks to be used on top of a general purpose malloc heavily based on prior art".
And if you want to go check this bit of the source code, this is literally straight out of GitHub, just go to bit.ly/pyobjectallocators, and it will take you right to this line, and you can look through it.
So what's the deal?
To reduce the overhead for small objects, that is, objects that are less than 512 bytes in the "sys.getobjectsize", not the whole traversal, but the small bit, Python sub-allocates larger blocks of memory, and then, as you need it, will free up, or give, that memory to different objects, which potentially could be reused once that object is cleaned up.
Larger objects are just routed to the standard malloc.
But for these smaller ones, which are most of them, the Small Object Allocator uses these three levels of abstraction.
We've got arenas and pools and blocks.
At the lowest level we've got this thing called a block, and then pools manage blocks and arenas manage pools.
So we're gonna go through all of those.
But there's this trifecta of ideas or algorithms that we're going to use to manage, remember, the small objects.
And this little quote right here comes from an article about Python memory management by Artem Golubin, and he's done some fantastic research and writing around it.
So I recommend that you check out his blog.
There'll be a couple of articles that I think I refer to.
Definitely, I've read as researching all the stuff for this course, so check out his article here.
It has a lot of interesting analysis on what's happening and why it's being done.
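If you want to see this allocator working on your own machine, CPython ships a diagnostic hook that dumps obmalloc's arena, pool, and block counters (the report is written to the process's stderr):

```python
import sys

# CPython-specific: print pymalloc's internal statistics, including
# blocks per size class and arenas currently allocated.
if hasattr(sys, "_debugmallocstats"):
    sys._debugmallocstats()
```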
|
|
show
|
2:17 |
Let's start at the lowest level where the actual objects are stored in memory.
Instead of allocating as we saw at the very beginning of this chapter 17 bytes or 20 bytes or whatever you need exactly for a thing just randomly where you've got a gap in your memory, Python uses these things called blocks.
Blocks are chunks of memory of a certain size, and each block is designated to hold objects of a certain size.
So, for example, we might define a block that holds objects of 24 bytes, or around 24 bytes, let's say.
The places where the objects go are 24 bytes and anything that's between 17 to 24 bytes is allocated into those 24 byte spaces.
And sure, if you've only got 20 bytes you need and you stick it into a 24 byte spot, you're wasting, quote "wasting" 4 bytes, right?
You could have packed it a little bit tighter.
But this algorithm allows Python to create these sets of memory, that it's really easy to allocate stuff into once it's freed up, just un-assign it, but not give it back to the operating system, necessarily.
Then when you want to allocate something new, maybe next time it's 22 bytes, you can use that same little spot and reuse it.
Okay, so that's the idea of these blocks.
And there's some rules.
One of the rules is it only holds, each block only holds things that fit into its block.
So if you've got one, that's the size for a 24 byte element, things of 17 to 24 bites go in there, but they kind of waste the space if they don't totally fit and you can see they're broken into these different categories.
So once of block is allocated, it's always dedicated to its size.
Its either a bunch of stuff that fits into 24 byte pieces or 16 byte pieces and so on.
So you can think of Python allocating these blocks, for of the different size of allocation it's going to do and then be able to just reuse that memory.
It doesn't have to go back to the operating system, free up memory, ask for more memory, get that fragmented on in RAM and things like that.
It can get a whole bunch of space for those pieces of those small objects that it needs and just works with it internally, and it's more efficient that way.
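The size-class rounding the blocks use is easy to model; this helper is purely illustrative (not a CPython API), using the classic 8-byte alignment described here (newer 64-bit builds align to 16 bytes):

```python
ALIGNMENT = 8  # classic pymalloc alignment; an assumption for this sketch

def size_class(nbytes):
    """Round a request up to its block size class."""
    return ((nbytes - 1) // ALIGNMENT + 1) * ALIGNMENT

print(size_class(17), size_class(20), size_class(24))  # 24 24 24
print(size_class(25))  # 32: the next class up
```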
|
|
show
|
1:05 |
The next level up in this algorithm are the pools.
Now pools, their job is to manage a bunch of blocks and a given pool always contains blocks of the same size.
So if Python needs to allocate something that fits into a 16-byte block, it can just go to the pool that contains those, ask for one that has some free space, and allocate into it.
The pool size is typically set to match the memory page so it maps well to RAM and doesn't get fragmented, and Python's ability to reuse this fixed, contiguous set of memory helps reduce the fragmentation that would otherwise happen if we just went to the underlying C layer and said "give me the next free bit of memory that you have of this size".
You can look at this a little bit here is the source code around the pools.
It has the next free block.
Right there is a pointer that can always get you to it, and it also has the next pool and the previous pool.
So it's kind of a doubly-linked list of pools that within there contain a bunch of these blocks.
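Here's a toy Python model of that bookkeeping (the real thing is the C `pool_header` struct in CPython's Objects/obmalloc.c; the field names below mirror the C ones, but everything else is simplified for illustration):

```python
# A toy model of a pymalloc pool: fixed block size, a free-block list,
# and next/prev links forming a doubly-linked list of pools that serve
# the same size class.

class Pool:
    def __init__(self, block_size: int, capacity: int):
        self.block_size = block_size            # every block here is this size
        self.freeblock = list(range(capacity))  # stand-in for the free-block list
        self.nextpool = None                    # doubly-linked list of pools...
        self.prevpool = None                    # ...of the same size class

    def alloc(self):
        """Hand out the next free block, like pymalloc following freeblock."""
        return self.freeblock.pop() if self.freeblock else None

    def free(self, block):
        """Return a block to the free list so it can be reused immediately."""
        self.freeblock.append(block)

pool = Pool(block_size=16, capacity=4)
b = pool.alloc()   # grab a 16-byte slot
pool.free(b)       # hand it back; the next alloc can reuse that same spot
```

The point of the model: freeing never returns memory to the OS, it just marks the slot reusable inside the pool.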
|
|
show
|
1:19 |
Let's go back to the CPython source code and look at this idea of blocks, pools, and arenas.
We haven't talked about arenas yet, but let's just look over there real quick.
So I've come up with another Bitly URL for you, cause like, all of these are super long.
So "bit.ly/cPythonpools", if you want to go there, and you might be surprised to see what you get when you come here.
You don't get source code immediately.
There is source code if you scroll enough, like here, there's some source code, but check this out.
There's like a full on essay about what is happening around one of these pools.
Okay, so if we go down a little bit further, it talks about how blocks are managed within the pools.
So blocks in the pools get carved out as needed.
The freeblock pointer points to the start of a linked list of free blocks, so you can always just go there and start allocating into the next free one.
So pretty awesome, right?
There's a bunch of stuff about things they're doing that might be a little bit obscure or unclear, and why it's happening that way, and so on.
So when I was studying this, I was really surprised at the level of detail describing how pools and blocks interact.
And if you want to see what the core developers and the people who worked on this and implement it and maintain it, what they say, Well, here it is, right in the source code on GitHub.
|
|
show
|
0:54 |
The last data structure or idea that we gotta cover when we think about how Python is working with small objects are arenas.
And arenas are 256 kilobytes of memory, they're allocated on the heap and they manage 64 pools.
You can see down here the data structure that defines them.
It's quite similar to the pools.
You've got doubly linked list.
You've got the next free one.
Things like that.
These arenas, this is the top level thing.
Arenas contain a bunch of pools.
Arenas are always the same size.
The pools are all the same size as well, but the blocks they contain might be at different scales: one pool might hold 8-byte objects, another 16-byte objects, though, you know, the second would only hold half as many as the first, but that's the idea.
We've got arenas that control the pools, the pools hold the blocks, and the blocks are where the objects actually go with 8 byte alignment.
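The arithmetic tying the three levels together is simple enough to check directly (using the sizes described above: 256 KB arenas, 4 KB pools matching the page size, and 8-byte-aligned blocks):

```python
# Quick arithmetic on the arena / pool / block hierarchy.

ARENA_SIZE = 256 * 1024   # bytes per arena
POOL_SIZE = 4 * 1024      # one memory page per pool

pools_per_arena = ARENA_SIZE // POOL_SIZE
print(pools_per_arena)    # 64 pools per arena, as described

# A pool dedicated to 8-byte blocks holds roughly this many blocks
# (ignoring the pool header, which uses up a little of the page):
print(POOL_SIZE // 8)     # 512
# ...while a pool of 16-byte blocks holds half as many:
print(POOL_SIZE // 16)    # 256
```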
|
|
show
|
5:11 |
Well, all this abstract talk about blocks and pools and arenas can be a little bit hard to understand.
So, let's just actually go and see what Python will tell us about all of this stuff and the Small Object Allocator.
So, go here and create an "app_something", Remember?
That means you can run it and I'll just call it "stats".
Real simple: we'll just do our fmain magic once again, and we're gonna need to use sys, so there's our import of sys. Perfect.
And let's you put a little message to see what this is about.
Great, so over here, we can just run "sys._debugmallocstats()", like so.
It's kind of semi-hidden, but you can still get at it.
Let's go ahead and run this, see what we get here.
Notice it didn't come up in our auto-complete, right?
"_debug", no, not there.
But we can tell PyCharm not to hide it and not to warn us that it's not there.
Hey, we can see that it looks like it's okay.
Alright, so let's look and see what we got out of this.
Check this out.
Run it once from scratch.
If we get to the top, what do we got?
We have this "small block threshold".
Remember, things that are managed by the Small Object Allocator, they are how big?
512 bytes or less.
And there are 32 different size classes.
So, 16, 32, 48, 64, 80 and so on, is apparently what we got right now.
There's how many pools of each size of those are allocated and how many blocks are in use within those pools?
So, like, 5 pools: we've got 558 blocks used within those 5 pools, and we don't need to allocate more until we've used up the 72 that are still free.
And of course, all of that is only for 32 byte elements.
All right, so let's scroll through this.
This just gives you all the information about the blocks, their sizes, how many pools go with them and whatnot.
And then here's our arenas.
How many arenas have we allocated?
12.
So that would be 12 * 256 KB, which should be 3 MB. Arenas reclaimed: 3.
The max we've had is 9.
Currently, we have 9.
And then it tells you that this much memory is being used right now.
You can go through and you can see a lot about it here, Right?
Then we come down here and it actually tells you what is allocated.
So we have "PyCFunctionObjects", we have 56 bytes each.
We have 5 of those, so that's cool and so on.
Right.
So you can see all the different objects: tuples, stack frames, floats, dictionaries and whatnot being used here.
So you get a lot of operations.
Let's go and make it do something.
So I'll just come down here and say "make it do memory things".
So I'm just gonna copy some code.
What we're gonna do is we're gonna create a bunch of larger numbers, so that's going to make sure they get outside the flyweight pattern and they're gonna actually get allocated every time.
And then we're gonna create a list, then we're gonna put a list which has the string repeated 100 times of whatever this large number is that we get here.
So we're creating a bunch of objects.
Basically, we're doing a bunch of things here.
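As a sketch, the "do memory things" snippet just described might look something like this (my reconstruction from the narration, not the course's exact code — the counts and variable names are guesses):

```python
# Allocate numbers big enough to fall outside Python's small-int cache,
# turn each into a long repeated string, and keep everything alive in a
# list so the allocator stats have something to show.
import sys

data = []
for n in range(10_000):
    big = n + 100_000              # well past the cached small-int range
    data.append([str(big) * 100])  # a list holding one large string

# Dump the allocator stats a second time to compare against the first run.
sys._debugmallocstats()
```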
Alright, let's try this one more time and see if it did anything different this time.
Scroll, scroll, scroll, scroll, where is my separator?
It didn't flush quick enough, did it?
Alright, so we gotta go through the first one, right?
The first time we see 512, there we go.
That's the second one here.
So you would see that there's potentially more of these in use.
Not a whole lot more.
You can see, I think more frame objects were in use here and so on.
Let's look one more time.
Yeah, the data is still there.
Make it a lot bigger.
Here we go, it's slow.
It's doing stuff.
Okay, great.
Now you can see more arenas were allocated.
We had 62, the highwater mark was 34 and we're still there, we're using 8.9MB of RAM and so on.
Then we can see our list object and all these different things.
Okay, we have more of these blocks in use than we had before.
So the 64-byte ones: 1,700, and before we had only 138.
So you can see as we do more work, It's consuming more memory, it's getting allocated into the different spaces, and so on.
I'm not sure how much meaningful, actionable information you're going to get out of this.
But I definitely think it helps understand some of these basic ideas that we've been talking about and gives you a look at the current state of the system, and that's pretty cool.
|
|
|
42:27 |
|
show
|
3:00 |
Well, we've come to the end of the line.
Not for our course, there's a lot left there.
But for the life-cycle of Python objects.
We're going to talk about cleaning up memory and destroying Python objects that we've been working with, but we don't need anymore.
This is really what most people think about when they think about Python Memory Management.
I think the stuff about allocation is not even on people's radar for the most part, but this is.
This is when we're working with objects and we're working with data, how does Python make it go away without us having to do what C has to do?
Where you have to do the accounting: some part of the program is responsible for tracking when a thing can go away, and it's got to make sure that happens not too early, and not too late.
So we're gonna talk about the two different techniques Python uses for this; you don't have to manage any of it yourself, but understanding it is still super important.
So if we grab our red pill again and jump over into the CPython source code, you can just go to "bit.ly/cPythonobject" to go find this bit here.
Remember, that we saw object.c already.
This is object.h, the header file that defines what is one of these objects.
There's 667 lines just in the header.
This is a crazy thing.
So I've only grabbed a very small bit of the definition of PyObject.
So we're defining a new type, you can see at the bottom it's called "PyObject", and we're gonna have a whole bunch of pointers to it, but the most interesting thing I want you to take away in this part is this "ob_refcnt".
So how many references, how many variables (pointers, remember, Python doesn't officially have pointers, but internally it kind of does), point back to this particular object?
It's a "Py_ssize_t", which is basically a signed integer sized to match the architecture you have.
So on a 64-bit processor, or rather a 64-bit build of Python, it's a 64-bit number, like a long.
Think of it as just a number, an integer, and it counts: 0, 1, 2, 3, 4, 5 as different variables point at this object.
Python behind the scenes will increment this number, and as variables stop pointing at it, because they go out of scope and get destroyed, or they get reassigned to another value or to None or something like that, Python will decrement it, and when that number reaches zero, the object is cleaned up.
Boom, That's it.
Remember, it's taken out of its block so that other things can be put into that hole into that block and reused.
This is the primary way, this reference counting idea is the primary way that Python keeps track of things that you're working with and cleans them up for you.
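You can watch ob_refcnt from Python itself with `sys.getrefcount`. One gotcha: the call borrows one extra reference (the function argument), so the reported number is one higher than you might expect:

```python
import sys

x = object()
r = sys.getrefcount(x)
print(r)                    # typically 2 on CPython: the variable x, plus
                            # the temporary reference held by the argument

y = x                       # a second variable now points at the object
print(sys.getrefcount(x))   # one higher than before

del y                       # drop that reference again
print(sys.getrefcount(x))   # back where we started
```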
There's a secondary way, because reference counting does have some flaws.
But when you think of Python Memory Management and things getting cleaned up, this should be the first thing that comes to mind.
|
|
show
|
9:03 |
Alright, we're gonna do some fun programming over here, and it's fun because you've got to be a little bit clever to observe reference counting in Python: if you create a variable and point it at a thing so you can ask questions about it, like "is it alive or not?", it's always gonna be alive, because your variable contributes at least one to that reference count.
So we're gonna use some cool little utilities we're gonna write.
So I'm gonna write one first called "memutil" and I'm going to drop some code in here because we don't need to do a whole lot here.
So we're gonna use this "ctypes" thing, which actually allows us to get kind of a direct reference to PyObjects, that's the same one we talked about, and then give us the field reference count.
Now I'm going to add that to the dictionary so it doesn't look broken.
It's going to come back as a long, like I said, and then if we have one of those ids, remember, the id of a thing, we can come down here and parse the object from that address and get the reference count.
Okay?
We're gonna use that.
Then we're gonna have our "app_refcount" That's the one we're gonna run.
I'll go ahead and set it to run now and I'll just hit the hotkey, we don't need it that big, do we?
There we go, and I'll do my fmain magic, so it's ready to run.
Now we're going to need something interesting to work with.
First of all, we can work with the garbage collector.
We're not really talking about the garbage collector, and I only want to work with it to the extent where I say "let's not have it do anything".
Let's just disable it for now so we don't think about it or worry about it because what's happening has nothing to do with the garbage collector.
Later we're gonna enable it and focus just on its role in this whole world.
We're gonna create some variable, call it "v1" and let's just give it the number 7 and then we'll have, I'll call it "oid" for object id.
It's gonna be the id of v1.
Now what we need to do is we need to grab this early so we have the id.
Knowing this number, the memory address, the location given to us by id, will not keep the thing alive.
Right?
So it could be potentially up for collection.
And we could still use this to ask from our memutil here.
So we'll say "import memutil" as well.
We're gonna need that, but in order to actually track this really clearly, we need to do one more cool thing.
So I'm gonna create an object, a class, and I'll call it "doomed".
Why is it doomed?
Because its only purpose in life is to get created and then get destroyed by the reference counting cleanup system.
So let's go in here and define a class "doomed".
Now, this is going to basically plug into the Python data model, the "dunder methods" to capture different lifetime events of this object.
So we'll have an "__init__", and this is going to be when it gets created.
Let's go ahead say that it can have some friends.
I don't think we're gonna use this yet, but this "*friends" means the arguments will come in as a tuple, and then we'll just print.
We'll come over here and say "at {id(self)}".
So we created this doomed thing wherever it happens to be now.
That's what happens when the thing comes to life.
And then we're gonna have another one run when the thing gets deleted.
Then we want some string representation of it, so it's easy for us to just print it out and see what's going on; we'll just include the id there. So we'll have a "__str__", and we'll return a "__repr__" as well; we'll just make them the same.
We're just gonna paste some code so you can see what's happening.
All right, so what this says is "there's a doomed object at this address, and it has however many friends, if there are friends; otherwise it's not going to mention friends at all".
We don't want it to feel bad.
So this is gonna be our doomed object.
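Putting those pieces together, the class might look roughly like this (names and message wording are my guesses at the course code, not a copy of it):

```python
# A class whose only purpose is to announce its own creation and
# destruction, so we can watch reference counting clean it up.

class Doomed:
    def __init__(self, *friends):
        self.friends = list(friends)
        print(f"Created Doomed object at {id(self)}")

    def __del__(self):
        # Runs the moment the refcount hits zero (or the GC collects it).
        print(f"Deleted Doomed object at {id(self)}")

    def __repr__(self):
        tail = f" with {len(self.friends)} friends" if self.friends else ""
        return f"Doomed object at {id(self)}{tail}"

    __str__ = __repr__   # same text for str() and repr()


d = Doomed()
print(d)
del d   # refcount drops to zero: __del__ runs right here, deterministically
```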
We're going to create one of these over here in our code.
Instead of creating 7, I'm gonna create a "doomed" and it's going to start with no friends.
We'll come back, the friends is more important in the garbage collection side of things.
But let's just run this and see what happens.
Notice it created a doomed object at some address and then it deleted the doomed object at that address.
And that delete was when this function returned, no more things pointed at it, and it went away.
But we can be more interesting than that.
Let's go see what else we can do.
Let's start by first printing out how many things refer to it.
So let's say "print", and then here we're gonna use our "memutil.refs(oid)".
Now we run this.
You might think it's zero, right, cause we're kind of done with it, but it doesn't get cleaned up until line 13 unless we explicitly make that happen sooner.
So it should still say "1".
There we go.
Step 1 ref count is 1.
Now, let's go to step 2.
Step 2 is we're gonna have another variable that is equal to v1.
So remember the way this works is Python sees this and says "there's a new variable defined that is now pointing over there", and so we now have v1 and v2 pointing at our doomed object, and so now it's gonna have to increment that reference count.
Let's try that.
Sure enough, reference count is 2.
Alright, let's do some more things.
Let's go over here and do this again.
Step 3 is we're gonna change where variable 2 points.
We're going to tell it to point at None, but it could also point at 7.
It doesn't matter.
If it's just not pointing at the doomed thing anymore, that's going to decrement, take away one from the reference count.
Can you see it went "1, 2, 1", okay?
Pretty cool.
And the final step, step 4, let's go and tell v1 it also no longer points at doomed.
And at that point, nothing, not v1, not v2, should be pointing at the doomed object anymore, and so it should clean itself up.
And let's do a "print('End of method')".
Right?
So we should actually see this go to zero.
We should see it get cleaned up here, and this should return zero, and you should see all that before it's getting cleaned up naturally as part of the method return.
Are you ready?
Let's see what we got.
Here goes.
Beautiful.
Okay, reference count is 1, then it's 2, then it's 1, and this line, you can't really make it happen all at once, but this line 19 is what is causing this right here.
When we say "the last thing no longer points here" immediately, like on line 19, basically, this thing is getting destroyed, the memory is getting reclaimed.
And then later we can say, Well, now how many things point at it?
Nothing.
But that's already been the case on line 19 which did the cleanup.
And of course, that happened before the end of the method.
And if we don't do this one, you'll see the cleanup doesn't come until after the end of the method.
So that's pretty cool.
And I want to emphasize: this is deterministic, not non-deterministic.
It will always, always be the case that on line 19 this will get cleaned up.
Run it again.
You can see it definitely ran between step 3 and 4, the destructor, if you will, of our doomed object, right?
And it said "hey, I've been destroyed, I'm gone, I've been deleted by memory management, either reference counting or the GC".
This is really important.
This is incredibly lightweight.
All Python has to do to implement this is keep count of how many things point at each object, and when some variable changes assignment, just increment or decrement that number.
If that number ever hits zero, immediately take it out of the block and tell the block that spot is now available again, right?
You don't even actually have to clean up the memory.
So this is really, really efficient.
On the deterministic part: a lot of languages use garbage collectors as the primary way of cleaning up their objects, you know, .NET, Java, those types of things.
And because of that, it's non-deterministic.
When the garbage collector runs is based on the behavior the program has had over time.
You can't say on line 19 this thing's gonna get cleaned up.
You can only say "well, when memory gets full enough and the heuristic decides that section of memory is worth looking at again, then it'll get cleaned up", and that could be problematic for real-time things, like stock trading, that can't tolerate latencies of, like, 4 milliseconds or whatever it might turn out to be, right?
So this deterministic aspect of reference counting is really nice, because it's going to behave the same way, memory wise, every single time.
|
|
show
|
3:36 |
Well, you heard me go on about the benefits of reference counting.
It's fast, It's lightweight, it's deterministic and yet we saw there's this other thing called a GC that must be doing something, a garbage collector, and it wasn't involved in reference counting.
So what's the deal?
Where does reference counting fall down?
Where does it break?
Reference counting is excellent, but one thing it cannot deal with is cycles.
You create one object, and that object refers to another, and some other part refers to another, which links back to the first, creating some kind of cycle.
You're gonna end up in a situation where the reference count can never go smaller than 1, so the object will never be freed, and it'll be leaked.
Memory will be leaked and it won't be great.
There's some interesting things you can do around that for performance, but first, let's just look at the problem.
So let's start with some simple code.
We have our person that we created, and we can add friends to them.
So we're gonna create one person whose name is Michael.
There they are.
They're out here in memory.
So person name is Michael.
They have some friends, there are no friends in there yet.
Create another person.
Her name is Sarah.
She's out here in memory, and she has no friends either.
But notice each one has a reference count of 1, and 1 because p1 points at Michael, so that's 1, p2 points at Sarah, so she gets 1.
However, Michael and Sarah are friends, so we're gonna go over to Michael, p1, and append p2, that's Sarah, to his friends list.
So that means that Sarah is one of Michael's friends.
Sarah, being a lovely person, wants to reciprocate that and says, "hey, I'm also a friend.
Michael is my friend because I'm his friend", right?
So in the same way, we're gonna put Michael into Sarah's list of friends.
And now look, each of them have 2 reference counts.
And now we decide "hey, we're done with p1", and that's going to take away this link, and the reference count for Michael is 1.
The 1 comes from Sarah and her friends list, pointing back, and guess what, we're done with Sarah as well, so we're gonna take away that link, and her reference count goes down to 1 as well because Michael is one of her friends, and she's in his friend list.
So look at this.
We're in this situation where there's no more variables pointing at either Michael or Sarah, and yet their reference count is 1.
What action could possibly happen in this program that will make that go to zero, either for Michael, so that he'll get cleaned up, which will take away the reference count of Sarah and get her to clean up or in reverse?
Well, there's nothing left pointed at them.
No one can manipulate the friends list because no one even knows about these variables anymore.
This is just a fundamental flaw in reference counting garbage collection.
If you end up in a situation like this, you're done.
Those things will never, ever be cleaned up.
I guess you could, like in C++, remember to always break the cycles yourself.
But that's not gonna work, right?
It might not be this simple.
There could be, Michael has some other friend who has some other object which holds on to other people who then hold on to Sarah, who happens to be one of Michael's friends or, you know, something like some big, long, complicated chain.
It's not a 2 person linked cycle.
It could have many, many links in that cycle, and it could be really hard to understand.
So these cycles, that's what fundamentally breaks reference counting.
You can see, we've even set both the variables to none, and there's no mechanism for cleaning up Michael or Sarah because they're in this, like, locked bit where you've gotta wait for one to go away to get to the other, but that's never gonna happen.
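Here's the Michael/Sarah cycle sketched in plain Python (the Person class is my stand-in for the one described above). With the GC paused, only an explicit cycle-detecting collection can reclaim the pair:

```python
import gc

class Person:
    def __init__(self, name):
        self.name = name
        self.friends = []

gc.disable()              # keep the collector from interfering mid-demo

p1 = Person("Michael")
p2 = Person("Sarah")
p1.friends.append(p2)     # Michael -> Sarah
p2.friends.append(p1)     # Sarah -> Michael

p1 = None                 # drop both variables...
p2 = None
# ...but each object is still held alive by the other's friends list,
# so neither refcount can reach zero. Only the cycle detector helps:
found = gc.collect()      # number of unreachable objects it collected
print(found)
gc.enable()
```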
|
|
show
|
5:21 |
Let's look at this cycle thing in the context of our little reference counting app. I'm just gonna print out at the top that this is running the reference counting demo, cause we're gonna have another kind that's super similar. So let's go over here and make a copy, say this one is going to be "gc", it's gonna be the "gc demo", and it's going to explicitly enable the garbage collector.
We don't have to do that, but I explicitly disabled it before, so let's make sure that whatever else happened, it's enabled here.
So we're gonna do kind of what I talked about in the previous videos: we're gonna have 2 things, person 1 and person 2, and we're gonna say "v1.friends.append(v2)", and vice versa, right?
These are 2 people who are friends.
Let's go and tell it that this is alright.
We're going to tell it "this is a list of friends".
There we go.
Make sure it's a list.
Now, we want to have the id.
So let's have this "id1 and id2, v1 and v2" just for the same reason we needed to keep track of those.
So, id1, now let's change this to "counts are", and id2.
We don't need our step 2, well, we're gonna change what step 2 means.
Let's go and actually set "v1 = None" and "v2 = None", and then we'll print our reference counts out again, like that; setting it to None twice is not gonna do anything.
Let's set it to be the end of the method and see when things get cleaned up.
Maybe one of these will go away.
Maybe both.
We don't know.
Let's find out.
So we're gonna run this one.
Well, look at that.
So we've created a new object, doomed1 and doomed2; those are Michael and Sarah, for example. Up here, I'll make this more legible for you: after this, we had one pointer to each, one reference count. Now they're friends, so they pointed at each other, and that incremented it again; you can see the reference counts are 2 and 2. Then we threw away the original variables and went back to 1 and 1, but they're never going to get lower than that because of the cycle.
So we went all the way to the end of the method and they never got cleaned up.
What happened?
Well, "gc" is non-deterministic.
It's based on how the program behaved.
We're gonna talk about that in a minute.
What are the gates it has to go through, and what are the rules it uses to decide when to run, which objects to look at, and so on?
But the short version is: just because these objects are unreachable doesn't mean the garbage collector is constantly running around looking for them.
It can only run so often, otherwise things would be super inefficient.
So let's go and make some stuff happen.
On to step 3.
Down here I'm gonna just, kind of like we had before, just make a whole bunch of allocations happen, and the fact that these are lists and other things that can contain data is actually important.
OK, like this isn't enough.
You need stuff like this.
Anyway, what we're going to do is we're going to allocate a bunch of things, and if there's enough allocations, potentially, it could trigger the garbage collector to run.
So we may see cleanup before 3, or maybe not.
Let's find out.
Look at that.
So over here we created the objects.
We had originally 2 variables plus the links to each other, so that was a total of 4 (2 and 2), then we took away the variables, down to 1 and 1, and then eventually, after a whole ton of allocation, thousands of lists and tons of stuff happening here, enough memory pressure was put on the system that it said "OK, OK, we need to pause for a minute and run the garbage collector, see if there's any garbage".
It found that cycle we were talking about, the two doomed objects we created, and said "you know what?
This is a cycle.
It's out", and threw them away before this line happened.
So it was the behavior of all this allocation; I can't tell you exactly which line triggered it.
But somewhere in this looping around, making a bunch of stuff happen triggered the GC to run, and we'll talk about what the scenarios are for that, of course. Then it did a collection, found the cycle, deleted them both, and we carried on, right?
So if I, again, if I comment this out and I run it, you'll see the doomed deleted at the end because it was at the end of the program.
But if I do a bunch of stuff, it gets deleted along the way because eventually there's enough memory pressure to trigger the GC to go look around.
So, unlike reference counting, GC is not deterministic.
But it's not as important because it only applies to objects that are put into a cycle.
And in fact, it only applies to objects that can contain other objects, right, like a list that can contain other things, or a class that can contain other things.
A number, a string, all those things are always, always just reference counted.
It's only the container objects, if you will.
We're going to talk a lot about the algorithm, but this is the basic way in which it works.
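You can actually watch allocation pressure trigger the collector on its own by registering a GC callback and then allocating lots of container objects (a quick illustration using the standard gc module; the exact trigger points depend on the thresholds and your Python version):

```python
import gc

phases = []

def on_gc(phase, info):
    # phase is "start" or "stop"; info["generation"] says which gen ran
    phases.append((phase, info["generation"]))

gc.enable()
gc.callbacks.append(on_gc)

# Lots of tracked container objects: crossing the gen-0 threshold makes
# the collector run by itself, no gc.collect() call needed.
junk = [[i] for i in range(100_000)]

gc.callbacks.remove(on_gc)
print(len(phases) > 0)   # the GC fired during the allocations
```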
|
|
show
|
3:56 |
Let's look at our GC example just one more time.
There's one thing that I think is worth exploring on its own.
You'll recall what we did: we created these two doomed objects, and we said they represent people, and people have friends.
So what we're gonna do is we're gonna say person1 is a friend of person2, and person2 has person1 as a friend.
And then we're gonna blank them out, and because these references point to each other, we saw that they're not going to be deleted or cleaned up by reference counting, they have to use garbage collection.
So over here, we had put this back the way it was.
Here we go.
We had this and we did a bunch of allocations here, created a list, and by doing all this allocation, we created enough objects that the garbage collector had to run, so between step 2 and 3, you can see these doomed objects were collected, the cycle was detected, they were GC'd, everything was good.
That seems straightforward, right?
Let's make a very small difference here.
Let's suppose instead of creating things that can contain other things, like lists and so on, well, we're gonna do is we're gonna go down here, and actually that may make a copy so you have both in the source code to play with.
We're gonna go down here, and we're just going to allocate some variables.
We're gonna allocate about the same amount of stuff here.
We're gonna create X, which is whatever the range is squared, and then we're gonna create a string of it.
And then we're gonna replicate that string 100 times and store the result in y.
Now, never mind that it says it's not used in PyCharm.
It's still alive as far as Python is concerned until we get to the end of this function, okay?
so, let's run it.
Wait a minute.
What just happened?
Look where the stuff got cleaned up.
For some reason, even though we're doing basically the same stuff, I mean really, really similar as before, we're not getting a garbage collection triggered.
Why is that?
Well, there's something super subtle going on here.
Over in this version, we're creating container objects.
Lists.
We created a class that would work.
If we created say, a tuple, it could contain other things.
What we're doing here is we're creating objects, here a list which has a bunch of strings and so on, and then we're putting it in another list, though that technically doesn't really matter.
The point is we're creating objects that could potentially have cycles.
Like a list containing an item which could then point back to the other list and things like that, right?
These container objects: dictionaries, sets, tuples, classes and so on, they potentially can contain other objects so they potentially could contain a cycle.
But down here in this one instead, we have numbers, we have an "i" which is a number between 1 and 1000.
We have "X", which is the square of that number.
We have a string created from X, a temporary one, and then it's created, replicated 100 times, and we store that in "y".
Can any of those hold other objects?
Create a cycle to them?
No.
They're all immutable, actually.
Right?
So the things we're creating are not of interest to the garbage collector.
So when we talk about allocation and the garbage collector even paying attention to it, what you're going to see is that the garbage collector doesn't track everything that Python works with or does.
The garbage collector pays attention to container objects: dictionaries, lists, classes and so on, Okay?
So the things that are even eligible for garbage collection are only a subset of what you generally work with in Python, and it's super important to realize what triggers and interacts with the garbage collector and what doesn't.
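You can ask the GC directly whether it tracks a given object with `gc.is_tracked` (a quick illustration; note the exact tracking optimizations vary a bit across CPython versions):

```python
import gc

print(gc.is_tracked(42))         # False: an int can't reference other objects
print(gc.is_tracked("hello"))    # False: neither can a string
print(gc.is_tracked(3.14))       # False: floats are atomic too
print(gc.is_tracked([]))         # True: a list is a container
print(gc.is_tracked([1, 2]))     # True, even if it only holds ints right now

class Person:
    pass

print(gc.is_tracked(Person()))   # True: instances can hold references
```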
|
|
show
|
2:42 |
We've seen this mysterious GC, this mysterious garbage collector in action, but we don't really know what triggers it.
We saw that it somehow has to do with container objects, like lists and dictionaries, versus non-container objects, like strings and numbers.
So what we're going to do in the next couple of little sections here is talk about how this works.
So Python has what's called a "Generational Garbage Collector", and this is a really important optimization that most modern garbage collectors have.
And the way it works is all the objects start out in was called "Generation Zero".
It talks about how many times have they been collected, in this case never, and then once an object has been inspected but not deleted, it was inspected, and it decided that it was still alive for whatever reason, it's promoted into Generation one, and then sometimes we'll see about 1/10 of the time, the stuff in generation zero and generation one will be compared.
Most the time we'll just focus on Generation zero, and that's because the overwhelming pattern is that new, temporary little things are created for a moment and then thrown away.
So objects will live very briefly.
So typically it's enough to just look in generation zero, but we'll see the mechanism by which Python determines it needs to look broader.
So it'll look at both generation one and generation zero items if it needs to look a little broader, and if it finds something in generation one that is still valid but has now been inspected twice, it's sent to generation two, and the same pattern applies.
Sometimes Gen 1 and 0 will be inspected, but rarely will Gen 2.
So these things are even older and even more infrequently inspected, and we're gonna talk about the ratios and how you can see and configure that and all in just a minute.
But by default, things start out in generation zero.
If they happen to survive a garbage collector run, an inspection, then they get promoted and they get inspected less often.
If they survive a second or further time, they get promoted to generation two, and over in generation two, they don't get inspected nearly as often.
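You can watch those per-generation counters yourself with `gc.get_count()`; this little sketch isn't from the course, and the exact numbers will vary with whatever else the interpreter has allocated:

```python
import gc

gc.collect()                       # start from a clean slate
before = gc.get_count()            # (gen0, gen1, gen2) allocation counters
junk = [[i] for i in range(100)]   # 100 new tracked container objects
after = gc.get_count()
print(before, after)               # the generation-zero counter has grown
```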
Remember, reference cycles only occur in container objects: lists, dictionaries, classes and tuples, and so on.
For that reason, only those objects are subject to Python's garbage collector, right?
We saw that we worked with strings and numbers and so on, No problem.
They did not interact or trigger the garbage collector.
When you think about all the stuff and all these algorithms, it's easy to think "Well, every little thing I do in Python has this applied to it on top of reference counting".
No, just container objects.
So that's worth keeping in mind, too.
|
|
show
|
4:43 |
Now we've seen what this garbage collector is and how it restricts what it pays attention to, both the types of objects that it pays attention to and the frequency which it checks different ages or generations of objects.
But when exactly does it run?
Is it some odd heuristic that we don't really know about?
Or is there something more concrete?
Well, Python, unlike a lot of different systems, is pretty straightforward in what it does.
So we can actually call this function "gc.get_threshold()".
You'll get a tuple of numbers back.
You're gonna get 700, 10, and 10 in the current system.
Now what's confusing, what's unclear about this unless you go look at the documentation, is that these are not how often generation zero runs versus generation one versus generation two; the units on these numbers are different, okay?
So the first one is different than the other two.
The first number, the 700, This is a generation zero collection, so one of the cheaper, easier collections.
This is triggered when the number of allocations surviving reference counting minus the ones that have been cleaned up exceeds 700.
That's what that 700 on the left means, it means we've allocated 700 more things that have lived even when you take into account the ones that have been deleted.
Now Remember, this is not all allocations.
If that were the case, this would run like crazy all the time.
This is going to run only when you've got container objects: classes or dictionaries and so on.
If you create, let's say, 800 classes and only 50 of them get cleaned up well, that could trigger a garbage collection.
Okay, so that's this first number.
That's the 700.
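You can actually observe this trigger firing; `gc.callbacks` is a list of functions CPython calls around each collection run. This sketch isn't from the course, and the exact number of runs depends on interpreter state:

```python
import gc

events = []
def watcher(phase, info):
    # Called by CPython at the start and stop of every collection run
    if phase == "start":
        events.append(info["generation"])

gc.callbacks.append(watcher)
keep = [[i] for i in range(10_000)]   # thousands of surviving container allocations
gc.callbacks.remove(watcher)
print(len(events))   # several collections were triggered along the way
```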
The second number says we're going to trigger a Generation one collection This is a little bit broader search of the memory space and can be more expensive because we're looking at more objects potentially.
Now, this number 10 here means we're gonna do a Generation one collection for every 10 generation zeros.
Right, the generation zero runs happen every 700 extra containers, and then for one in ten of those we're gonna actually look a little broader, and we're gonna do a generation one collection.
So that's what this 10 means.
From generation zero to generation one, the ratio of those is 10 to 1.
If we look at the final 10, this is when a generation two collection is triggered, and that's when the number of Generation one collections is greater than this number.
So for every 10 generation one collections, there's a gen two collection.
So that might sound a little bit complicated, but let's break it down.
Every time we exceed that fixed number of surviving container-object allocations, 700, that's going to trigger a generation zero collection, and then for every 10 generation zero collections we'll have one generation one collection.
And then every 10 generation one collections we're gonna have a generation two.
So it's a 100 to 10 to 1 ratio of collections from gen zero to gen one to gen two, and the thing that starts it all off is the number of allocations of surviving container objects like classes and dictionaries and so on.
All right, Hopefully, even though that's a little bit of a lot to keep in your mind it gives you a sense of what's going on here.
Like, when is this whole process going to get started?
It's pretty easy to see we've got 100 to a 10 to a 1 ratio for all of these in the generations for the GC.
It's the allocation thing that kicks off the base of that whole process.
That's a little bit unclear.
It's also worth pointing out that there's a "gc.set_threshold()".
So if you want to change these, you can.
Personally, I haven't really tried that, but it seems like there might be some pretty interesting performance benefits you could get.
You know, I'm thinking of things like: I go to a SQLAlchemy model, and I do a query against the database that is going to return 1000 records, and that's gonna trigger a garbage collection.
But, you know, it's very unlikely those things are gonna have a cycle.
They were just created, right?
So, in those situations where cycles are very rare, you could kick up that base number.
It seems like something you might be able to play with.
We're gonna look at some examples of people doing way more insane stuff than that, but, it seems like playing with that base number, you could probably get some pretty interesting performance benefits or maybe drawbacks.
It depends, but you could definitely make a big impact by changing that first number, that 700, to something bigger or smaller.
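Changing that number really is a one-liner; here's a small sketch (whether it actually helps is entirely workload-dependent, so treat the 50,000 as an arbitrary example, not a recommendation):

```python
import gc

original = gc.get_threshold()      # typically (700, 10, 10)
gc.set_threshold(50_000, 10, 10)   # collect far less often in generation zero
assert gc.get_threshold() == (50_000, 10, 10)
# ... an allocation-heavy, cycle-light workload would go here ...
gc.set_threshold(*original)        # restore the defaults when done
```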
|
|
show
|
6:26 |
If we go over to the CPython documentation for almost the latest version (we're doing 3.8.5, but here's 3.8.4) and look right at the top of the garbage collector module, you're going to see some very interesting stuff that I've highlighted here.
It starts out telling you about it.
It provides an interface to the optional garbage collector, and it provides the ability to disable it, to tune the frequency, which is what we just spoke about, add debugging options and so on.
It also lets you ask which things are unreachable but cannot be freed for various reasons and so on.
But check out this underlined bit.
You can disable the garbage collector if you're sure your program does not create reference cycles.
Automatic collection can be disabled simply by calling "gc.disable()".
How crazy is that?
So if you have a program that you're sure doesn't create cycles or honestly, if it doesn't create too many cycles, right?
If this is like a command line script and it runs for two seconds, it doesn't create a lot of cycles even if it leaks a little memory, who cares?
if it's a Web server that runs for ever, maybe that's a problem.
Maybe not.
We will see.
But to me, it is super interesting that this garbage collector is considered optional and that right at the top of the documentation, it's like "you know what?
If you're feeling confident, turn it off, you might not need it".
So that's pretty cool.
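Here's a tiny sketch of what turning it off actually means, using a weak reference purely as a probe to detect the leak (this example isn't from the docs, just an illustration):

```python
import gc
import weakref

class Node:
    pass

gc.disable()
a, b = Node(), Node()
a.other, b.other = b, a                  # a reference cycle between the two
probe = weakref.ref(a)                   # a probe that won't keep `a` alive
del a, b                                 # refcounts never reach zero: the cycle holds on
leaked_while_off = probe() is not None   # leaked while the collector is off
gc.enable()
gc.collect()
reclaimed_after = probe() is None        # the cycle collector reclaims it
print(leaked_while_off, reclaimed_after)
```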
It also says here "to debug a leaky program, call gc.set_debug(gc.DEBUG_LEAK)", and this includes DEBUG_SAVEALL, causing garbage-collected objects to be saved somewhere in memory for inspection rather than being cleaned up.
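Here's a small sketch of that debugging flag in action; DEBUG_SAVEALL parks collected objects in the `gc.garbage` list instead of freeing them, so you can poke at what leaked:

```python
import gc

class Leaky:
    pass

gc.set_debug(gc.DEBUG_SAVEALL)   # collected garbage is kept in gc.garbage
a, b = Leaky(), Leaky()
a.pal, b.pal = b, a              # a cycle the collector would normally free
del a, b
gc.collect()
saved = [o for o in gc.garbage if isinstance(o, Leaky)]
print(len(saved))                # the cycle's objects, saved for inspection
gc.set_debug(0)                  # turn debugging back off
gc.garbage.clear()
```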
So that's also interesting, but this ability to disable it, this is intriguing.
I wonder if you could do it.
If anybody would, What would the outcome be?
Well, you might have heard of this place called "Instagram".
I think they do something with photos.
They actually do an insane amount of work with Python.
All of Instagram runs on Django, at least their back end API's and their website and so on.
And I think they're one of the largest deployments of Django in the world.
They've got a massive set of servers and so on.
They wrote this article, called "Dismissing Python Garbage Collection at Instagram", over on their engineering blog, where they talk about a lot of cool Python stuff.
And it's pretty intriguing, it says, "by dismissing the Python garbage collection mechanism, which reclaims memory by collecting and freeing unused data, Instagram can run 10% more efficiently".
Yes, you heard it, by disabling GC entirely, we can reduce the memory footprint, not increase it, reduce it by, I think they said, maybe 25% or something like that.
Quite a bit, and improve the runtime performance by improving the CPU LLC cache hit ratio; if you want to know more about why, you can check out the article linked at the bottom.
It's probably better to just Google it; it's one of these yucky Medium URLs, but nonetheless, quite interesting.
So here's the TL;DR version.
So they were able to determine that the way web servers work is they'll create not just one version of the server for running your Python code but they'll make many of them.
So for example, at Talk Python, we use uWSGI, and when we run it for the training site, we actually have eight copies of that process running.
Eight independent, separate copies of the Python web app that you very likely are using in some form or another right now.
There's a lot of memory that's shared between those things and the operating system is pretty good at saying "we're only going to consume more memory for all these different processes if they're going to start changing it, but if it's actually just the same, let's just point them all at the same bit of memory" okay?
When that's the case, you actually get a lower memory usage because even though we have eight processes, instead of having eight times the memory, we might have, you know, 10% - 20% extra memory that has to be created, and 80% could be shared.
I don't know if that's the actual ratio, but you know, that's the general idea that a lot of the core startup runtime bits are all the same, and then there's what that process is done this particular time since it started.
Okay, so what they found was that the GC was actually mucking with the memory in a way that would not allow that memory to be shared by the operating system.
So even though they may have had a few cycles that created, you know, some issues for them, what they found was they got, I think, they said 25% reduction in memory usage.
So they saved 8GB per server by turning off the GC.
And also because the memory is more similar across these different processes, it's more likely that as different processes process requests, that data is going to be in the cache, the CPU cache, and CPU cache access is much faster.
So a rule of thumb might be: if I'm going to read something from disk versus from memory, it could be 200 to 400 times faster to read it from RAM than from disk.
So obviously you think something in RAM is blazing and something on disk, even a fast disk, is relatively slow.
Same for the cache, though.
That cache is like 400 times faster than RAM.
So if you can get more of these cache hits, you can get your code to run much faster.
So they came up with this 10% number, sort of like a CPU performance boost plus memory reduction so we can run more things on the same server and so on.
The number is not really important, the general idea is, and the fact that they were able to apply this is pretty interesting.
But if you want to do this, you should read the article because it's not straight forward how they did it or whether that worked well for them.
So, depending on what you're trying to do, it might be as simple as calling "gc.disable()", but read the article and you'll see there's actually more to what they had to do in their fairly complicated set up.
All that said, I'd probably leave the garbage collector on and maybe turn that first number, the 700, up much higher.
That's just my first impression as I'm thinking through these problems; you know, this may be one of those things you just let work the way it is.
But if you feel like you could give this boost, these are some of the knobs and ideas that you can play with to improve it pretty easily around garbage collection and memory management.
|
|
show
|
3:40 |
If you thought about multi-threaded programming or parallel or asynchronous programming in Python, you've surely heard about this thing, the "GIL", or the "Global Interpreter Lock", and you often hear this presented as a negative.
What this thing does is it only allows a single Python instruction to be executed at any given moment: no matter how many threads you have or how many cores you have, Python instructions execute one at a time.
So that could be very limiting for how parallelism works in Python.
Now there's a lot of situations where it works awesome, actually I go through it in my asynchronous programming course.
But things like "I'm waiting on a network connection for a response from a Web service or some kind of API I'm calling", that doesn't count as executing a Python statement; that's down somewhere in the OS.
So if you're waiting on other things, actually the parallelism is great.
But for computational stuff where the computational bits are in Python, this GIL is a big problem.
Actually, Eric Snow and some of the other core developers are working on PEP 554, which will create what are called subinterpreters, or multiple interpreters, that take sort of a copy of everything we've talked about and replicate separate, isolated versions; each version can run on its own thread, and it'll potentially solve or alleviate some of these problems by not sharing objects.
That's in the future.
I have a lot of hope for it.
It sounds really cool, but I want to bring up this GIL because it's actually super interesting in relation to what we've just talked about: not the garbage collection, more the reference counting side of things.
So let's think about a world without the GIL.
We have stuff running.
We've got these different references to the PyObjects, different pointers that are all pointing back from potentially different threads, and they're coming and going, as they do, multi-threaded, right?
In parallel.
So in that case, you're gonna have to do some kind of thread lock, some kind of critical section or mutex, around that piece of data that is the reference count on the PyObject, every single one, even things like numbers and strings.
You can imagine that's gonna drag things down and be super, super slow, right?
That's a lot of overhead for every single normal operation.
So the tradeoff was made that said, well, "let's only let one bit of Python run at a time"; in that case, there's no possibility of a race condition around the reference count, so we don't have to have thread-locking on it, and it'll be much faster.
So that's really what the GIL is all about.
People think of it as like a threading protection thing and it kind of, sort of is.
But really, what it is, is a reference-counting protection mechanism. It allows Python's reference counting to be done without thought or care for parallelism or thread safety, all those things that are somewhat hard and certainly expensive relative to not doing them, so Python can just freely work with these objects, and because of the way it runs, you're never gonna get a race condition on that count.
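You can watch that per-object counter with `sys.getrefcount`; this sketch isn't from the course, but it shows the number the GIL is protecting:

```python
import sys

x = ["some", "object"]
base = sys.getrefcount(x)      # includes the temporary reference the call itself makes
y = x                          # a second name pointing at the same PyObject
with_alias = sys.getrefcount(x)
del y                          # drop the alias again
after_del = sys.getrefcount(x)
print(with_alias - base, after_del - base)   # the count goes up by one, then back down
```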
That's a global interpreter lock.
When I first heard about it, I always thought "this is a threading thing", and technically yes, but really the most relevant part is that it has to do with reference counting.
So it's a Python memory management feature, if you will.
Actually, you can read more about it over at Real Python; check out the article.
They always have a bunch of good articles on this type of stuff, so they've done a really good one exploring the Python GIL.
You can see the URL at the bottom.
If you want to learn even more about it, go check it out there, but keep in mind when you hear about the GIL, it's actually there to make reference counting much faster and easier.
Yes, it's a trade off, but it's an interesting one to consider, and a lot of times, if you're not doing parallelism, you're better off because of it.
|
|
|
47:29 |
|
show
|
1:39 |
So what we've done so far is we've tried to understand Python memory management techniques and how it works.
We've talked about allocation, we've talked about cleanup with reference counting, and how there are problems with reference counting.
So there's also this garbage collection thing and so on.
All of that was kind of FYI, so you'd have a better understanding, right?
Well, what we want to accomplish is to come out of this course not just with a better understanding, but with the ability to actually make decisions that can dramatically improve the amount of memory we use, by making it less, or speed up things, or ideally, both.
So we're going to start down that path in this chapter and carry on in the chapters that follow.
So let's start by talking about the goals of this chapter.
To some degree, you're going to see that what we can do will actually make things better.
But also, sometimes it's just easier to go with the grain and understanding how the garbage collector and reference counting and allocation works, you can either go with the grain or you can go against it.
Obviously, working with the way that Python is already gonna work is good.
It's better.
So I want you to keep this in mind as we talk about all of these things, and especially looking back at what we've done: it's important to know the direction of the grain, the way the system wants to work, so that you don't fight it and instead work along with it.
But again, in this chapter, but especially the next, we're gonna be looking at some actual techniques that will let you change the way things work or maybe do this to an extreme where we actually change the performance in really important ways.
|
|
show
|
0:53 |
There are gonna be a couple of glaring omissions of what we're covering during this chapter.
We're not going to talk about functions and we're not gonna talk about classes.
Not really.
Obviously we're gonna use some functions and we'll probably have some classes.
In fact, I'm certain that we will.
But we're not going to focus on some of the techniques that we can apply directly to them to get huge, huge gains.
In fact, it's because they're so important that we're going to have a dedicated chapter on optimizing functions and a dedicated chapter on optimizing classes.
So I want to set the expectation up front that we're not yet gonna deal with the Python language features that address how functions use memory or how classes can be restructured to use memory better.
We will talk about using classes, but not the language features to make it better yet because those were really important and they're coming up soon, but we have some foundational stuff to cover first.
|
|
show
|
1:12 |
I hope this article from Instagram, about how they disabled Python's garbage collection and used less memory, not more, made an impression on you.
It sure did on me when I first heard about it.
It honestly kind of blew my mind.
What was it that made this possible?
Why could they do this?
Well, let's go back to the Python documentation.
Right here in the highlighted bit, once again it says "you can disable the collector if you are sure your program does not create reference cycles", and I would change that a little bit to "it doesn't create too many reference cycles" because, you know, if it's short-lived and it creates a couple, who cares, right?
If it leaks a little bit of memory.
But if the core thing you're doing, the core data you're working with, requires cycles, well, I guess you can't do it.
So what we're going to do in this next section is we're gonna look at a situation that has reference cycles and say "is there some way to restructure the algorithm so we no longer have these cycles in our data?" And then it would be possible to turn off the garbage collector.
Not necessarily saying we have to do that, but we're going to talk about these cycles and basically avoiding creating them, using different algorithms and slightly different data structures.
|
|
show
|
6:06 |
Alright, let's write some code and create some cycles.
Now we sort of saw that in the GC section before, but I want to start from scratch so we have some nice, clean examples to work with, and I'm gonna create another person class here that's similar, but not the same as what we had before.
I'm just gonna paste this because we kind of did do this before.
Over here, we've got our person class.
It's created, it has a name.
It also auto generates its id, and it has a list of friends.
It has this class method that will just auto-increment the id, like 1, 2, 3 and so on, and then finally, this class will tell you if it was cleaned up.
This will tell you that, yes, the cycle has been broken and the thing got cleaned up, or, you know what?
It didn't.
So, this is off to a good start.
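The class itself isn't shown on screen in this transcript, but based on that description it's roughly this (a sketch; the course's exact version may differ):

```python
class Person:
    _id_counter = 0

    def __init__(self, name):
        self.name = name
        self.id = Person.get_next_id()   # auto-generated: 1, 2, 3, ...
        self.friends = []                # may end up pointing back at us

    @classmethod
    def get_next_id(cls):
        cls._id_counter += 1             # auto-increment the shared counter
        return cls._id_counter

    def __del__(self):
        # Runs when the object is actually reclaimed, so we can see
        # whether the cycle was broken and the thing got cleaned up.
        print(f"Person {self.name} was cleaned up")
```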
The other thing that I'm gonna create is something to play with it.
I'll just call it "app_cycles", following my pattern that things we're going to run are named "app", so I can hit the hotkey to make it run, then do our fmain live template and boom, we've got something to start from.
So what I wanna do is create two people: a person whose name will be Michael, and a second person whose name is Sarah.
Okay, we've got our two people here.
Now I want to let you play with it in different ways.
Sometimes they'll have cycles, sometimes they won't.
So I'm gonna ask a question.
I'll say if we'll ask the user "do you want cycles?" We'll do like a "[y/n]?" We'll say "If that is equal to yes, then we're going to create some friends" You know, we'll do that friend thing.
So p1, that's Michael, append p2, and then we'll do the reverse like so.
So if they say yes, that's going to create a cycle.
And then we're going to zero them out here, like that and we'll print out "program closing, kthxbye".
PyCharm thinks it's misspelled, but no, no, K, thanks, bye.
That's a good thing.
And let's just flush it so we can see stuff right away.
Well, let's go ahead and run this and we'll see if we have the cycle, this we've kind of already looked at, but we're about to do something new here.
So we're going to see that if we do create the cycle, this will not clean it up.
So the goodbye, where the people get deleted, will happen after this.
But if we say no, then they'll be cleaned up right here, Yeah?
let's do that.
Do we want to create a cycle?
Let's say no, and person was cleaned up.
Michael and Sarah.
K, thanks, bye.
But this time, if we say yes, the garbage collector does not have enough container object allocations.
Remember it needs 700 and we've done 2.
So that's not enough, so it's not going to trigger any sort of GC.
So the program exits and then basically as it goes out the door, it does its final cleanup here.
Okay, so this is pretty interesting, but I'd like the program itself to know if the cycle is detected.
Now, previously, we used that memutil thing that could tell us how many references were pointing at a thing.
But I want to introduce you to another idea that we can play with.
We could use the other one, but I think we'll learn something here as well.
So I'm gonna create this thing called a "weak reference" and we can go and just say "we're gonna create a weakref" by importing the library and we'll say "refer to p1".
What's gonna happen here is this is gonna retain something that maybe can point to the object.
But it can only point the object if it hasn't been cleaned up.
So previously we could say, well, "there was this thing out in memory and it created a cycle and it didn't get cleaned up", but we only knew that it used to be there.
You couldn't say "let me access it as if it were still alive" or revive it, in a sense; this weak reference will let us do that.
So we'll do the same here, and down at the bottom, when things are already closing, we'll check "is this thing still around?" The way you do that with a weak reference is you invoke it: it'll return p1 if it can, if the object is still around; otherwise it will return None.
We'll say, if either of these comes back with an object, if either of them is still alive, we're gonna print "cycles found", because nothing else points at it; otherwise, "no cycles found".
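In isolation, the weak reference behavior looks like this (a minimal sketch, separate from the app code):

```python
import weakref

class Thing:
    pass

t = Thing()
r = weakref.ref(t)       # r does not keep t alive
still_here = r() is t    # calling the ref returns the object while it lives
del t                    # refcount hits zero: reclaimed immediately, no cycle
gone = r() is None       # now the weak ref answers None
print(still_here, gone)  # True True
```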
So let's run it just one more time with our cool weak reference.
So, no.
So they were cleaned up when we set them to none.
K, thanks, bye.
No cycles found.
Let's run this again.
Yes.
Create the cycle.
Ah, cycles found, cycles found.
And then those things got cleaned up.
Here we are in this situation where we have these person classes, they have friends.
Most importantly, we have these two things and they need to know about each other, right?
Michael needs to know about Sarah.
Sarah needs to know about Michael.
You might think, Well, that's just the way it is.
Is there any other possible data structure or mechanism we could use that's not going to create a cycle?
Remember, these things are going to survive a while.
If there are going to be many of them, they're going to get promoted into Gen 1 and then into Gen 2, and when those things do get inspected, it's going to be expensive.
They're gonna hang around in memory longer than they otherwise should.
Remember, Gen 1 is 10 times less likely and Gen 2 is 100 times less likely to even be inspected.
So avoiding creating the cycle in the first place may be a really good idea.
Alright, So how do we do that?
How do we create this?
Well, the stage is set.
In the next demo we're going to go and use a slightly different data structure that will have the same accomplishment.
Like you could say, a friend of Michael is Sarah, and vice versa without creating cycles.
|
|
show
|
2:17 |
Alright, let's go and change this program just a little bit to work with our version that doesn't have cycles.
I'm gonna have this "app_no_cycles", and let's even print it out.
So we can see here, print "version with no cycles"; this is the one whose data structures won't create cycles.
Okay, so what we're gonna do is we're gonna make a tiny difference here.
Let's add one more person, a person 3, Zoe, and it'll still ask "do you want to have these people like this?" And we can go, we're gonna change this right here, "TODO: improve this with new data structures".
Let's go ahead and just ask whether people are friends.
And then down here, We'll just say "yes" like this, "Yes if p1 in p2.friends" for the moment.
Again, we're going to be changing this.
"else, no".
like this.
Alright, other than that, that's the same.
And we can also ask this for "is p1 a friend of p3" just so that we have, you know, sort of both cases covered.
Alright, let's run this thing and see how it's doing.
Alright.
Create cycles?
Yes.
Is Michael a friend of Sarah?
Yes.
Is Michael Friend Zoe?
No.
Program closing.
Ah, cycles found.
Okay, that's a problem.
Let's just run it one more time with no cycles.
No.
No friends.
We didn't create the friend relationships and no cycles and so on.
We're also not zeroing out Zoe, so I guess we could go ahead and do that here as well, that way you'll see them all go together.
Okay, Super.
Now we've got the stage set.
How are we gonna change this?
How are we going to store them so they don't create these references back to each other and create these cycles and all the problems that can come from there?
Well, we're gonna use a different data structure rather than storing them on the classes themselves, like we had indicated right there.
|
|
show
|
6:09 |
Alright, we're gonna bring some cool stuff together, and we're going to solve this problem by using a new data structure for our friends.
Alright, so what I want to do is I want to create a data structure whose job is to manage the relationships.
So that thing will exist and hold on to, say, p1 and p2, and it will know that we should point from p1 to p2, or p2 to p3, or p3 to p1, or whatever the relationship is.
And then when we're done using them, we can say "we're done with this person", and it can drop all of those references.
And actually, you'll see, It won't even really need to do that to work correctly.
So I'm gonna call this data structure a "friend map".
Now, at the heart of this is we're gonna have a thing I'm gonna call a "map", but notice that's a function.
So we can call it "mp" or whatever, but I also want it to not actually be directly manipulatable from the outside.
And so we can do that in Python by putting a double underscore and then it'll effectively be hidden.
I'm gonna add some type hints here: this is gonna be a dictionary that, given an integer, the person id, returns a list of these Person objects.
So given a person id, like 1, 7 or 500 or whatever, return me a list of their friends.
And a plain dictionary would be fine, but I would also like to handle the case when you ask about somebody we've not entered or dealt with yet, and just say "no, no friends for that person".
So we'll just say "defaultdict" rather than a regular dictionary.
And it's going to be calling the list function when it finds a missing item.
So, for example, nothing's added yet, right?
If I ask for map of 5 or 6 or whatever, that's going to return an empty list, and I could even append to that without crashing, without a KeyError or any of that kind of stuff.
So default dicts are cool.
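If you haven't used one before, here's the behavior in a nutshell (a quick sketch, separate from the app code):

```python
from collections import defaultdict

friends_of = defaultdict(list)   # missing keys produce a fresh empty list
print(friends_of[5])             # no KeyError for an unseen id, just []
friends_of[5].append("sarah")    # safe to append without checking first
print(friends_of[5])             # ['sarah']
```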
This is hidden, so we're gonna write a few functions to deal with this.
First of all, let's add a friend relationship here: I'll say "def add_friend", taking a person, which is a Person (we'll go ahead and import that over there), and a friend, which is also a Person.
Super.
So we're gonna say "here's the person, here's their friend, set up their relationship so, you know, whoever that is, is the friend of this person".
Pretty straightforward.
But we want to do a little error checking: "if not person or not friend", return, because you can't be a friend of None.
Also, unless you're a little bit funky, you know, or maybe you always should be, whatever, I don't know if you should be a friend of yourself.
You certainly don't need this data structure to record whether you're a friend of yourself.
And then finally, we're gonna have a function down here, "def is_friend", with the exact same parameters, that returns a bool; for now it'll just return False.
Back in add_friend, if this relationship is already set up, we don't need to add it again, so "if is_friend(person, friend): return".
We don't need to enter them into this data structure twice.
Alright, Now, here's where it gets interesting.
So what we're gonna do is get the friends of this person; let's call that current_friends.
This relationship is stored over in the map: we'll go to the map and index it with person.id, and remember, if they've never been set up before, this is going to create an empty list for person.id; if it's already there, it just returns the existing list.
Now, here's where it gets interesting.
Do we just do this "append friend"?
Well, if we do, we're kind of back in the same situation.
Now, if we somehow forget to clean up the memory here, well, this map is gonna hold a reference to friend, and it will never get cleaned up.
That would be a bummer, right?
So we're gonna do something a little bit different.
We're going to go and add a weak reference.
Remember, we just spoke about that; we can use it here like this, so we'll have the ability to get back all the friends who have not been cleaned up, deleted, or vanished, and for the ones who have, we'll be able to tell "oh, that person's already gone; they don't exist in the system anymore, so they're not really friends", alright?
So what we can do is we can actually store this weak reference.
But if you ask for the friends of somebody, yeah, you'll be able to just give it back.
How cool is that?
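Putting the pieces described so far together, a minimal sketch of add_friend might look like this. The Person class and friend_map shapes are assumptions based on the transcript, not the course's exact code, and the duplicate check via is_friend comes later:

```python
import weakref
from collections import defaultdict


class Person:
    def __init__(self, pid: int, name: str):
        self.id = pid
        self.name = name


# person.id -> list of weak references to that person's friends
friend_map = defaultdict(list)


def add_friend(person: Person, friend: Person):
    # Error checking: you can't be a friend of None or of yourself.
    if not person or not friend:
        return
    if person.id == friend.id:
        return

    current_friends = friend_map[person.id]  # creates an empty list the first time
    # Store a weak reference so this map never keeps a friend alive.
    current_friends.append(weakref.ref(friend))
```

Because only weak references go into the map, deleting the last strong reference to a person really frees them, and the map entry dereferences to None.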
Okay, so this is going to let us add a friend, so let's go and start by fixing this here.
We'll say "friend_map" and import that at the top.
And we'll say "add_friend", so p1 and p2, and vice versa, like that.
Now, down here, we have to change this.
Instead of the p1 that's in here, we'll say "friend_map.is_friend" we're referencing p2 and the friend would be p1.
So just like that.
Now remember this one always returns false.
We haven't finished this one.
Yeah, p3.
It goes like that.
So, let's go ahead and run it and see what happens.
Yes.
Create some cycles.
Are they friends?
No.
Because we just said "No".
But notice there's no cycles found, okay?
Well, that doesn't really prove anything yet.
We haven't been able to accomplish the same relationships we wanted before, right?
So, we're part way there.
We're able to add these friends theoretically, and we haven't been able to test them, but we're getting there.
So we're gonna have to write "is_friend", and it turns out that this function is not too bad.
Alright, well, let's actually implement this "is_friend" thing.
So we actually want exactly the same tests here.
So, if there's no person or no friend, there can't be a relationship between them, so we'll say "return False".
If it's the same person, it's you, let's say True, or False, I don't know.
Are you a friend of yourself or not?
You can decide what the right metaphysical or philosophical answer is right there.
But what we want to do is we actually want to say, "give us the friends of this person" and this is actually going to be a list.
It's gonna come out like so, but not any old list.
In fact, this is going to be a list of "weakref".
So that's what we're storing.
That's what we put in over here.
So when we get it back, it's a list of weak references.
These will not actually keep those friends alive, but we will be able to get them back if we need it.
So we'll say something like this, "for ref in friends", like so, and we'll call each one a reference.
A little bit complicated.
Each is a weak reference to the friend, not the friend themselves.
If we can get a hold of them, we'll do it like this: we'll say "f = ref()", and then check that there's a reference first, gotta check that, and then the id... let's say this is a Person, so we get autocomplete here.
So check f.id, which should be equal to friend.id.
So the friend we're asking about is in the list of friends because it was added here.
It's still around, right here.
It's not cleaned up, so it still exists in the system, and it's actually the same person, right?
So we're basically going through all the people, return True, return False.
All right, there we go.
Let's see if I got this right.
First we want to create cycles.
Yes.
Is Michael a friend of Sarah?
Oh, yes.
Is Michael a friend of Zoe?
No, they're not, they're not together.
How cool is that?
Pretty interesting.
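A runnable sketch of the whole is_friend idea, with the weak-reference dereferencing just described. Again, the class and map shapes are assumptions based on the transcript:

```python
import weakref
from collections import defaultdict


class Person:
    def __init__(self, pid: int, name: str):
        self.id = pid
        self.name = name


friend_map = defaultdict(list)


def add_friend(person: Person, friend: Person):
    if not person or not friend or person.id == friend.id:
        return
    if is_friend(person, friend):
        return  # no need to store the relationship twice
    friend_map[person.id].append(weakref.ref(friend))


def is_friend(person: Person, friend: Person) -> bool:
    if not person or not friend:
        return False
    if person.id == friend.id:
        return False  # or True -- your philosophical call

    friends = friend_map[person.id]  # a list of weakref.ref objects
    for ref in friends:
        f = ref()  # dereference: None if that friend was cleaned up
        if f and f.id == friend.id:
            return True
    return False
```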
What we'll see is that these are not actually keeping those objects around, and this map is not keeping them alive over there if we clean them up.
Okay, so this is interesting.
Now, you might think, "okay, Michael, this is way overkill with this weak reference thing, because we're just saying True or False; all you have to store is the id of the friend and the id of the person and you're good".
Yes, but in the previous example, I could actually get the friends.
I could say "here is my list of friends for this person" check this out.
So we can still do that without keeping these references, or these cycles, around.
So we can say "get friends of person" and it's going to return a concrete list of person.
So, these are gonna be real people, not weak reference type things.
We'll say "friends is a list of weakref, just like before, equals map of person.id" and we have our cool default list working for us here as well.
Now watch this.
We're gonna say, we're going to return... let's just make it a little more explicit, let's say "realized_friends is going to be p for ref in friends".
Now, what we need to do is we need to check that calling this is not empty, but we can't do it directly here, we might want to return that.
So we're gonna use what's called the "walrus operator".
You can skip that and just call ref() directly if you have an older version of Python, older than 3.8, but what we're gonna say is "if p := ref()", like this.
So what's happening is we're in our test we're actually trying to realize the variable p here.
If it comes back and it exists, it's not cleaned up yet.
It was in the friends list, and it's still around, then we're gonna actually hang on to that.
So we don't have to call that check twice.
We'll just return realized_friends.
Okay, let's just print out here really quick.
p1.
Let's say yes.
Here we go, look at this.
We have our p1 and I guess we wanted to print this out.
We could even do a little "p.name for in that" to just see who the names are.
So Sarah is a friend of Michael.
Look, we got it back.
And yet, no cycles that were actually created in memory.
How cool is that?
I'm pretty happy with the way this works.
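The walrus-based get_friends might look like this, a sketch under the same assumed shapes as before:

```python
import weakref
from collections import defaultdict


class Person:
    def __init__(self, pid: int, name: str):
        self.id = pid
        self.name = name


friend_map = defaultdict(list)


def add_friend(person: Person, friend: Person):
    if person and friend and person.id != friend.id:
        friend_map[person.id].append(weakref.ref(friend))


def get_friends(person: Person) -> list:
    friends = friend_map[person.id]  # list of weakref.ref
    # The walrus operator realizes each weak reference once: keep p only
    # when the referent is still alive (Python 3.8+; on older versions
    # call ref() and test it in two steps instead).
    realized_friends = [p for ref in friends if (p := ref())]
    return realized_friends
```

Friends that have been garbage collected simply drop out of the result, so callers always get back real, live Person objects.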
Now, there could be one thing that you might want to do.
We could have set up these friend relationships by adding friends.
I'm going to say we have improved it so I'll drop that.
We could have set this up and then we could be done with people.
Now, because we're storing everything as weak references, it's probably okay, but what's gonna happen is there will be, maybe, these lingering data structures that we're going to go over; we'll test them and then we'll throw them away, right?
So when we say "is this person a friend?" let me be specific here.
Like when we're going through and trying to find a friend, we loop over all of them and we try to check for them right here and test it there.
There's gonna be some scruff that builds up.
So we could write one more function and, let me just throw it out here just because it's not super important for this, like a kind of a cleanup thing.
But we could go write an "erase_person" function and it says "okay if that person has not been removed from the map, go ahead and remove them from the map and then go through all of the friends and if that person is a friend, you know, remove their weak reference" Basically.
So that's kind of a long and dragged out thing.
I think it might be necessary in some use cases and not others, right?
As we saw already, because everything is a weak reference, we're not actually creating cycles or keeping things alive.
But this map here could get unwieldy, in which case you might want to write some code like that and call it when you know you're done with certain people.
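The cleanup helper described here could be sketched like this. erase_person is the transcript's name for it; the sweep logic is my reading of "remove their weak reference", so treat the details as an assumption:

```python
import weakref
from collections import defaultdict


class Person:
    def __init__(self, pid: int, name: str):
        self.id = pid
        self.name = name


friend_map = defaultdict(list)


def add_friend(person: Person, friend: Person):
    if person and friend and person.id != friend.id:
        friend_map[person.id].append(weakref.ref(friend))


def erase_person(person: Person):
    # Remove the person's own entry from the map...
    friend_map.pop(person.id, None)
    # ...then sweep everyone else's lists, dropping dead references
    # and any reference that points at this person.
    for pid in list(friend_map):
        friend_map[pid] = [
            r for r in friend_map[pid] if (p := r()) and p.id != person.id
        ]
```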
Alright that's it.
Cool.
Now, I do not want you to think that I believe this is the absolute best way to solve this problem.
There are many ways to solve this problem, and I just want to show you something that's completely different from the default.
What we're gonna do is we're gonna have a person object.
They're gonna have a list of friends, and we're gonna go over here and we have to create this direct relationship.
Instead, we can go use things like, funky things like weak references and other data structures to try to create those relationships and hold those connections, and if either of them happen to go away, well, obviously that connection got broken as well.
And you just say "no, they're not friends" or "no, they're not related".
But we can always re-acquire the elements like we are down here for the things that are not cleaned up.
So I think this is pretty interesting, and I wanted to put it out here as something to give you a thing to think about.
Something to give you an example for how you might approach a problem like this that is sort of non-standard but potentially could be more garbage collector friendly.
It's very likely you've got some kind of container for your data.
That could be a list.
We did a bunch with dictionaries just a bit ago.
Actually, dictionaries containing lists, how meta is that?
it could be classes, and we also had classes in there.
So there's all kinds of different containers that we could use.
But there's some that are interchangeable or somewhat replaceable.
I could have a list of numbers and, if a list just happens to have numbers, Python actually has array types, the more traditional kind: here's a big chunk of memory where each item, sized by its type, is allocated and set just inline.
This only works for numbers, but that could be interesting.
You could have Pandas or NumPy.
Maybe those things store stuff really efficiently or inefficiently.
I don't know.
We're gonna find out.
So what we're gonna do in this section as we're gonna look at a simple scenario, we're gonna have a bunch of names and bunch of ages that correspond to those names.
So basically details about people, and then we want to try to store them in different things: lists, arrays, data frames and so on and just ask the question "if we start like this, how much memory does it use?" versus "if it's stored like that, How much does it use?" Because while certain containers might be more familiar and comfortable Or maybe they have capabilities that we need or they're just the first thing that you think of, there might be a much better option in terms of memory for what you're trying to do, and we're gonna sort of compare some of the common ones side by side coming up.
Let's explore this idea of container sizes.
So we'll come over here and make a new Python file "app_container_sizes_compared".
Let's go crazy.
Make it long.
Do our fMain magic on it.
Go ahead and set it to run when I hit the hotkey, and there we go.
Okay, so we're gonna do a couple of things.
First, I want to have the data feel somewhat real.
So I've got some fake names.
Now, we can create fake ages easily, but I want to create some names that are, like, generally the size of real names.
And there's enough variability and stuff.
So I've copied this thing called mock names and check it out.
It's a bunch of names.
It's like 1000 different names.
What we're gonna do is we're actually gonna write a tiny bit of code that's going to take, find all the first names, all the last names and then randomly create other names out of them.
Okay, that should be easy enough to do.
I also want to use that size utility over here.
Remember, we started out with this size_util right there?
To do cool stuff?
Well, in order to use that I need to come over and write something like this "import size_util".
No problem.
It seems fine.
It's fine because I'm running PyCharm and PyCharm knows all of these blue directories, these marked as source roots are candidates for looking for top level things.
But if you happen to use another editor or you just have this directory here, it's going to fail.
So let me give you a little bit of help here.
We're gonna say: go back up a folder, and then add that folder to the path.
So I'll just import these things that we need here.
So this will allow us to use size_util, regardless of how you run it, right?
I'm running PyCharm so I don't really need to do this, But just to make life a little bit easier, we're gonna do this, alright?
To make it less error prone.
Next, down here, we're gonna import random.
Obviously, this is something we're going to do a lot with, creating a bunch of random things, and we're going to seed it to 42, or just pick some number.
This means it's always going to start from the same place, and it'll basically be deterministic in what it generates for us.
So that way we always get exactly the same output in terms of size and name length and whatnot.
Then we're gonna have a count of how many times or how many names and ages we want to work with, and we're gonna work with 100,000 details around people.
And then I'm gonna need the ages.
So I'm gonna have a function that says "generate_ages" and it'll pass the count.
Now this, this one's pretty easy.
We want it to generate the ages.
How are we gonna do that?
Well, we're just going to return a little list comprehension here.
We're going to say "return random.randint between 18 and 100, for nothing in range from zero to count".
That's it.
And let's just print out ages really quick.
Print..
Maybe not all of them; let's print the top 10, with an "s" there. There we go.
Those look like realistic ages, Right?
Okay, so we're generating the ages and let's do the same thing for names.
Let's be explicit.
This is a list of "int", and this is going to be a list of "str", and we'll write "names".
Now, generate_names is a little bit more complicated.
It's not really that hard, but it's also entirely not the point of what we're doing.
So I'm gonna just drop this in here and you can see what we're gonna get is, I wrapped that around so it fits a little better.
So what we're gonna do is we're gonna go read this and go line by line, pull this apart, and we're just gonna give back the name.
And then for each one of these were going to randomly select one of these out of the full names.
I guess we're not breaking it apart.
We're just gonna reuse some of the names, but there's 1000 of them, so that should be enough.
Okay, this really could be just a straight comprehension, but that's fine.
We'll just leave it like it is.
And let's do the same thing and print out "names" just to see that we're getting some.
Yeah, Katheryn Orknay and so on, they all look decent to me.
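Here's a sketch of the data generation just described. mock_names here is a tiny stand-in for the course's roughly 1,000-entry mock names file:

```python
import random

random.seed(42)  # deterministic, so sizes are repeatable run to run

COUNT = 100_000

# Tiny stand-in for the course's ~1,000-entry mock names list.
mock_names = ["Katheryn Orknay", "Sarah Smith", "Michael Kennedy", "Zoe Brown"]


def generate_ages(count: int) -> list:
    return [random.randint(18, 100) for _ in range(count)]


def generate_names(count: int) -> list:
    # Randomly reuse names from the pool rather than composing new ones.
    return [random.choice(mock_names) for _ in range(count)]


ages = generate_ages(COUNT)
names = generate_names(COUNT)
print(ages[:10])
print(names[:10])
```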
All right, well, now things are set up, everything started.
What we have to do is test the container sizes, use various containers.
So we're going to go and use this "store_ages" in different ways, potentially store names in different ways and so on, and then when I ask the question "well, how much memory does that take, for storing them in classes or for storing them in data frames and so on?"
Well, we actually already have the data loaded.
We have ages.
We have names.
And if you want the name and age of, say, the fourth person, that would be ages[3] and names[3], like that.
In a sense, we're already storing it.
Let's ask, "how much memory are we using here?" So let's have a name_size.
We're gonna actually compute the size first, and then put that in the printout, because I want to get the total size.
These are sort of two things that we're going to keep track of.
So let's say "print, storing them just as lists".
This would be our size_util, and we'll say "get_full_size"; remember, sys.getsizeof is not sufficient, especially for containers.
We'll have ages... ah, I said names, didn't I?
Names.
And then, let's do up here age_size, this will be "ages", and then "total", like this.
And let's just print it out.
Alright, we're gonna convert that to kilobytes and not show decimals on the end, so it's easier to think about.
And, yeah, put a little tab so they hopefully line up the same here.
Names, name_size, and we got the total right there.
Alright, well, let's give it a run and see what we've got.
Well, there they are.
You know, it doesn't necessarily mean a lot at this point because we don't have any alternative to compare them to.
We can't yet say "well, lists are more, or they're less, than this other way of storing them." What we find is that storing 100,000 people, their names and their ages, is gonna be about 1.6, 1.7 MB using just straight lists.
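The course's size_util isn't shown in this clip, but a minimal recursive full-size helper along these lines illustrates why sys.getsizeof alone undercounts containers. This is a sketch, not the actual course utility:

```python
import random
import sys

random.seed(42)
ages = [random.randint(18, 100) for _ in range(100_000)]


def get_full_size(obj, seen=None):
    """Recursively sum sys.getsizeof across the object graph,
    counting each object only once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # shared objects count once
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(get_full_size(k, seen) + get_full_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(get_full_size(item, seen) for item in obj)
    return size


# A list of 100,000 ints is ~800 KB of pointers, plus the int objects
# themselves (ages 18-100 are shared small-int singletons, counted once).
print(f"ages: {get_full_size(ages) / 1024:,.0f} KB")
```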
Okay, Well, lists were interesting.
Let's try to actually store them in classes.
We're gonna put the classes in a list as well.
But what if we pair them together?
It's really cumbersome to have one list that has the ages and one that has the names.
You've gotta use an index to find the same person.
Well, that's yucky.
So, let's go and try something better.
Let's go over here and we're gonna just maybe make our situation as good as possible.
So we'll have a "PersonDetail" and, oops, I didn't want to call it that one.. "persondetail" we'll do like that, and then we'll have down here a class, like this, and it's not gonna have much going on.
It's gonna have an age, which is an int.
Maybe the name should come first, name, str. PyCharm has this cool feature that will add these fields here.
It still wants me to close that out.
Then I'll do it again, like that.
So we can just create a bunch of people here and let's say "print storing them as classes".
So we're gonna do another cool thing here, we're gonna say "I want to create a person detail for name and age", and we're gonna use this thing called "zip", which I think we used before.
But we're gonna use zip to take these two together and pick them off one at a time.
So "for n, a in zip(names, ages)", like that.
Names and ages.
So we'll go through them all, and then we'll pair them up and we'll use them to create a person detail.
And then let's just print just for a second to see this is working.
Let's print out the top 10 and it's only gonna work if we have some kind of representation.
I think we need this one for what we're about to do.
Then we'll return f"{self.name} is {self.age}", something like that.
Here we go.
Catherine is 99.
Gayomali is 32.
Freeman is 21 and so on.
Alright, well, it looks like we were able to create the people, but now the question is how much memory did that use?
So let's take something like this and just print it out.
size_util.get_full_size(people).
Now, it's super important that we have that recursive thing here, right?
Because we're holding people, but then the people hold all sorts of stuff, and we'll just print out the people_size.
What do you think?
Gonna take more or less than just storing them directly in a list?
I'm gonna guess more.
As in, well, 10 times more or something like that.
Close.
So we're using 15 MB, and maybe the label shouldn't say ages.
Person details, let's go with that.
Oh, it should say class.
There we go.
So it lines up with the tab.
Yeah, that's a lot more.
Having objects is super, super valuable.
They can contain data and behaviors, and group them together and are great for thinking about problem solving.
And they're great for isolation, data protection.
They're not great for memory, okay?
So if that's your primary concern, maybe it's not, maybe it is, but we are using 10 times as much memory for basically storing the same thing.
Again, they are matched up, which is nice, but maybe we could do something better.
I don't know.
It depends how much memory pressure you're under, but understanding that this takes a lot more is good to know.
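A sketch of the PersonDetail class and the zip pairing described here; the sample names and ages are illustrative stand-ins for the generated data:

```python
import sys


class PersonDetail:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age

    def __repr__(self):
        return f"{self.name} is {self.age}"


names = ["Katheryn", "Gayomali", "Freeman"]  # sample data
ages = [99, 32, 21]

# zip pairs the two lists off one element at a time.
people = [PersonDetail(n, a) for n, a in zip(names, ages)]
print(people)  # [Katheryn is 99, Gayomali is 32, Freeman is 21]

# Each instance carries its own __dict__, which is where the roughly
# 10x overhead versus two plain lists comes from.
print(sys.getsizeof(people[0].__dict__))
```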
So lists seem like a clear winner, maybe, but remember, lists actually are super inefficient.
They've got their contiguous structure to hold the pointers, but then they point out at pieces of numbers in memory.
And remember, those numbers are like 28 bytes apiece, and most of these numbers we're storing are ages; they're not very big at all, so we should be able to do better.
And if these numbers were larger, still, remember, we're working with numbers that are within the range of that flyweight pattern.
So it's probably not as big of a deal as it could be.
I'll maybe tweak that just a little bit to see what we get in terms of if it were different types of data.
But what we're gonna do is say we're "storing them as arrays".
So this is not a common type used in Python, but it is an interesting one.
So we'll say "ar = array".
Now, this is a type that we can import like this, and, array, and then you can give it a type.
See that type code?
It can only hold one of the same things.
If we want it to hold small integers, we can use a "B", an unsigned byte.
It can also hold floats and stuff like that.
We'll say "ar.fromlist" and give it the ages; then let's grab this and put that here, say "array_size = size_util.get_full_size(ar)".
"array_size".
Perfect.
Now let's run it and look at that.
Wow.
So instead of using 800 kilobytes or whatever it was, it's using about 100.
So there you can see, way, way better.
106 kilobytes instead of 800.
So again, we're getting almost this 10x difference by choosing a slightly different data structure.
So that's pretty awesome.
If you go look at the documentation for array, you'll see that there's different types like we're using a "B", which is an unsigned character.
So the maximum size in bytes is 1, which holds 0 to 255, but if you have bigger data, like a float, you can say it's an "f", or you could say it's a "signed long".
But be aware that when you create one of these and you put something in there every entry that we had before was just one byte, and that was why it was really efficient.
Whereas over here, if you say it's a float, it's gonna take 4 times as much memory, but of course it can hold the things that go in it.
So keep in mind, this is only for numbers.
We can't apply this cool trick to strings, but nonetheless we still got a cool memory reduction accomplished, right?
So we could drop that one down by 700 KB at least, keeping the names in a list and this type of array to store the ages.
So if you're storing a lot of numbers, it's very likely that these arrays are going to help you a lot.
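Here's a compact sketch of the array approach. The "B" type code matches what's described; plain sys.getsizeof stands in for the course's size_util, which is close enough here because the array stores its data inline:

```python
import array
import random
import sys

random.seed(42)
ages = [random.randint(18, 100) for _ in range(100_000)]

# "B" = unsigned char: exactly one byte per entry, values 0-255.
ar = array.array("B")
ar.fromlist(ages)

# The data is stored inline as raw bytes, so the base object size is
# essentially the full size: roughly 100 KB for 100,000 ages.
print(f"array: {sys.getsizeof(ar) / 1024:,.0f} KB")
```

For wider values you'd pick a bigger type code ("f" for float, "l" for signed long), trading bytes per entry for range.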
Alright, we tried them as arrays.
That was pretty cool.
How about as Pandas DataFrames?
That's a super useful type that we might want to use.
So let's go ahead and try to create some series here.
So I'll have a "names_series" what we're gonna do is create two series and they want to create the dataframe out of those two series.
I would like to say "pd", but we don't have it installed yet, so I'm gonna go and add Pandas, and we're also gonna use NumPy, so I'm gonna just write them all in here, and then let you see it.
Watch it install.
Okay, so we're gonna install these things because in order to work with them, they've got to be installed.
We haven't needed them yet, but here they go.
We come up here and say, "Import pandas as pd".
So we'll create a series, we'll give it names, the data type is going to be string, and then we'll have "ages_series", which comes from ages, and it's gonna be a "int8".
Then let's go create our DataFrame from that: "pd.concat([names_series, ages_series], axis=1)".
And let's just first try to print "df.head()", see what we get there.
Alright, It looks like we've loaded them up correctly and it works.
So Pandas is awesome.
We could do all sorts of advanced stuff.
How awesome is it in memory, though?
Let's find out.
Let's just grab this, say DataFrame, and just do it in "size_util.get_full_size(df)".
run it again.
How awesome is this gonna be?
ooooh, it's less awesome.
We have arrays and we have a big DataFrame.
The DataFrame is even bigger than the classes.
Not super surprising.
It's storing indexes and other kinds of things around it.
So it's super powerful, but it is not super efficient.
And this honestly surprised me just a tiny bit.
I thought, you know, it's gonna store numbers crazy efficiently and things like that.
But, yeah, not so much.
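A small sketch of the Series/concat approach, on smaller data than the course's 100,000 rows. As an alternative to a recursive sizer, df.memory_usage(deep=True) is pandas' own way to count bytes, including string contents:

```python
import random

import pandas as pd

random.seed(42)
names = [random.choice(["Sarah", "Michael", "Zoe"]) for _ in range(1_000)]
ages = [random.randint(18, 100) for _ in range(1_000)]

names_series = pd.Series(names, dtype="string")
ages_series = pd.Series(ages, dtype="int8")

# Stitch the two series together side by side into one DataFrame.
df = pd.concat([names_series, ages_series], axis=1)
print(df.head())

# Pandas' own accounting; deep=True follows the string contents too.
print(f"{df.memory_usage(deep=True).sum():,} bytes")
```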
Last but not least, let's try NumPy arrays.
So NumPy is the foundation of Pandas as well, and much of the data science stuff; let's see how it compares.
Now, NumPy is really for numbers.
It might be able to store strings, but I don't know if we're gonna get any advantage.
So I'm just gonna, like the array, just have NumPy ages and go with that.
So we'll say "np" and of course we gotta do our trick up here as well.
Then we'll have an array.
It's gonna be from ages, and the "dtype = int8" as well.
And let's just print out one more time.
Let's first print "np_ages" just to make sure we got what we're expecting.
Yeah, there's a bunch of ages.
Now let's print out the size.
Here goes.
How's NumPy treating us?
It's doing better than Pandas DataFrames, but you know, it's certainly nowhere near this.
So quite a powerful library here, but also not as efficient.
Anyway, I think that probably is a good place to call it done.
All the things we've compared, I think that's enough.
And the goal really is not even to tell you "these are better, these are worse".
The goal here is to give you a sense of scale, right?
If I store it as a list relative to storing it in, say, a list of classes, how much bigger does it get?
If I store it in a number like one of these typed arrays versus stored as a pandas DataFrame, how much bigger does that get?
And you can see, there's quite a bit of variability here.
So keep this around and, you know, add in your own data types that you want to play with.
Like, what if you're gonna store stuff in, say, a set instead of a list, or in a dictionary instead of a list? You could add those kinds of things as well.
Here it is, some sizes of containers compared for a whole bunch of items.
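A sketch of the NumPy version: ages 18 to 100 fit in one signed byte, and ndarray.nbytes reports the raw data buffer directly:

```python
import random
import sys

import numpy as np

random.seed(42)
ages = [random.randint(18, 100) for _ in range(100_000)]

# One signed byte per entry; ages 18-100 fit comfortably in int8.
np_ages = np.array(ages, dtype=np.int8)
print(np_ages[:10])

print(f"data buffer: {np_ages.nbytes:,} bytes")   # raw storage only
print(f"whole object: {sys.getsizeof(np_ages):,} bytes")
```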
Now you've already seen us use psutil, but I want to call on a specific capability that you can use while your program is running.
So we use psutil in our size utility to get the object graph closure size of the whole object, not just the base object.
And that's what we were just working with.
But psutil does all kinds of cool stuff and one of the things you might want to use it for is just asking "how much memory is my current process using right now?" If it gets over some level, you could take some action, right?
If it gets too high, let's clear out the caches and throw that away.
But instead of letting it crash or start to swap and become extremely slow, let's just say "clear the data that can be cleared and then carry on".
That doesn't always work.
You're not always in a situation where that's possible, and we're going to talk about other options as well, remember.
But if monitoring the amount of memory you're using over time is something you want to do, well, we've already been using, indirectly, a library that'll help you.
So this is a cross platform library for retrieving information on running processes and how much they're using of the system.
CPU, memory disk, all kinds of stuff.
So you can go crazy.
And it's really good for system monitoring and profiling, and it does things like ps and top and whatnot that you might know of from like the UNIX command monitoring tools.
The one you probably care most about, at least in the scenario I've laid out for you, is "how do I get the current working memory in bytes?" What we do is get the current process id from the OS, go to psutil and grab the process directly, and then ask for its memory info.
There's a bunch of stuff in there, and the one we care about is the resident memory, RSS, so print that out.
That comes out in bytes.
How many bytes your current process is using.
Now the OS can move this around and like it can be shared and all kinds of funky stuff, but it'll at least give you a sense of how much memory you're using.
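In code, the psutil pattern described here looks like this; the threshold and the cache-clearing reaction are made-up examples of what you might do with the number:

```python
import os

import psutil

# Ask the OS for our process id, grab the psutil process, then its memory info.
process = psutil.Process(os.getpid())
rss_bytes = process.memory_info().rss  # resident set size, in bytes
print(f"Current memory usage: {rss_bytes / 1024 ** 2:,.1f} MB")

# One way to act on it (LIMIT_MB is a hypothetical threshold):
LIMIT_MB = 500
if rss_bytes > LIMIT_MB * 1024 ** 2:
    # e.g. clear caches here rather than letting the process swap or crash
    pass
```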
It's time to start looking at core language features and patterns that you can use to make your code so much better.
What we've seen so far is we've explored how the code works and we've seen how to create data structures that kind of go with the grain rather than against it.
But now we're gonna talk about ways you can write code that can dramatically change the amount of memory that your program is using.
In fact, in this chapter, we're going to look at the biggest single improvement that you can possibly make for working with pipelines of data and stuff like that.
So working with functions and memory, there's a lot to be gained here.
Let's kick off this chapter by just diving right into some code.
So back to our demos.
And over here in chapter seven, we're gonna work on something about variable names, something incredibly simple that's going to dramatically change the amount of memory we're using.
We'll call it "clingy_functions", and the reason we're calling that is because I want to give a little bit of credit to the guy who wrote an article that talked about this pattern first.
Itamar wrote an article about functions clinging to memory, how Python function calls can increase your memory usage.
You can read it over here and you can listen actually to the interview I did with him on Talk Python right there.
Now, this is not an exact copy of what he did, but it's highly inspired.
So thanks for that.
So we'll start out with our fmain, as always.
Now we're going to do three things here.
We're gonna come over and we're gonna have some data.
We're gonna call the data we start with "original", we're gonna say "load_data" like that, and then we're gonna have some scaled data.
We're going to take it and reprocess it, Maybe normalize it or something like that.
And we'll say "scale_data", pass in the original data.
We'll pass in this list or something like that, and it's gonna generate a new list with updated pieces of information.
Okay, and it's not gonna change it, it's gonna create a new one and we're gonna store it in "scaled", and what factor do we want?
Maybe my favorite number, approximately: 2.718, e of course.
And then we'll have "filtered".
We don't necessarily want to keep all of these.
Actually, I want to do, I'll do filtered first.
We'll do "filter_data(original)", and this will be "filtered".
And finally, we'll just print out "Done!" Okay, so what we're gonna need to do is we're going to need to write the three functions: load_data, filter_data, and scale_data, but the general idea is that we're doing this like pipeline of processing.
We're gonna load up some information and then we're gonna feed it on to the next step, which is to filter it.
We only want to look at some of that data.
We're going to take the result of that step of this pipeline and feed it under the next one and scale it.
And this can go on and on and on.
But, you know, I think three is going to be more than enough to show what's happening here.
So this is gonna be a really interesting use case, and you'll see that we'll be able to dramatically change the memory usage here by just making very small changes to this code in ways that you would probably get criticized for if you were optimizing for readability.
But if you're not optimizing for readability, but actually so the thing will run on the amount of hardware you have because you're running out of memory or something like that, all of a sudden, it feels pretty clever.
So that's what we're gonna do in this series of demos.
Okay, so let's implement these, and we're going to start out, as you can imagine, using some random data as we have been.
So we're gonna "import random" and we're going to seed it so you always get exactly the same result.
Pick some arbitrary number out of thin air, and that'll fix it so it doesn't generate potentially different sizes of data or different numbers, which would result in different sizes, say, in the filter step.
Alright, so let's go and do "load_data".
This is going to return a List of int; that type needs importing.
In the new version of Python, 3.9 I believe, you'll be able to say "list" like that and you don't have to import a thing, but currently in 3.8 and below you got to do this.
Let's go write some code to generate one million numbers between 1000 and 10,000.
So we can do that in a cool little list comprehension like this: we'll say "random.randint" between 1,000 and 10,000, I'll use the little digit grouping thing you can do here, for nothing in the range up to one million.
Okay, so that should return us our one million items in that range.
That's pretty easy, right?
And then the next one, what we gotta do is we gotta filter_data.
This one is going to take some data, which is a list of int, and it's going to return a list of int as well.
So this one could be another cool list comprehension.
These don't all have to be list comprehensions in order to achieve what we're going to go for, like the technique we're showing, but they just happen to be nice.
So we'll say "n for n in data if n is not divisible by 5".
Okay, so give us all the numbers that are not divisible by five.
Final one is gonna be to scale data, and this is going to take a data which is a list of int and return a list of float.
And we'll say something much like this.
So this one takes the data and the factor, which is a float, and so this is gonna be n times factor for n in data.
And that's it!
So we've implemented these three things and let's just, you know, print out some of the scaled numbers.
Let's print out the first 20, see what we get.
Just to make sure things are hanging together.
It takes a moment to run because we generated a million things and then did a bunch of processing on it, but those look like numbers that were, you know, started out between 1000 and 10,000, and then got multiplied by 2.8, don't they?
Perfect.
I guess the other thing we could do is also we could just print the length of scaled so we know how many we actually got back, about 800,000.
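Reconstructed from this walkthrough, the three functions look something like the sketch below; the seed value 42 is just a stand-in for whatever arbitrary number was picked on screen.

```python
import random
from typing import List

random.seed(42)  # fix the seed so every run produces the same data

def load_data() -> List[int]:
    # One million random ints between 1,000 and 10,000.
    return [random.randint(1_000, 10_000) for _ in range(1_000_000)]

def filter_data(data: List[int]) -> List[int]:
    # Keep only the numbers that are not divisible by 5.
    return [n for n in data if n % 5 != 0]

def scale_data(data: List[int], factor: float) -> List[float]:
    return [n * factor for n in data]

original = load_data()
filtered = filter_data(original)
scaled = scale_data(filtered, 2.8)

print(scaled[:20])
print(len(scaled))  # roughly 800,000, since about 20% of the values divide by 5
```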
|
|
show
|
4:09 |
Our code works, actually it works really well.
I see no problem with it.
It's gonna go through and process this data in this pipeline that we talked about.
But let's see if we can understand its memory usage, and then we can explore to see if we can make it better.
Now I want to change one thing really quick here.
Just because this is going to uncover some of the challenges that we might see.
Let's say, "head" just like this and this will be the "tail" right?
Kind of beginning and the end of this data.
And let's just put five of them in there and let's go from -5 to the end.
Why is this gonna be interesting?
Because some of the advancements that we'll be able to do later will have a harder time looking at the end.
So if you reasonably want to go through the data and, like, ask questions like "what are the last five?" you'll see both the benefits and the drawbacks.
So this will give us kind of a realistic constraint as we go.
Alright, so what do we want to do?
Well, first of what we want to do is we want to go and bring in our little size utility again.
So the size utility, this is what we've used so far, but it also has this thing called "report process memory", which will return the number of megabytes used as well as do a little print statement.
So we're gonna use that, and we want to be able to use it outside PyCharm too, right?
Here we can just "import size_util", which is fine, but outside PyCharm we've got to make sure it's getting found on the path.
Sort of annoying, but so it is.
So this is going to make sure you can always run it, even if you just run it from the terminal or command prompt right in that one folder.
Let's go and define a little function here called "report", and it's going to take a "step_name", which is a string.
Alright, so it's just gonna go to the size_util and say "report process memory" but we also want to just show a little bit of information about that so that you can see at this step it's doing such and such rather than just having the numbers come out three times.
So we'll do a little print with an f-string here.
We'll say "step is this".
And then we also want to set the end to just be a little space.
Actually got two spaces there.
So we're gonna say step name like "loading data" or "filtering" or whatever, and then we'll see the print statement from before, okay?
So let's go ahead and use it here.
Now I'm gonna write some funky code.
Ah..
I'll write it Like this.
Let's say "report original loaded".
I'll just say "loaded".
Over here, "report starting" right?
So we have the baseline of what we got and "filtered" and "scaled".
Then we do "done", I guess.
Alright, let's run it and see how it works.
We started with 9 megabytes used, then we loaded that one million numbers, which went from 9 to 48 megabytes.
And then when we filtered it, it went up some more, then we scaled it went up some more.
So at the end, we were at 92.48. So how much did our little bit of code take to run?
It took 83 megabytes.
As we go and improve on this, what I'm gonna call "naive mode", or just the most straightforward mode, you'll see that maybe we can do better than this.
That's the number that we gotta beat if it's going to be better.
Now, I did say I was gonna write a little bit of funky code.
And what I wanted to write, maybe I'll actually put it here, is this, because you don't normally see semicolons in Python, but check this out.
So what I wanted to do is kind of keep the same little pattern, but make it report at the end of each step.
So I'm gonna go and leave it like this, right?
The report is just here for us to kind of know what's going on.
I don't really want to change this pattern.
I just want it to look really simple and clear, and just these three lines, these three lines are actually the problem and where the solution lies.
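The course's size_util module isn't shown in the transcript, so as a stand-in, here's a rough sketch of the report helper built on the standard library's tracemalloc, which traces Python allocations rather than reporting whole-process memory the way the real utility does:

```python
import tracemalloc

def report(step_name: str) -> None:
    # Print the step name, then the current and peak traced memory in MB.
    current, peak = tracemalloc.get_traced_memory()
    print(f"step: {step_name:<10}", end="  ")
    print(f"current: {current / 1024 ** 2:,.2f} MB, peak: {peak / 1024 ** 2:,.2f} MB")

tracemalloc.start()
report("starting")
original = list(range(500_000))
report("loaded")
filtered = [n for n in original if n % 5 != 0]
report("filtered")
```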
|
|
show
|
5:19 |
Let's look at this code and think about how we can make it better.
We'll see that we could make the smallest little change and actually dramatically improve the memory usage.
But first, I just really quickly recorded how much memory is used at the beginning and at the end, and had the printout show exactly that.
So when I run this, you'll see it'll say, used basically 83 MB.
Great.
So what I want to do is I'm gonna keep this one around for you.
I'll call this one, go up here and call this "greedy_main" something like that because this one uses extra memory and this one here, we'll just leave this as "main".
It's gonna be our improved version.
So thinking about this, why is so much memory being used?
You can see it grow each time when we run it.
It's using first 48, and then 60, and then 90.
And actually, what's happening is, at least from this step, from going from original to filtered, we're using less data, right?
We're dropping out 20% of the data from a million to 799,000.
Should use less, not more, right?
Well, when you think about how reference counting works, when do these variables go away?
The variables go away when they're no longer defined in the function, right?
There's no optimization that says, Well, "original is not used after this line so we can clean it up".
Some things have that, like .NET: its JIT compiler in production will clean those variables up as soon as possible, but in debug, it'll keep them around.
So, like if you set a breakpoint here, you can see it.
But Python doesn't make those types of optimizations.
What it does is as long as this function is running, the variables defined in it still exist.
So that means original and filtered still have references to them and won't be cleaned up, even though clearly, you know, original is not needed after line 39, and filtered is not needed after line 40.
How do we fix it?
Well, one not as beautiful way, but really easy way to fix this is to just reuse the variable.
What if we just call this data?
Here's the data in the current, the current data in the pipeline.
And that goes here.
And now here's the current data in the pipeline, and we're gonna pass that along.
And now here's the current data in the pipeline, and then we're going to work with the data at whatever step it's at.
Now, this is not as nice, right?
If I read this code, you know which data from which step am I working on?
Somebody doing code review like this might say, well, "this variable means three different things along the way", and that's really crummy, because before you had original, filtered, and scaled, which doesn't need as much documentation or as many comments to understand what's happening.
But here's the thing: this reference here, when you go and set the next line like this, it replaces it and drops the reference to what was called "original".
This line is going to drop the reference to what was called "filtered" and so on.
So we shouldn't be holding on to those from step to step to step.
Let's just run it again and see what this, like literally five word change means for memory.
How cool is that?
So we've come along and we've started our nine again.
This is the same.
But then notice this step up to 59 was less, and then 79, or 78, I guess 79 if you round it up, and then we get the data back.
So this finishes at 78, and the final amount used is 69.
And what did we have before?
We had 83. So let's call this "single variable mode" or something like that, right?
So we've saved, not a huge amount, but we've saved a non-trivial amount of memory by just using a different variable name.
How cool is that?
So I think that's a pretty big deal.
The more data that we load, like if this was 10 million or larger, it would make a bigger difference.
If we had more steps, this technique would make a bigger difference, right?
It's how much cumulatively did you have to like hang on to as you went along?
I think because we're converting from maybe ints to floats here, probably this last step, it takes the most memory.
So if we started with floats or something like that, we could probably see a bigger difference.
But very cool.
We were able to basically delete original and delete filtered and just keep what we had called "scaled" here to work with, and that was it.
I think that's super cool.
I guess a parting comment is, if I was writing this code, you know, I would have some kind of comment here like: "using a single variable name ensures data is cleaned up as fast as possible".
I don't know, Something like this.
I'm not generally a big fan of code comments, because usually it means your code is not clear, but here we made it unclear on purpose.
It's worthwhile to reduce that amount of memory, definitely in this case, and in some real cases, right?
This could be huge.
What we're going to see later is that we could actually do much, much better than this.
But there will be a performance trade-off to a small degree, right?
So here's one variation on trying to take this like pipeline of data processing and make it much more efficient by not holding onto the intermediate steps.
We do that by having a single variable name that we're just reusing over and over.
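As a sketch (with a small stand-in for the real million-item random data), here are the two shapes of the function side by side; in the second one, each rebinding of data drops the last reference to the previous step's list, so reference counting can free it immediately:

```python
def load_data():
    return list(range(1_000_000))      # stand-in for the million random ints

def filter_data(data):
    return [n for n in data if n % 5 != 0]

def scale_data(data, factor):
    return [n * factor for n in data]

def greedy_main():
    # All three intermediate lists stay referenced until the function returns.
    original = load_data()
    filtered = filter_data(original)
    scaled = scale_data(filtered, 2.8)
    return scaled

def main():
    # One name: each assignment releases the previous step's list right away.
    data = load_data()
    data = filter_data(data)
    data = scale_data(data, 2.8)
    return data

print(len(main()))   # 800,000: the multiples of 5 are gone
```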
|
|
show
|
1:32 |
We saw with two very similar patterns, we got quite different memory usage.
Here we're saying we're gonna have three steps and we've named the data from them: original, scaled, and filtered.
And we could change this to just say "we're going to call the current data we're working with, data".
Now I don't love this, right.
It's not as clear, it's not as obvious what's happening.
But at the same time, if it dramatically lowers the memory usage and lets you do more data analysis or run a lot cheaper because you can use smaller servers when you pay for your cloud computing, all that, well, it's probably worth a little bit of less clear code.
So I actually graphed this in a tool that we're going to see a little bit later.
And if you look at the clean code version, that one actually had 105 megabytes.
It's slightly larger because we scaled and then filtered, which means more steps where you're working with floats.
But nonetheless, it's basically the same idea.
So we got 105 MB over there, and this simple change dropped it down by 34 megabytes.
Just those three lines made such a big difference.
Very similar patterns, but actually forcing Python to drop the intermediate data as soon as it can.
There's of course other ways.
You could set original to None, and you could set scaled to None after you're done working with them.
That would be fine, too.
It's just gonna be more lines of code.
So you can work with it however you like, but I think this is a pretty interesting pattern to think about to force Python to clean up the intermediate data sooner.
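That alternative, explicitly clearing the names you're done with, would look roughly like this; del and assigning None both drop the reference, del just also removes the name. The tiny helper bodies here are stand-ins:

```python
def load_data():
    return list(range(1_000))

def filter_data(data):
    return [n for n in data if n % 5 != 0]

def scale_data(data, factor):
    return [n * factor for n in data]

def main():
    original = load_data()
    filtered = filter_data(original)
    del original       # or original = None: the list's last reference drops here
    scaled = scale_data(filtered, 2.8)
    del filtered
    return scaled

print(len(main()))   # 800 of the 1,000 stand-in values survive the filter
```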
|
|
show
|
10:31 |
Well, if this switch to reusing the variable name seemed impressive to you, what we're going to do now will probably blow you away.
So let's go over here and I'm gonna make a totally new copy of this.
I'm gonna call this "app_one_at_a_time".
We're going to switch this out to use a totally different style, and we'll leave the seed the same, we're still gonna import this, we're still going to need that and so on.
But instead of using this greedy_main, I'm just gonna have the regular main here and we're actually...
And no let's actually go back to the greedy_main.
That's even more interesting, right?
This was the worst case scenario.
So we're going to get rid of this one.
Super.
So what we're gonna do is we're going to rewrite, reimplement these functions so that they use less memory.
And let's go ahead and just run this and see where we are.
Looks like we're using 83 megabytes of memory additionally, to run that code.
Can we do better?
Well, the answer is yes.
The mechanism for doing so will really impress you, I think.
So, let's come along here and let's change this.
Now I'm gonna change this in two ways.
These are all three list comprehensions and converting list comprehensions to where we're going is really, really easy.
But I want to rewrite one in a way that makes it more obvious of what the general solution is.
So let's first do the easy piece.
Let's say this one, instead of creating the list, filling up the list with all the items, keeping everyone of them in memory at a time and then processing them later, what I'd like to do is set up this function in a way so that it can give one back, let that entirely be processed through the pipeline and then offer up the next, and the next, and the next.
So instead of loading a million things, we're gonna load one, let it get taken care of and then load the next.
How do we accomplish this?
It seems like there's some interesting or tricky coordination we're gonna have to do from here to make that happen.
It turns out it's mind-blowingly simple.
So see these square braces?
When you have square braces, this generates a list that is fully realized into memory.
If we change those from square braces to parentheses, and nothing else, this will now generate a generator, and the generator gives up one item at a time and does not load them all into memory.
And because range is a generator, we're pulling on the generator.
As we pull on this generator from load_data, it pulls on range, so nothing extra is going to be loaded.
You see this warning because it's no longer a list we're returning.
We're gonna return a generator, like so, if we import it.
Alright, well, that's pretty cool.
What about this one?
This "filter_data"?
Let's say the filter_data we're going to do in a more interesting way.
Let's make this an "Iterator", make a little more general, I suppose.
And then over here, if we go up, you'll see this has now got a warning. So this is now gonna be something that takes an Iterator, which would still work for a list, and it's going to return an Iterator, and let's just keep flowing this along.
This is cool.
What are we gonna do with this one?
This is the one we're going to rewrite.
So let's do this "scale_data" one.
And our fancy conversion, we'll just put parentheses instead of this, and let's see if it works.
It probably will run.
Yes, this is that little part I threw in to make things interesting.
So we're going to, just, we're going to have to deal with this in just a second.
But let's throw it away for a minute.
And how are we doing?
We used 31 MB.
That's less than half of what the better version we did.
Oh, but it's going to get a whole lot better because this filter one in the middle is actually creating a list.
Let's see what we can do around that one.
Well, again, we could just put parentheses there, but I'll show you the general solution.
So, we can create these generator expressions, but to create a proper generator, it's really simple.
In Python you just use the yield keyword.
So we'll say "for n in data", we're gonna do an if, "if n mod 5 is not equal to zero", then we want to say "here's one of the items".
And the way you say that is use the yield keyword and then just the item.
There, that's it.
Let's try again, Whoohoo!
Zero megabytes used!
Zero!
Now, for those of you who are aware of what generators do, you'll realize we haven't actually done any work by pulling on them, right?
So if we didn't do this, we haven't made anything happen.
Now doing the slice, though, this is a little bit tricky.
We need to figure out a way to get this data back, right?
The reason I left these in here is because we want to have this constraint, this somewhat realistic constraint of like dealing with the last bit of data or something like that.
If you're just gonna pass them along and process them one at a time, it would look something like this "for s in scaled", we could print, I guess we won't print it, but we could say "z equals s times s".
Just do something with it so it goes through all of them.
Let's see what that does for the memory.
Oh, my, goodness!
9 megabytes!
We started at 9, we stayed at 9.
We actually added it a little bit, but it's like you know what?
Really?
We haven't used any memory.
We've used less than one megabyte, right?
Because we're showing it to zero decimal places.
We've used 200 kilobytes.
That is insane.
That is insane!
If we take what we added before, which was 63 in the best case, and it was 83 in the worst case, divided by our 200 kilobytes, that's a 415 times improvement.
How awesome is that?
That is so amazing.
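The arithmetic behind that claim, for the record: roughly 83 MB added by the naive version versus roughly 0.2 MB (200 KB) here.

```python
before_mb = 83    # memory added by the naive, list-based pipeline
after_mb = 0.2    # roughly 200 KB added by the generator version
print(round(before_mb / after_mb))   # 415
```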
And what did we have to do?
Well, we put parentheses rather than square brackets.
And here we use the yield keyword just to show you the general solution, but we could have put parentheses instead of square brackets.
However, certain things we thought we wanted to do, like this, turned out to be tricky because you can't slice them.
You should really be able to, right?
Like, you could interpret that as "give me these in this range" or whatever, but it doesn't work that way.
So what we can do is we can come over here and use this thing called an "islice".
So I'll say "islice", and it itself is a generator, so it's not going to realize itself for printing unless we throw in a list.
but this comes from itertools, and we say the thing that we would like to slice from 0 to 10, and let's give that a shot.
Here's the head, and that's exactly what we had before.
And we're still using zero bytes.
Zero megabytes, or rather our 200 K.
Now it gets a little more tricky to do this tail.
I could be misunderstanding or not finding the right library to give us the tail, but I don't know how to get it.
So I'm just going to do a little loop here, right?
So we're gonna say "for n in scaled".
Now, you got to be careful.
We've kind of used up, we've consumed the first 10 and as you go through these generators, it doesn't like reset to the beginning.
So this only really works if there's more than 20, otherwise you would just store them in a list.
Alright, we're dealing with tons of data, but anyway, that's what we're doing.
So we're going through here and we're saying "tail.append(n)", and that would add all of them into the list.
And you know, it's not horrible what's gonna happen with the memory; it's a little bit slow, but we'd still only use 31 MB instead of 60 or 83 or whatever.
But we only want the last 10.
So we'll say "if length of tail is greater than 10, we're gonna throw away the one on the front and let it move towards the back".
So we'll just say "pop zero" and then we go.
Takes a tiny moment, but like zero megabytes again and now we can get our tail back.
Let's just say "tail".
See, it takes a moment to go through it, but that's because we never actually process.
We never went through that iteration until we tried to get to the end.
If you compare these numbers against the ones we had before, they're the same numbers.
This is the same result.
What did we have before?
We went from 9 MB's at the beginning, up to like 100.
In the good case, it was 80 or something like that.
It did not move.
It finishes at 9 megabytes.
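For what it's worth, the standard library does have a ready-made way to keep just the last N items of a stream: collections.deque with a maxlen. A small sketch pairing it with islice for the head:

```python
from collections import deque
from itertools import islice

nums = (n * 2 for n in range(1_000))    # any generator will do

head = list(islice(nums, 5))            # consume the first five items
tail = list(deque(nums, maxlen=5))      # deque silently drops all but the last five

print(head)   # [0, 2, 4, 6, 8]
print(tail)   # [1990, 1992, 1994, 1996, 1998]
```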
So this pattern, this ability to use generators in a pipeline to say we're not actually gonna load all the originals, we're gonna load one and we're going to go back and we're gonna pass it to here, we're gonna pass it to here.
So when we try to loop over this, like right here, or right here, we pull one out of scaled; scaled reaches into the generator it was given and pulls one out of the filter step.
The filter may pull a couple from the original because it could be skipping some, all the while scaled is transforming them, and that pulls on this generator, which is gonna pull the random numbers coming out of the range.
So it's like one generator pulls on the next and pulls on the next.
So we're never really loading more than one or two things into the memory at a time.
It doesn't matter if it's one or a million like it is in this case, we use basically zero memory to make that happen.
This is not something that always works.
But if you have a use case where a generator makes sense, use it.
Look how awesome this is.
The scenario where it doesn't necessarily make sense is if you want to create the data and then go over and over and index into it, and pass it around and use it again.
Remember, generators get used up.
But we could always just do like this where we go through the whole pipeline and then realize the last bit into a list.
We saw that's still a more than 50% improvement, and it still gives us that in-memory, work-with-the-data-as-a-list style.
So we have a bunch of options.
So we could do this, sort of, realize it as a list, but only at the end and not the intermediate data.
And check that out.
We end up using our same design patterns, the same way of working with stuff, and we use basically zero extra megabytes.
So, this is such a cool pattern if you could make it work.
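Pulled together, the whole generator-based pipeline from this demo looks roughly like this sketch: a generator expression for loading, a yield-based generator for filtering, and only the head and tail ever realized into lists.

```python
import random
from collections import deque
from itertools import islice

random.seed(42)  # stand-in seed, as before

def load_data():
    # Generator expression: parentheses instead of square brackets.
    return (random.randint(1_000, 10_000) for _ in range(1_000_000))

def filter_data(data):
    # The general form: a generator function using yield.
    for n in data:
        if n % 5 != 0:
            yield n

def scale_data(data, factor):
    return (n * factor for n in data)

scaled = scale_data(filter_data(load_data()), 2.8)

head = list(islice(scaled, 10))          # pulls just 10 items through the pipeline
tail = list(deque(scaled, maxlen=10))    # drains the rest, keeping only the last 10

print(head)
print(tail)
```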
|
|
show
|
1:48 |
We saw how incredibly easy it was to convert these functions that turned everything to a list all at once and then handed them around over to generators.
Square brackets, parentheses.
Done.
Now again, on the consumption side, how you use it, you have to be more aware that you only go through that data once, but if that's the use case you have anyway, this is an incredibly easy way to do it.
We could also go and be more explicit for scenarios where it's not just a list comprehension but you have more interesting code.
Like this one we converted using the yield keyword, and surprisingly, it actually got shorter even though it sort of had more syntax that we had to do.
And it just blew the other patterns we had out of the water.
So this one, the memory growth in the example I had originally was 102 megabytes, and this one is zero.
Zero!
Incredibly, it's because we only loaded like a handful of numbers at a time, and that's it, and nothing else.
It's so awesome.
This one, on the other hand, I timed. I just threw some timing on the demo, so I'll check that into source control. This one takes about 1.11 seconds on my computer while I'm doing screen recording, all that kind of stuff.
This one takes a little bit longer, 1.77 seconds.
So you're paying a little bit of computational time for a roughly 400x improvement in memory, 415 times less memory used.
That's not always worthwhile.
Maybe you have more memory, and it's just about getting things done quicker.
But if this is the case, if you're under memory pressure, this is a fantastic pattern.
You should absolutely use these in data pipeline type of scenarios.
It's really, really powerful.
|
|
show
|
5:57 |
Let's look at another example, and this time we're gonna talk about an aspect of Python that is super cool, is not used that often, but is probably often misunderstood, and that is closures.
So let's go over here and create an app.
We'll call it "useful_closures".
We're gonna have two types of closures.
This one is like the most standard normal thing I can think of.
So we'll have a fmain thing going on, and what I want to do is I want to create a person class.
Instead of using the one we had before, I'm just going to create a named tuple.
So we'll say "Person = namedtuple", and you've got to say the name of the type, which should probably be the same.
And then the things that it has, it has a name and an age.
So that's like defining a class but less effort, probably less memory as well.
I'm not sure.
So let's go down here and create some people.
And here we'll say it's a person, and PyCharm is awesome, it knows it takes a name and an age, that's pretty cool.
So let's say "Sarah, 28".
Let's throw Lilly in here, she's 24.
Throw in a guy named Matt, he's 32.
Zoe, she's 40.
And Jack, young Jack, he's just 10.
OK, so these are our people.
What I want to do, I want to show them.
So, print out the people and let's go ahead and tell this it's gonna run.
And look, we've got Sarah, and then Lilly, and then Matt, and you know, guess what?
That's the order in which they were put there.
But what if we sort them?
Well, they're lists, and lists can be sorted.
That's pretty awesome.
How is it sorted?
Hmmm.
It is gonna be sorted alphabetically.
First, it's by name, and then it's going to sort by age.
So if we had two Jakes, right?
A Jake that is 50, and a Jake that is 10.
Where are our Jakes?
See, it put the younger one first, so where there's a match on the name, it actually goes and looks at the second thing in the tuple.
That's actually killer.
That's really, really cool.
But what if I don't want to do that?
What if I want to sort just by age?
Well, then we're gonna sort it in a different way.
We're going to do something really cool.
We're gonna use a lambda expression.
so we'll say "print, default sort".
And let's do a better sort.
So we'll say "print, sort by age", so let's do the same thing, "people.sort", but notice now there's this key, and the key is a function.
We can give it a regular function or a lambda expression, and it gives you one of the items in here.
And then what you're supposed to do is return the thing you're gonna sort by; that gets passed to the underlying sort algorithm, and then it sorts the real objects.
So we could just say "p.age".
And here we'll say "print, people".
So here we sort by age.
We have Jack, and then Lily, and then Sarah, check that out.
If we wanted to have it reversed, we could put a negative here, or we could say reverse=True, right?
But you can control just by like what you pass back here.
Now, Old Jake is first, and then Zoe and so on.
So this is cool.
We use it all the time in Python to do neat stuff.
But what if you need more information?
What if you want to sort by some criteria?
Like, for younger people, I want to sort them from youngest to oldest, but older people, I want to sort them from oldest to youngest.
Something like that, alright?
How could you accomplish that over here?
Sort by age.
Grouped, I guess we'll call it?
I don't know.
Where do we pass more information?
So the p, the p comes from sort.
When it calls the key function, it passes one of the items in the list and says, "give me the key for this item".
And then it's gonna use the sort, right?
Where's our additional information?
So what if we wanted to say, Oh, there's a cutoff, and if you're over 30, we're going to show the oldest people first, sort descending.
If you're under 30, we're gonna show the youngest first, so sort ascending.
What we can do is, instead of passing a variable to this function, we can capture it through a closure.
Okay, so what we can say is "we want negative p.age if p.age is greater than cutoff;
else, we're just going to sort ascending with p.age".
So if you're over 30, we're gonna sort you descending, if you're under 30, we're gonna sort you ascending.
How is this variable passed to this function when it gets called?
You can see PyCharm thinks they're the same.
It's highlighting it.
If we run it, we should be getting what we're looking for.
Let me do a little separator.
Over here, we have for the old people, Jake, and then Zoe, and then Matt.
And then it flips.
For the younger people, it's Jake, young Jake, and then Lily, and then Sarah.
So the younger people are sorted ascending.
The older people are sorted descending.
This is completely confusing and you would never do it as a way of organizing your data, probably.
The point is to show that if you wanted to do something non-standard, something extra, when provided additional information, you can take this variable, pass it off over there, and then call that function, right?
The only parameter being passed to this function is p, but it's actually working with two pieces of data: p and cutoff.
That's awesome.
So this concept here is a closure.
And this ability to have functions like reach around them and grab onto these is really interesting.
It's practical in cases like this where I just need a little extra information to decide something, and then off it goes.
We'll see that this can lead to problems with memory.
But for now, let's just take a step back and see how cool it is that we can pass along these extra pieces of data through a side channel called "closures".
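The whole sorting demo, condensed into a runnable sketch (the names and the cutoff of 30 are the ones from the video):

```python
from collections import namedtuple

Person = namedtuple("Person", "name age")

people = [Person("Sarah", 28), Person("Lilly", 24), Person("Matt", 32),
          Person("Zoe", 40), Person("Jack", 10)]

cutoff = 30
# The lambda closes over `cutoff`: sort only hands each Person to the key
# function, but the function also reaches out and reads cutoff.
people.sort(key=lambda p: -p.age if p.age > cutoff else p.age)

print(people)  # over-30s descending first, then under-30s ascending
```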
|
|
show
|
4:30 |
Well, that was pretty cool, but let me show you how far this idea of closures goes.
There's something we can do that probably will be surprising if you haven't seen it.
So this is gonna be "counting_closure".
Start like this with our fmain as always.
Normally, what happens if you want to have some function and some data state that's tied to it, you would create a class, right?
Classes have functions, they have fields and then they can create an instance of them.
You stick them together and they go around for the rest of their life doing that.
But let me show you that closures for a limited scenario, basically, where there's one function but multiple pieces of data, potentially, you can go around and do this same thing.
So I'm gonna write a function that creates another function, it creates a closure.
So, I'm gonna say "def create_counter", and this function is going to return another function.
whose job, when you call it, is to basically increment a counter, starting from some position and with some step size and so on.
So I'll take start, which is an int, and step, which is an int, and it's going to return a Callable over here.
A thing you can call, basically a function.
Check this out.
We're gonna go and say the current value is going to be start minus step.
Why did we do this minus?
Because the first thing we're gonna do when we define our function inside, check this is out, it's not on the outer scope.
It's going to be the counter implementation.
It takes no arguments, and yet it works with data.
So what it's gonna do is it's gonna say "current += step" and it's gonna return current.
And because the first thing it does is increment it, we want to go back one step, so that as soon as we increment it, it lands right on the start, and then step, then step, then step.
Now, in order to tell it you want to use this variable as a closure capture thing, you have to say "nonlocal current".
Now the warnings go away and what we're gonna do, we're gonna return counter implementation without calling it.
Well, that looks kind of interesting, But watch this.
I'll say, "create_counter" and let's say this one starts at 7 and it steps by 3.
Then we're gonna print out calling it c1 like this.
It's a function, it's a callable, so we do like this.
And let's call it a couple of times.
What's gonna happen?
Well, the first time we've given it 7 and said to step by 3.
So hopefully it's gonna give 7.
But then what about the rest of these?
Is it going to remember where it was?
What is it going to do?
Let's just go and run this and find out.
Check that out.
7, 10, 13.
We could create another one of these.
We could create a c2, which is create a counter, it starts at zero and goes by 10.
And let's just, part way through, call c2 a couple of times, and I'll make it a little obvious that these are the number 2's coming over.
Check this out.
So it goes 7, 10 and this one just carries on, 13, 16, and this other one I made also kind of seems isolated.
0, 10, if we called it a couple more times, it would count up 10, 20, 30.
They have captured these variables and they hold onto them forever.
As long as c1 is around, its intermediate variables, like current and whatnot, have references pointing back to them.
They're not going to get cleaned up.
Even though this function returned, and normally that would clean up this variable, it doesn't now.
So, the main take away from this section is that these functions can capture variables as a closure, and when they do, those captured variables are no longer released until the function goes away.
And, you know, sometimes these functions turn out to be top-level functions that never go away.
So in some sense, these could be like memory leaks.
That may be okay, that may not be okay, but you need to be aware that when you capture these things through closure, you're going to end up with possibly holding on to that data longer.
As you think about what things are getting captured, you might want to think: what variables do I really need here?
Are there times where I want to make sure they get cleaned up or something?
So this sort of takes you outside the bounds of normal cleanup for the variables and pushes it back to the life cycle, the actual function that was created as part of the closure.
|
|
show
|
1:11 |
Let's round out this chapter on functions by looking at closures and closure state.
So we saw that we could almost create a little object, a little class that has some behavior and some data, by using closures.
So our "create_counter" function here takes some arguments to get things started.
It's going to create a variable called "current", and it's going to capture that using "nonlocal".
It's also going to capture "step" by the way, we just don't have to say nonlocal for it.
So this counter implementation that gets returned will have a permanent reference to current and a permanent reference to step.
This is like creating a class that has one function, which is whatever this does here, do the step, and then two fields, current and step.
And when we create one, here, this one, we're starting at 10 and incrementing by 7, so we're calling it "by_7s".
When we call this function, the first time it uses the initial values it's held onto, but then the next time, it's remembering its state.
It's holding current and step and putting them together, and again, and again, and again.
So when you create a closure like this, just keep in mind that you're changing the life cycle of current and step.
|
|
|
36:30 |
|
show
|
0:49 |
In this chapter, we're going to focus on Python classes and how they consume memory, and in particular some techniques that we can apply to make them more efficient.
And you'll see some of those techniques, even, I'd say most of these techniques, will make it not just more memory efficient, but also faster in CPU terms.
So things happen quicker just on a pure computational side as well as it's gonna use less memory.
That's like a double win, isn't it?
Everything we've learned so far about efficient data structures about functions and whatnot can layer on top of what we're going to talk about here, right?
Functions can work with classes, classes can return them; internally, the fields can store things in arrays instead of lists, those sorts of things.
So think of this as just more goodness and more places you could make your program more efficient and better.
So, let's dive in to classes.
|
|
show
|
6:50 |
Alright, let's jump back in and just start writing some code.
Go back to our project we're working on, and we're gonna add another app, this is gonna be "app_props_and_classes".
We'll give that the fmain magic and set it up to run.
Alright, So I would like to be able to create a class.
We're going to keep working with this person idea, that applies to all of them, but we're just gonna keep rolling with this idea of a person.
But I would like to write a class that behaves like this.
We'll be able to come over here and have a datetime, which is the birthday.
A monthly income, which is a number, and we'll create a person, Sarah Sampson, given the birthday and the monthly income, and then it will have her full name, which will be Sarah Sampson, like that, from her two name parts.
How old she is in years based on right now, her yearly income, how long until she retires and so on.
So we're gonna do this three different times.
The first one we're going to create what I'll call the "naive person object", and that is just write this in the most straightforward way that you would with Python classes.
And then we're gonna make one improvement and another.
So, I'm gonna create a little place to store those three classes over here.
I'll call this "props_people".
I wouldn't normally organize this, I'd just call it person or something.
We've already got several persons rolling around.
And we're gonna have a naive, let's say "PersonNaive" like this, and let's just remind ourselves what that was that we're going after.
We'll come over here and we'll create an initializer, a constructor, and it's gonna have to take first name, last name, birthdate and monthly income.
Those are the things that are passing in.
This one's gonna be an integer.
This will be a datetime, datetime, like so, and strings and strings.
Then let's just have PyCharm add the fields to the class.
So thank you, PyCharm, for doing that.
I do wish that it would not do it in reverse order that it would put them behind, not ahead.
But you know, so it goes.
We can always reorder it.
Like I feel like that should go up and those should probably go above, I don't know why.
So, so far that gets us down here.
And now we're gonna have a full name and we'd like to be able to say "object.full_name", so let's put that here.
We'll say "self.full_name", we'll just make an f-string, "self.first_name" we're not gonna need the "slef" at this point.
First, last.
Alright, that's gonna be the full name.
Ah, then we'll take care of this one.
We need age in years.
Well, for age in years, we're gonna need to hold on to now.
So we'll say "datetime.datetime.now()", call that function, and now is gonna be the time it is now.
So, age in years, let's see, we gotta go from their birthday over to this.
So we'll say, I'll make an "age_delta" is gonna be now minus birthdate, and this is gonna give us a time delta, which represents basically the seconds or the ticks.
And there's a cool thing we could do with time deltas, we can divide them, and I can come over here and say "days equals 365" and that's gonna take however long that is and convert it to however many years.
So over here we'll say, maybe we'll store it as an integer.
People think of themselves as having an integer age.
You know, I'm not 36.271111 I'm you know, whatever, you know, 36.
Or, sometimes you round up a little early, but you talk about that in integers.
So integers we're gonna put there.
Yearly income, again.
This will be "self.yearly_income", that's 12 times the monthly income.
That one was easy.
And finally, years until retirement.
So "self.years_to_retire" is gonna be max of zero, because you might already be up for retirement, or "self.age_in_years", let's say 65 minus that, because at least in the U.S., typically people retire at 65 on average.
So, how long until you're 65 years old?
Alright, I think that this might do it.
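Putting the pieces just described together, the naive version might look something like this (a sketch of the class being built, not the exact course file):

```python
import datetime

class PersonNaive:
    def __init__(self, first_name: str, last_name: str,
                 birthdate: datetime.datetime, monthly_income: int):
        self.first_name = first_name
        self.last_name = last_name
        self.birthdate = birthdate
        self.monthly_income = monthly_income

        # Everything below is precomputed and stored, whether or not
        # anyone ever asks for it.
        self.full_name = f'{first_name} {last_name}'
        now = datetime.datetime.now()
        age_delta = now - birthdate
        # Dividing two timedeltas gives a float ratio; days=365 is a
        # rough year, which is fine for this demo.
        self.age_in_years = int(age_delta / datetime.timedelta(days=365))
        self.yearly_income = 12 * monthly_income
        self.years_to_retire = max(0, 65 - self.age_in_years)
```

Every instance stores six extra values beyond the four passed in, which is exactly the cost we'll measure in a moment.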
Let's go over here and we'll say "props_people import this one".
Now, notice this is a person.
We're gonna have different variations.
So here's the thing that is not normally done in programming, but I'm going to do here.
I'm gonna come over and redefine, have a person type, which is gonna be a person_naive at first.
And then we'll have a global person type, but we can change this to affect other functions.
So in the beginning, we're going to start with this.
We could even say testing, or "running with person implementation: {person.__name__}", if this was an f-string, like that.
And then what we want to do is we just want to run "retirement_summary()" right?
That's a lot of work, but it set the stage for this exploration.
Maybe I'll spell this correctly rather than have PyCharm spell it correctly.
Let's go ahead and run and see what happened.
Yes!
It works!
So, hi Sarah Sampson, let's review your retirement.
You're 46 years old.
You currently make $84,000 USD a year.
You have 19 until retirement.
Maybe units belong there.
Be sure to put some money in the bank for retirement each month.
There you go, 19 years; for one it would say "1 years", but that's fine.
I'm happy with that, it's okay.
Good enough for our purpose here.
So this type of definition of this object here seems totally natural, right?
It lets us go to our person, say person dot, we have age_in_years, last, first, full_name, all the things that we want to work with and notice these are "f's" so these are fields as opposed to functions or other types of things, this is great.
What could go wrong?
Well, it turns out it's not super efficient, even if it is the most natural way to do things.
|
|
show
|
1:46 |
If this was the entire program, or at least the entire usage of person, we're done.
There's nothing really we gotta worry about, but we're exploring memory and how that works.
So we want to make a bunch of these things, right?
So let's go and create a function called "create_a_crowd()" and it's gonna return a list of person.
I want to explore this in two ways.
First, how long does it take to actually create the crowd?
And then we're going to do it.
So first, let's just write it like this: we're gonna return a list expression here, we're gonna create a person, and it doesn't really matter what values we put in here.
I'll just say first and last, and this will be "datetime.datetime.now()", and let's say they make $5,000 a month.
Sure, there would be some variation, but honestly, it doesn't matter.
Then we'll just say "for whatever in range from 1 to 100,000" and let's go over here and say "crowd = create_a_crowd()".
You're gonna see that it runs.
It does.
Let's print out the length of crowd and the first item of crowd.
There we go.
We have, oh, we're off by one.
We have 99,999 and we have a "PersonNaive" object, even though, of course, we said person, currently the implementation of that is what it is.
So let me do a plus one, or we could more easily do it like this I guess, from zero.
There we go.
There's our 100,000.
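The crowd-building function sketched here, with a minimal stand-in for the person class built earlier (the real one carries more fields):

```python
import datetime

class PersonNaive:
    # Minimal stand-in for the class built earlier in the chapter.
    def __init__(self, first_name, last_name, birthdate, monthly_income):
        self.first_name = first_name
        self.last_name = last_name
        self.birthdate = birthdate
        self.monthly_income = monthly_income

# The swappable "current implementation" alias used in the video.
person = PersonNaive

def create_a_crowd() -> list:
    # range(100_000) yields 0..99_999, i.e. exactly 100,000 people,
    # which fixes the off-by-one from the first attempt.
    return [
        person('first', 'last', datetime.datetime.now(), 5_000)
        for _ in range(100_000)
    ]

crowd = create_a_crowd()
print(len(crowd), crowd[0])
```

Because everything goes through the `person` alias, swapping in a different implementation later changes what the whole crowd is made of without touching this function.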
Alright, we've created the crowd. Now the next thing to ask is how much memory did we use and how long did it take?
|
|
show
|
2:05 |
So let's check how long it took to create this crowd, and then how much memory was used, too.
So let's go over here and say "t0 = datetime.
datetime.now()", and then we're gonna do this and then check.
So why not a little try/finally, hey?
Bring that in there.
And then we're going to say the change in time is as you would expect.
And then I'll just do a little print statement that says we created a crowd in however much time, convert this to milliseconds, so let's just see that.
Here we go.
It took 459 milliseconds to create our 100,000 size crowd.
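The timing pattern described here, as a small self-contained sketch (the crowd builder is a placeholder; the real one creates person objects):

```python
import datetime

def create_a_crowd():
    # Placeholder for the real crowd-building function.
    return [object() for _ in range(100_000)]

t0 = datetime.datetime.now()
try:
    crowd = create_a_crowd()
finally:
    # try/finally guarantees we report the elapsed time even if
    # building the crowd raises part way through.
    dt = datetime.datetime.now() - t0
    print(f'Created a crowd in {dt.total_seconds() * 1_000:,.0f} ms.')
```

`dt` is a `timedelta`, so multiplying `total_seconds()` by 1,000 gives the milliseconds reported in the video.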
The other thing that we want to check, though, is how big is the crowd?
So we'll just print out a little statement here, the size of the crowd, we've got to get our size utility, and just like before, we need to have this bit at the top, so in case you're not using PyCharm, things will still work.
So here we're going to say, "get the full size of our crowd".
Remember, this traverses the whole object graph and it's gonna print it out in megabytes.
So the "create_a_crowd" knows how long it takes, and then we're just going to say how big it is.
Created a crowd in 400 milliseconds, and the size of the crowd is 32 megabytes.
Alright, not bad at all.
This doesn't really mean a lot to me.
Like, What am I gonna compare this against?
Is 32 MB good or bad?
I mean, it is 100,000 people with lots of information about them.
That's cool.
And 400 milliseconds, also not too bad.
So we're gonna use this as our baseline, and we're gonna create two other versions of our person implementation that are gonna improve upon this and actually make it better: the time to create the crowd is gonna go down, and the crowd size is also gonna go down.
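The course ships its own size utility; a rough stand-in that walks the object graph the same way might look like this (simplified, and it ignores some container types and shared internals, so treat the numbers as approximate):

```python
import sys

def get_full_size(obj, seen=None) -> int:
    """Recursively sum sys.getsizeof over an object graph (rough)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:      # don't count shared objects twice
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(get_full_size(k, seen) + get_full_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(get_full_size(item, seen) for item in obj)
    elif hasattr(obj, '__dict__'):
        size += get_full_size(obj.__dict__, seen)
    return size

crowd = [{'first': 'first', 'last': 'last'} for _ in range(1_000)]
print(f'{get_full_size(crowd) / 1_024 ** 2:.2f} MB')
```

A plain `sys.getsizeof(crowd)` would only report the list's own pointer array; the recursion is what makes the per-item dictionaries show up in the total.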
|
|
show
|
4:04 |
Let's think about some of these things that we're working with here.
So, like full_name, we're storing "Sarah space Sampson" in memory for this p object, and we're also storing Sarah, and we're storing Sampson.
Wouldn't it be nice if we didn't have to store all that extra info?
Same thing for age in years and yearly income and so on.
So we can go to our person object here, and we can change it a little bit.
Let's make a not so naive, call this "PersonEfficient" like that, and it's going to take the same inbound data, but instead of doing all of this, every one of these is gonna become a property.
You're probably familiar with properties: they're functions that you put on the class, but from the consumption side, they look like fields.
So the important thing is that this bit of code, nicely just saying "p.full_name", will not change, but how we store this stuff will.
So let's just get cracking on here.
Say "self.full_name" and in PyCharm we type "prop" and hit tab, and it'll write much of this for you.
And then we want to go down here and just say "return" now we gotta say "self.first and self.last", and so on.
And then what's this next one we want?
Age in years?
Alright, here we go.
Prop, tab, boom!
Take all this.
Instead of storing it, we're going to return it and this will be a "self.birthdate".
Perfect.
We could even go to these and tell them that this is a string, this is an int, so that when we work with it, it's gonna be a little bit nicer.
What do we need, yearly income?
Again, prop, tab, paste, and we come down here and we just return that, and we need a self.
Final one, years to retire.
And there we have it.
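Assembled, the property-based version sketched here looks roughly like this (same primary data as the naive class; everything else is computed on demand):

```python
import datetime

class PersonEfficient:
    def __init__(self, first_name, last_name,
                 birthdate: datetime.datetime, monthly_income: int):
        # Only the primary data is stored on the instance.
        self.first_name = first_name
        self.last_name = last_name
        self.birthdate = birthdate
        self.monthly_income = monthly_income

    @property
    def full_name(self) -> str:
        return f'{self.first_name} {self.last_name}'

    @property
    def age_in_years(self) -> int:
        age_delta = datetime.datetime.now() - self.birthdate
        return int(age_delta / datetime.timedelta(days=365))

    @property
    def yearly_income(self) -> int:
        return 12 * self.monthly_income

    @property
    def years_to_retire(self) -> int:
        return max(0, 65 - self.age_in_years)
```

Consuming code still writes `p.full_name` and `p.yearly_income` exactly as before; only the storage strategy changed.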
Let's do a little cleanup.
So these things are only going to do that computation when somebody asks for years to retire, or when they ask for age in years, or for the full name.
If you don't ask for that, we don't even run the code to create it.
We also don't have to worry about the memory.
Now, on the flip side, if this is something where you'd create one and then ask for these values 100 times, you might create some kind of cache: have I computed age in years, yes or no?
If no, compute it first, store it, and then return it, if it was expensive.
So there's always room for some kind of balance.
But let's try, use this over here and see how that comes out.
Do exactly the same thing, but this time I'm gonna say person equals this thing, which we gotta import.
Again, this is really weird, you would not normally do that.
But in this case, for a little demo, we could just, you know, swap out the type, and have exactly the same code run.
You think it's gonna be better?
Think it's gonna be faster?
Let's find out.
So we did it up here, and that number was 427 milliseconds and 32 megabytes!
Look at that!
It's much faster.
Basically three times faster, and it used a third less memory.
Wow wow wow.
So that is pretty awesome.
And as you can see right here, the programming model is still the same.
Hey, Sarah Sampson, let's review your retirement.
You're 46 years old.
You currently make this much a year and all of these are the properties here, Okay?
So, super, super cool and easy change.
You're probably familiar with properties, but I just want to make sure that you linked together this idea of the property, which is this computed thing often used for validation.
If you're going to set something, you'll check that the variable gets set correctly, or you could use it, like I talked about, for caching, only computing on demand but then storing it, and so on.
So there's a lot of functionality behaviors that are interesting here.
But there's also big memory implications, because now we're only storing this data, and all of this is just on demand.
|
|
show
|
1:02 |
So we saw that many times classes can contain computed fields.
They've got base data, or primary data, in this case a and b, but then there might be some other fields we want to provide to our users that could be computed, or are derived from, that base data.
Here, c is a over b, and d is a times b.
Somewhat contrived, but you can see that c and d could always be recreated from a and b, but it's handy.
It's easy to just make these all fields so people don't have to call functions, they can just say "thing.d" and it's all good.
But if we switch this over to properties, from the outside, the consumption model feels identical.
But from the memory side, we're now using half as much memory, or maybe more, depending on what kind of objects we were storing there.
So this is really nice and really easy.
It also provides slots, or spaces, in the code to do things like caching, only computing on demand but then not recomputing it, and all kinds of interesting stuff along those lines.
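The slide's idea in code form (a hypothetical class: a and b are primary data, c and d are derived on demand):

```python
class Computed:
    def __init__(self, a, b):
        # Primary data: the only values actually stored per instance.
        self.a = a
        self.b = b

    @property
    def c(self):
        # Derived: a over b, recomputed each time it's asked for.
        return self.a / self.b

    @property
    def d(self):
        # Derived: a times b.
        return self.a * self.b

thing = Computed(10, 2)
print(thing.c, thing.d)  # 5.0 20
```

For the caching idea mentioned, the standard library's `functools.cached_property` computes a property once on first access and then stores the result on the instance, which trades some memory back for repeated-access speed.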
|
|
show
|
7:53 |
Let's do another little exploration here.
To understand why classes take as much memory as they do, we need to figure out where they store their data.
and it's also setting the stage for a very big improvement that we're gonna make in a minute, but you need to understand this first.
So let's go and say "app_storage".
Set that that to run.
Do our fmain magic, and up here, I'm actually gonna paste some code that's not worth watching me type out.
But, we're gonna do a little trick so we can import size utility.
I'm gonna define a class called "thing".
Now, thing contains thing1 and thing2, and it has a way to print out what things it has.
So a thing object with the things in here, and also like this.
So we're just going to use that.
There's not a whole lot to it, just think of this as a standard class that has two fields, and it just knows how to talk about itself, Okay?
That's all there is that's happening here.
So let's go and create some things.
We'll have a thing, which is a hat, and a mat, you know, thing1 and thing2; we could do this Dr. Seuss style, I suppose.
Hat and mat, dog and cat, bat and bird, car and bike.
And let's just print out the things, we'll make a new line.
There we have thing object at some address with hat and mat.
Notice the different location, with dog and cat, and so on.
These are our things.
Pretty cool, right?
Nothing special there.
Standard class stuff.
If you look at any of these things, any class at all, it will have a "__dict__", and this is where Python stores fields: when you say "self.something" here, what that really does is create an entry.
This is equivalent to saying "self.__dict__['thing1'] = t1", right?
The line above and this one are the same.
Put that down here so I can keep it.
Those two things mean the same thing.
Alright, so whenever you have an instance of a class, you have an instance dictionary, on top of all the other stuff that might have to be tracked and whatnot.
So the cost in terms of memory of a class is a dictionary.
Now you might think, like in many languages, that the cost is that string and that string, but there's actually two sides to that coin.
The cost is this key string right here, and the entry that goes with it.
The string objects themselves are actually reused, so there's only one copy of each.
But there is still a pointer, and on a 64-bit system that's 8 bytes, plus the entry overhead, for every field.
So that's pretty interesting.
And down here, this is gonna be one of those.
So let's actually look at all of them.
So we'll say "obj.__dict__, for obj in things".
And then let's just print out the dictionaries.
And look at that.
Here we have a thing with hat and mat and here we have a dictionary, thing1 is hat, thing2 is mat.
thing1 is dog, thing2 is cat, right?
There it is.
Those are the fields self.thing1 self.thing2.
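The exploration so far, condensed into a runnable sketch (a simplified `Thing` without the video's custom printing):

```python
class Thing:
    def __init__(self, thing1, thing2):
        # self.thing1 = thing1 is equivalent to
        # self.__dict__['thing1'] = thing1
        self.thing1 = thing1
        self.thing2 = thing2

things = [Thing('hat', 'mat'), Thing('dog', 'cat'), Thing('bat', 'bird')]

# Every instance carries its own __dict__ holding its fields.
dicts = [obj.__dict__ for obj in things]
print(dicts)

# Each instance's dictionary lives at a different address: they are
# separately allocated and managed, one per object.
locations = [id(d) for d in dicts]
print(locations)
```

Printing `dicts` shows the `{'thing1': ..., 'thing2': ...}` structure per object, and the distinct `id()` values confirm these are not a shared dictionary.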
So you might wonder, well how much size does this use?
So we could print out the size, we'll just get the first one of the dictionary.
How much size does that use?
And let's also do class, things zero.
Class itself.
Perfect.
So we run those.
You can see the size of the dictionary.
Ah, we're using the wrong one.
Let's go and use our size util, get full size of the object.
That's right.
It's not traversing, is it?
There we go.
That makes more sense.
So the dictionary is 318 and the class is 366.
So it's another 48 bytes added on here to have the class, but really the understanding is that the dictionary is the thing that holds most of the stuff to do with the class.
That's pretty interesting, right?
So this is where it's stored.
Now, question is, are these different dictionaries?
Are they copies?
What's the story?
So what we can actually do is we can go and look at the locations of the dictionaries.
We could say "this is going to be the id of d, for d in dicts".
And we could just print those, print the locations.
And if we look at that, what this is telling us is these are the memory locations of the dictionaries of the objects.
And so the same class, over and over and over again, right?
Those are different.
So what that's telling you is that every time you create a new instance, you get a new dictionary, and that allocates memory that has to be tracked and managed for every single one of these.
In particular, as I said at the beginning, that you've got to keep track of the entries, right?
You can see thing1, thing1, thing1; now the string object itself is actually reused, but the fact that it's appearing over and over means you still have to keep track of it in each dictionary, right?
And that takes a lot of space, relatively.
Over here, we've got new dictionaries.
You might wonder, why is that required?
So let's go over here and say things, go to the last thing, I don't know which one it is, but we're going to say that, it was thing1 and thing2, those are its two fields.
Let's say we want to give it a thing3.
Python is a dynamic language, and will let you just dynamically apply these things.
Now, PyCharm says "you're asking for a hurt here", but you can tell PyCharm "don't bother me, I want to do this" and then we can go and we can print out all the things again.
Let's do, I'll just print the dictionaries.
So check this out.
We have different ones, and notice thing1 thing2, thing1 thing2, and the next one, thing1 thing2, but the third one, thing1 thing2 thing3, right?
Now there's a key, thing3, in the third one, because we did this line 59 here, but the others were unmodified.
Because of this dynamic nature, you can just go to an object and go "bam, You have more stuff than you knew about".
That means you have to have a dictionary that's dedicated to each object, one that tracks all the things that not just came with it, but also were dynamically added.
That's really flexible, but it also adds a lot of overhead.
Keep that in mind.
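The dynamic-attribute demo just described, as a minimal sketch:

```python
class Thing:
    def __init__(self, thing1, thing2):
        self.thing1 = thing1
        self.thing2 = thing2

things = [Thing('hat', 'mat'), Thing('dog', 'cat'), Thing('car', 'bike')]

# Python's dynamic nature: bolt a brand-new attribute onto just the
# last instance, after the fact.
things[-1].thing3 = 'lamp'

for t in things:
    print(t.__dict__)
# Only the last object's dict gains a 'thing3' key; the others are
# unchanged, which is why each object needs its own dictionary.
```

This per-object flexibility is exactly what the per-instance `__dict__` pays for, and it's the feature `__slots__` trades away in the next section.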
We're going to see if we could trade off this flexibility for much better performance.
Hint, we can.
The takeaway from this is that whenever you have a standard class, the fields actually get stored like this, into "__dict__", field name equals field value, and when you create an instance, each one gets a separate, dedicated dictionary that is its own thing, created in memory and managed and allocated and populated and taken care of like that.
So we, come over here and we look for the locations.
They're all different.
And the reason that you have a unique one for each object is to support this dynamism here, right?
Thing3 is not really an aspect of things, that's why PyCharm, if I don't tell it to stop, complains unresolved attribute reference thing3 for class thing, because you're kind of not treating this right.
But it will accept it, and some programs were written this way, so Python needs to be flexible for classes like this.
|
|
show
|
1:28 |
We're gonna talk about this idea, this concept called "slots", which lets us much more efficiently store data associated with a class at the cost of some dynamic flexibility.
So, standard Python classes we saw look like this.
Here we've got a thing, we call it a dynamic thing, because we can do the dynamic stuff to it, like later on add features to it.
If we create three of them: d1, d2, d3, and they have numbers: 1,2,3,4, 4,5,6,7, 7,8,9,10 you can see they each have a newly allocated and managed dictionary, and each one of those is gonna point over here to, you know, it's gonna have its own details, Right?
So the first one is gonna have "field a", which currently has a value of 1, the "field b" has the value 2 and so on.
But the last one, the "field a" is value 7 and so on.
And we saw that we can even dynamically add to these objects.
I don't think that's a super good idea, but you can do it.
And these dictionaries are there to support that amongst other types of things.
This is how things work unless you take specific actions.
This doesn't look super efficient, does it, right?
Especially the repeating of the keys a, b, c and d over and over.
If I've got a list of 100,000 dynamic things, I've got a, b, c, and d as keys stored in this dictionary, repeated 100,000 times.
Maybe we can do something with that to make our code more efficient in terms of memory usage.
|
|
show
|
5:27 |
We saw that switching our fields, the ones we just precomputed just in case, over to properties gave us about a 30% improvement, both in memory and in CPU performance, right?
Time was shorter, memory was less, by about that much.
Let's see what we can do around this idea of these dictionaries.
Because remember, when we say "create a crowd", we're creating 100,000 dictionaries and doing a bunch of stuff in there.
Can we do better?
So let's go over here and we're gonna have another type that's gonna go over here called "PersonEfficientSlotted".
Why use the word "slotted"?
What is this about?
Well, let's go make it, then it'll be clear.
So if we come down and we're gonna use our nice, advanced property one.
So these techniques combined, in a sense.
I'm gonna go, define that, so everything up here is going to be the same, right?
This, all this code is the same.
But remember, what this means is we're gonna have a dictionary that looks like first is whatever goes here, right?
Last is whatever goes here.
There's actually a not-super-well-known way in Python to say "don't do that; always put this value in the first container, this value in the second, that one in the third, that's where the data for the class goes, and don't create this dictionary at all".
And the way you do that is you go up here on the type and say there's a thing called "__slots__", see that "slots" right there?
And then you give it an array, a list, of the names.
Notice PyCharm is like "this is all going to break".
But if I type "first", now it's like "okay, first is cool".
You can set a value for first, but last, monthly income, all this stuff, that's the dynamic behavior, and it's like "you can't do that"; you have to pre-declare all the fields there are gonna be.
So you just say them here, monthly income, and birthdate.
Okay.
We've said "these are the field we're going to have, and then we're allowed to set them.
You saw that we're getting an error.
That's actually a runtime error.
Not just a warning.
It will fail.
if I have this, and like this, line 55 is a runtime crash.
So you've got to pre-declare all these things, but if we do, this is not going to create a dictionary, and instead it's going to store the data somewhere else.
So let's go over here and we'll say, "Import this" and let's run this and see what we get.
Will it be better in memory?
Probably, because it doesn't have to create all those dictionaries with the repeated keys 100,000 times, and not doing that might make it faster.
Let's give it a shot.
Here's the first one, 460 milliseconds.
The second one is 180.
Look at that, 135, and memory-wise, 20 down to 7.
That's better than 2X improvement.
That's a 3X improvement, almost.
And the way we program our code is unchanged.
We still have properties, we still have the direct fields that we can access and so on.
We just cannot dynamically add unexpected fields to this class.
So we have to say "these are the only things you can ever put there".
You saw that PyCharm was already warning us when we tried to break from doing that as a convention.
This is just making that a runtime requirement.
So, like before, in this storage example, if we go over to this class and add "__slots__" with the strings thing1 and thing2, with the right spelling, and we run this, we run the right thing, notice it crashes right here: we try to get the dictionary, and there is no dictionary.
It's gone.
That is not a problem anymore.
Although you can still see it trying to do its printout.
It knows about its fields and what it can do.
So the other thing you would find is like this line right here, if we try to do that right away, let's go put that here as a copy.
You run it, notice it crashes.
When we try to set thing3, that used to work, it just added to the dictionary, but it doesn't anymore.
It crashes, okay?
So that is pretty interesting.
Let's take away the slots up here, because this demo version is not supposed to have that, but this one, definitely this one, we're gonna add it.
So we've combined the efficiency of properties and the crazy efficiency in terms of memory of these slots.
So that's pretty interesting.
Remember, we didn't see a huge boost.
We did see a boost, but not a huge boost, of performance around time to create it.
So notice this is 400-something milliseconds.
That's 170 and 150.
It's faster.
We're not creating those dictionaries, but it's not mega faster.
We'll see, though, that there's other aspects of performance as well that we're gaining also.
So there's all kinds of good stuff that we get here, and to me, you know, if I care about memory and I'm working with a lot of things, I feel like this is almost the way to go.
This is not common, and it takes away the dynamic nature of Python, but if you're not using that nature, you just saw you're paying a 3X memory cost for features you're not using, and it's incredibly easy to just say "look, I don't need that feature on just this object, this class".
Slots, double thumbs up, these are awesome.
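Combining the two techniques from this chapter, the slotted, property-based person might be sketched like this (names as in the video; the real course file has the full set of properties):

```python
import datetime

class PersonEfficientSlotted:
    # Pre-declare every field; instances get fixed storage slots
    # instead of a per-instance __dict__.
    __slots__ = ['first_name', 'last_name', 'birthdate', 'monthly_income']

    def __init__(self, first_name, last_name, birthdate, monthly_income):
        self.first_name = first_name
        self.last_name = last_name
        self.birthdate = birthdate
        self.monthly_income = monthly_income

    @property
    def full_name(self) -> str:
        return f'{self.first_name} {self.last_name}'

p = PersonEfficientSlotted('Sarah', 'Sampson',
                           datetime.datetime(1978, 1, 1), 7_000)
print(p.full_name)

try:
    p.thing3 = 'nope'   # dynamic attributes are now a runtime error
except AttributeError as e:
    print(e)
```

Properties and slots coexist happily as long as the property names are not also listed in `__slots__`; the property lives on the class, while the slots hold the instance data.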
|
|
show
|
1:31 |
Let's review defining slots on classes.
It's incredibly easy.
The thing on the left is a standard Python class.
It has a constructor-like thing in Python, an initializer, which takes four parameters and creates those as fields; the names don't have to match, but they happen to this time.
We're gonna create a "self.a" with that value, self.b, c, and d.
If we want the better performance and lower memory usage of not having that dictionary, we're going to use "__slots__".
Over here, SlotThing rather than regular Thing, same thing except we're gonna add "__slots__" equals the names of the fields: a, b, c, and d.
You saw, if you try to work with a field even in the initializer, that's not listed here, it's gonna crash.
As long as you do everything so it lines up, this is a pretty good deal.
So over here we can create one of these objects.
We can print out a value; like, r.b will be 2.
And here we can even assign a new value, r.e equals 7, and that adds a new key to that dictionary.
It works, but it's probably a bad idea, right?
It's not very discoverable and whatnot.
It's very weird.
But you can do this if you want; that's the way Python is defined.
But with slots, you can see that the normal stuff works but this creating a new field, like s.e equals 7, is an error.
In fact, what you'll get is "attribute error: 'SlotThing' object has no attribute 'e'".
For times when you're creating lots of objects and you care about memory, I think this is a really easy trade-off to make.
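The side-by-side from this review, sketched out (the regular class is named RegularThing here to keep the two distinct):

```python
class RegularThing:
    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d

class SlotThing:
    __slots__ = ['a', 'b', 'c', 'd']

    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d

r = RegularThing(1, 2, 3, 4)
print(r.b)        # 2
r.e = 7           # works: just adds an 'e' key to r.__dict__

s = SlotThing(1, 2, 3, 4)
print(s.b)        # 2
try:
    s.e = 7       # no __dict__ to grow, so this fails
except AttributeError as err:
    print(err)    # e.g. 'SlotThing' object has no attribute 'e'
```

Normal reads and writes to the declared fields behave identically in both classes; only the undeclared attribute assignment differs.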
|
|
show
|
3:35 |
The final example when we're talking about classes and memory and slots and all that kind of stuff is, I want to show you a different side of speed.
So over here in this one, we saw that we could create a crowd quicker.
The retirement summary didn't really matter, but we could create a crowd of 100,000 people quicker and use less memory.
But one thing you might notice about creating the crowd down here is we're not actually accessing any of the properties.
We're not like trying to read the first name or the birthdate or any of those things, we're just creating the objects and letting them go.
So what you'll see is that, actually, with slots there are efficiencies to be gained around just accessing the data.
So, let's look at some examples.
We're gonna go and say, create one object, this is our semi-improved person, it has fields and it has properties, but we're only gonna work with the fields, not the properties.
So what we're gonna do is we're gonna create a standard one here like this, and then we're gonna run this function "test_access()", and what test_access does is it just goes 100,000 times and says "go give me the object, give me the fields, the first name, the last name, the birthdate, basically read the fields and assign them to local variables, throw them away, and do it 100,000 times".
And it gives this little report of how long it took.
And then we're going to do the same thing but with slots, remember?
This one will have those fields backed by a dictionary.
This one will have them backed more like by a list and an index, here.
So we got better memory, is there a cost to be paid, or a benefit to be gained on either side of having these slots?
Because it may be lower memory, but it takes longer to access them.
Or do we have benefits around access speed or whatnot?
What we're gonna do is we're gonna go over here and just print this out and say "how much faster is it?". Maybe I'll do one other test just to make sure; we'll say "if standard time is less than slot time, print that".
Okay, so we're gonna just do that and, you know, call this function right here, which is going to take this one object and access its fields, all of them, 100,000 times.
And just time it.
Let's go and see what we get.
Alright, look at this.
Slots were faster.
Okay, so, great, slots were faster.
And for doing 400,000 operations, that is four attribute reads, 100,000 times, took 25 milliseconds and that's 15.6 million operations per second.
But with slots, 22 milliseconds and 18.1 or 18.2 million operations per second, that's a gain of 16%.
So not only do we use less memory, not only are they faster to allocate, actually reading their properties, their fields, specifically, is faster.
Now there's some variability here.
Let's just run this over.
So there's 3, 8, 14, 9, 13, 13 right?
I am doing screen recording right now, which takes up a lot of CPU cycles and whatnot.
So this is not super consistent, so you just run it a couple times, see what you get, but it's pretty clear there's at least a 10% advantage to be gained, sometimes 20% advantage for these different operations.
And to me, that is just awesome.
Like I said earlier: slots, double thumbs up.
Or make it triple thumbs up, because less memory, faster to allocate, and faster to work with.
That seems like a great trade for giving up that little bit of dynamic flexibility, which usually we don't use in our code anyway.
Slots are awesome, and here's one more reason why.
|
|
|
17:11 |
|
show
|
1:22 |
In carpentry, there's a saying "measure twice, cut once", and something like that should probably apply to software.
When we think we need to make changes for improving our code, we should probably measure it before we go make those changes.
My experience has been that we are very, very bad at guessing exactly what is expensive and what is not in software.
Sometimes this has to do with memory, sometimes it has to do with CPU speed.
Often there's a relationship between those things.
So in this section, we're going to talk about measuring memory usage and using tools to exactly understand what's happening, and then we can make changes and see how they're improving.
And we did that in a coarse-grained way before with our report memory thing: it said "well, looks like the process gained this much memory from this step to that step". But all we were actually doing was asking "how much memory is the process using now, how much resident memory are we using right now", and that can change based on things that are not exactly to do with allocation.
We could have shared memory, other stuff could be happening in the operating system, the GC could be holding on to memory for a while, who knows?
So what we're gonna do is we're going to use some tools to go and investigate and get more precise numbers about exactly what and where and how we're using memory.
|
|
show
|
2:56 |
Well, let's start our exploration.
Jump over here into PyCharm, and we're gonna use the tools we have at hand.
There's some profiling built right into PyCharm and you'll see that we're gonna need to switch to some other stuff that's better eventually, but let's see what we got right here.
Now, first of all, I want to take this function pattern clinging to memory thing that we did before.
I'm gonna just use that, we're gonna explore that.
We're gonna take this and just copy it down into chapter nine and we'll call it "app_to_profile" or something like that.
Now, over here, we're doing a couple of things that we're not gonna need anymore.
We don't want to have that size utility, right?
We don't want to do this stuff because we're not writing out the memory usage based on process load, or whatever we're calling it there, the amount of memory used by the process.
Instead, we're actually gonna look directly at it, using the tools.
So let's go down here, clean this all up.
And also, we only want our "greedy_main", not our other main, so we'll just delete that one.
Alright, I could have done that before, but I kind of wanted to show you, we're just going to take that one, and just remove the reporting that we had built into it.
And now what I want to do is I want to first set this to run, so I can use my hotkeys on it, and yep, it still works.
But no more reporting, right?
Well, let's go and profile it.
Notice there's a bunch of buttons up here.
This one's run and this one's debug.
Debug is fantastic.
But this one is the one we're looking for.
We could profile it, so let's get some profiler information here.
Alright, sweet.
Look, you can see where we're spending a lot of time, a lot of time on randrange, randint, some list comprehensions and so on.
But that's not the best view.
Let's go zoom in over here and look at this.
We got lucky and zoomed right into the part that we wanted.
Over here, you can see that our script is running, it's calling main, it's calling scale_data, load_data, and filter_data.
And load_data, well is taking the longest, right?
It's taking two seconds.
What about the memory usage?
Hmm, nothing about the memory usage, unfortunately.
So this is only computational time, nothing to do with memory, right?
And a lot of the profiling tools, are like that.
They'll profile CPU time, but they won't profile the actual memory allocation that happens to be going on.
So this is cool, and it's really neat to see that filter_data and scale_data were much faster than load_data.
But those other two, they do happen to use a lot of memory.
We're gonna put this aside and say this is a great understanding of how our code is running in terms of time, but we want to understand it in terms of memory.
So we're gonna pull in some different tools to make that happen.
|
|
show
|
1:33 |
We saw that the CPU profiler, basically cProfile, built into PyCharm, is not gonna do it for memory analysis.
So how about we get a memory profiler like "memory_ profiler" for Python?
Yeah, that's what we're gonna use.
So memory_profiler is a way that we can capture much more fine-grained detail about what our program is doing in terms of memory consumption, and those pictures I showed you before, the graphs of memory usage over time, those came from this tool and we'll see how to make them here.
Getting it is super easy, we're just going to pip install memory_profiler.
Technically, we're gonna add memory_profiler over to our requirements file, and then we have several ways to use it.
You can decorate a single function and then run it like this.
Run it through the module "memory_profiler", and you'll get all sorts of cool output like this about how much memory each line of code is using.
That's pretty cool.
We're gonna do that.
You can also just say "go and run it".
There's an "mprof" command we'll have once we install memory_profiler.
And when we do that, we can run it against an executable, which is just your Python script, and it will profile it, and then we can plot that.
So we'll get graphs like I showed you before, like down here, Okay?
So we're gonna go through both of these styles of working with Memory Profiler.
It's open source, it was recently updated a couple months ago.
It seems to work pretty well.
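In code, the decorator style looks roughly like this; the try/except fallback is my addition so the script also runs un-instrumented when memory_profiler isn't installed:

```python
import random

try:
    from memory_profiler import profile  # pip install memory_profiler
except ImportError:
    def profile(func):
        # No-op stand-in so the script still runs without the library
        return func


@profile
def load_data(count=100_000):
    # The function we want a line-by-line memory report for
    return [random.randint(0, 1000) for _ in range(count)]


if __name__ == '__main__':
    data = load_data()
    print(len(data))
```

Then run it with "python -m memory_profiler your_script.py" to get the per-line memory report for every decorated function.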
|
|
show
|
2:58 |
Alright, let's try to use our memory profiler, or the memory profiler, on our code.
Now, I'm gonna go put that in the requirements here, but I copied it, so I'll just paste it, "memory_profiler" is what we want, and will PyCharm know that this has to be installed?
It does, beautiful.
Just let it go ahead and install it.
Of course, you could do "pip install -r requirements" with your virtual environment activated.
down over here, we're going to just say "@profile" and import that from Memory Profiler.
Okay, that's a good start, although we didn't get the new line that we were hoping for there.
So this is going to give us detailed information about how memory is being allocated when we run this.
So, that's step one.
Step two is to come down here, either run in the PyCharm terminal, or we can come over here and if you want to see it big screen, we could run it like this.
We know that that is the full thing.
Great long thing.
Now all we gotta do is say "-m memory_profiler".
Run that and we wait for a second.
Alright, let's see what we got here.
So it says we started out using 12.4 megabytes to run this function, and then at this line we're using 51, so the delta, this line here of loading data, has added 38.6 megabytes to the overall memory required.
And at this one we got up to 64 total, and this added 13 megabytes of RAM used.
We're up to 95 here, and this added another 30 and so on.
Printing out some stuff added zero, and we just kind of stayed there.
The final print runs, and it's done.
Pretty cool, right?
So this gives us really detailed information.
Not just "well, now the current memory usage is such and such", and it kind of shows you that here, but it shows you the delta and lines it up with the source code, even for every single line, at least in the function we put the decorator on, right, you could put this in multiple places.
This is cool, right?
So here's a really easy way to run your code and get this output.
Now it takes longer, like five times longer, to run with all of this analysis and all the instrumentation going on here to make this happen.
But you do it once or twice, you see what you get, and then you take away the profile stuff and you just let it go normal, right?
So this is really, really cool, and I think it's super easy to use.
You just alter the command by saying, You know, "-m memory_profiler" and you put the decorator on it and it's off to the races.
It gives you this great output.
I'm a fan.
I think this is gonna be super useful for people digging into how much memory their program is using.
|
|
show
|
0:34 |
Let's just review how we did line by line profiling with Memory Profiler.
We started out by finding the function that we want to look at, the one creating the memory, or causing the memory problem, and we put the "@profile" decorator out of that library on it.
And then we go and run Python, the one out of the virtual environment, with -m memory_profiler and then just the script and any arguments it might take.
It runs and cranks on that for a while and boom, out comes some nice results showing how each line added to, or did not add to, how much memory was being used.
|
|
show
|
3:05 |
Well, having that printout and showing line by line how the memory is changing, is interesting, but it's also useful to know how the memory changes over time.
And you know, a picture is worth 1000 words, as they say.
So let's go make some pictures.
What we can do is we can come down here and we can use this "mprof" command.
So mprof does a bunch of things.
One you can see is plot, and you can say list, but the one we care about is run: run a given command or Python file.
So we'll say "mprof run", actually, we want to go into the right directory first.
There we go, that'll make it easier.
"mprof run" this one file.
Let's see what we get.
Takes a moment.
Sampling 10 times a second.
And it's done because of our little decorator, it prints out this stuff, but if you didn't have the decorator, you could still totally do this, you don't need this part at all.
You just get slightly different outputs.
Now, PyCharm is saying "oh, we found a new file, maybe we should add it here", and we can see what that is if we say "mprof list", like this.
We'll say "here's the ones that we ran", but most significantly, most relevant here, is we can say "plot".
Now, very likely this will fail.
Let's see what happens.
No module named PyLab.
Hmmm.
Well, what do we do here?
Let's try this.
Nope, that didn't work either.
Well, maybe a better error message would be relevant here.
But what we need to do is we need to have "matplotlib" if we're gonna try to graph it, okay?
So then we can just say "pip install -r requirements" like that.
Alright, with that installed, let's give it another shot.
Beautiful, beautiful!
Alright, here's an interactive matplotlib that will let us zoom in and check out all sorts of stuff.
Go back to the regular graph, but here we can see it starting up.
It's taking more and more memory.
Here must be the completion of load_data, here's filter_data, and here's scale_data.
You can see at the top that we're up here around 94.8 megabytes of memory usage.
You can do things like set the title and stuff up here, but not super relevant to us.
You can use this little control bar, right?
Standard matplotlib type of things, but nonetheless, here's a cool graph, and probably the most relevant is just, you know, save this as a PNG that you could go work on.
So we can get these cool graphs and all we had to do to get them was go over here and say, "mprof run, mprof plot".
By default, it will plot the latest one, but if you have a bunch of these mprofile files laying around, you can actually tell it which one you'd like to graph, put them side by side, all those kinds of things.
|
|
show
|
0:29 |
Graphing memory usage over time for Python applications with Memory Profiler couldn't be easier as long as you have matplotlib installed and you have memory_profiler installed.
You can go over and say "mprof run" and give it the Python script name or application name, and it will generate one of those mprofile .dat files, and then you can plot it.
You just say "mprof plot" it'll plot the latest one, and you get something that looks like this.
|
|
show
|
1:27 |
I think memory profiler does a pretty good job, but sometimes you're using certain libraries or other things that maybe it's not going to do a great job of capturing information about, and so we can go and use another one that's really built for data science use cases.
What do I mean by that?
Its job is to find precise places where memory hits its peak.
And if you're doing data science, you want to run a script and you're gonna say "well, we're gonna need however much memory the top of that graph is going to require". And notice, when we were doing the graph, it was doing sampling rather than actually hooking into the allocation mechanisms being used.
We're gonna use this thing called "Fil", and Fil was meant to basically do memory profiling for these scripts that run a few things, have a few peak memory usage moments and then go away.
Also, it tracks allocation from many of the places where data science libraries might allocate data.
So normal Python code, that's cool, but also C code with malloc, as well as C++ with new, and Fortran 90.
It'll also keep track of that.
So a lot of these base libraries used in data science are based on these other languages, and you're just controlling or orchestrating them with Python.
So here's a cool way to get some analysis on that.
We're going to do basically the same analysis we did, but with this tool instead and see what we get.
|
|
show
|
2:47 |
Alright, well let's use Fil.
Again, we're gonna have to have it installed and put it up, well I'll leave it at the bottom.
So the way we do it is we say, "filprofiler" here.
Hit install.
Wait for a moment.
Alright, that's all good.
And let's go ahead and tell PyCharm it is not misspelled because you know what, it's not.
And let's cd, I'll show you a better way to get into that folder.
We can right click, say "open in terminal", there we go.
See, right there is our stuff.
So what we can do is we can say, "fil-profile", like this, the app that we want to run, That's it.
Let's run it and see what we get.
Oh, it looks like I forgot run.
That's right, "fil-profile run", okay, now it's doing its magic and it says it's gonna open the browser when it's done.
Also, I commented out the "@profile" decorator that I'd put on from Memory Profiler.
We don't want those two things to fight or one to measure the other.
It's done, generating report.
Now look, it opens my Web browser, and what do we get?
Cool, it says "Fil Memory profiler".
The command we ran was this, so it would record, like, command line arguments and stuff like that.
And here's the main star of the show, let me make it full screen.
So what we get is what's called a "flame graph".
The width tells us how much memory is allocated, as does the color, and you can dive into it.
So over here, we're doing our main, and there's other stuff happening.
Let's dive into the main, and notice as I'm moving the mouse around, you can see how many bytes were used, how much of the total memory, the peak memory, was used.
Remember, this is all about measuring peak memory.
Go down here.
This one 50%, on this line.
See, we've got our original load_data and you can see what, you know, what this function is doing, right?
It's loading data then it's creating the randoms which is going to do this list comprehension which goes down into the guts of Python.
We could go back and we could dive into, like, over here, this scaled part where we're actually doing the data scaling and the filtering, and you can see that took roughly 26 megabytes, or 36% of the total.
Okay, so this is a super cool way to go and explore.
You can click back here.
You can, you know, if you're down here, just hit reset, zoom, and it pulls it all back out.
There's more stuff we could go down and see, but not a lot happening, just extra space.
So this is a pretty interesting way to measure the memory, or explore the memory usage of our application.
And, like I said, its goal is to be focused on data science script workflows, which a lot of times use a ton of data, because you want to load up a whole bunch of things and process it.
|
|
|
12:56 |
|
show
|
0:54 |
That's it.
You've made it to the end of the course.
You now know the black art of Python memory management.
And I say black art because it's a little bit obscure and a little bit hidden and I feel like in the Python ecosystem, we talk less about the mechanics and the algorithms and heuristics of how memory is managed, than other languages and other ecosystems.
I feel like the Java and .NET people obsess about the GC and tuning it and all that kind of stuff.
And in Python, there's scarcely any information to be found about how this stuff works.
So hopefully, throughout this course, you've learned a lot, and you have a better appreciation for how your programs run, what you can do to be more or less efficient, and just a general appreciation of how memory works in Python.
So in this final chapter, we're going to review what we've learned and just go quickly through some of the highlights of the course.
|
|
show
|
1:06 |
We started our conversation talking about the simulation that you're living in.
This beautiful, clean world that is making everything easy for you.
And we said, Look, if you really want to know how Python works and how Python Memory Management works, you've got to look beyond this beautiful and handy facade that Python has put up for us.
We gotta take the red pill in Matrix nomenclature.
So we took the red pill and we saw that even simple things like numbers, like 42, in Python are really PyObject pointers, specifically PyLongObject pointers, and there's a whole bunch of stuff happening behind the scenes, even though there's this clean, simple programming language on top of it. To understand the mechanics of how it all works, like the fact that numbers are pointers, allocated on the heap, and managed by reference counting, we've got to look inside the CPython source code, which we did, and I gave you a bunch of short little links, for example "bit.ly/cPythonlong", where you can jump over and see the entire file of source code here.
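You can glimpse some of this from Python itself with sys.getsizeof: even a small int carries PyObject overhead, and PyLongObject grows with the magnitude of the number (exact byte counts vary by Python version and platform):

```python
import sys

print(sys.getsizeof(42))         # ~28 bytes on a typical 64-bit CPython
print(sys.getsizeof(2 ** 100))   # bigger: the digits array grows
print(sys.getsizeof(2 ** 1000))  # bigger still

# Small ints (-5..256) are cached singletons in CPython, so these two
# names point at the very same heap object (an implementation detail):
a = 256
b = 256
print(a is b)
```

That per-number overhead is exactly why the typed-array tricks later in the course save so much memory.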
|
|
show
|
1:31 |
Next up, we said we need to talk about allocation.
Now, most people think of Python memory management as cleaning it up, but there's also a whole bunch of techniques and heuristics and data structures put together to help us allocate memory without fragmentation and so on.
We started by saying, Look, we'll just go ask the operating system through the C API for memory.
Turns out, we didn't do it that way, right?
We said, Look, there's hardware, yeah, this is what we expected.
The RAM: the operating system manages the memory associated with every process, and down at the C layer, we can allocate memory from the operating system's virtual memory it gave us, using malloc.
That's all good.
But then there are other layers that Python adds on.
There's Python's allocator, so PyMem and PyMalloc and all those APIs, and then for smaller things, that is, objects that by themselves are smaller than 512 bytes, we also have Python's small object allocator.
Remember, most things that feel like big objects are just a bunch of small objects linked together, so there's a good chance that the things you're working with are gonna land in this small object allocator world.
You can actually see the source code, which has kind of an ASCII art graphic like this, at bit.ly/pyobjectallocators.
Here's a little tiny segment out of it as well.
If you look at the bottom, it says "a fast, special-purpose memory allocator for small blocks, to be used on top of a general-purpose malloc".
So this is the Small Object Allocator and friends.
|
|
show
|
1:17 |
The flip side of allocating memory is we're gonna have to clean it up.
So you saw the primary way for the vast majority of objects in Python, that this is done, is through reference counting.
Everything in Python is allocated on the heap as a PyObject, and you get at it through a PyObject pointer.
And we pulled up just a tiny bit here of the code from "object.h".
The most important part around this is that there's this field called "ob_refcnt", which is either 32-bit or 64-bit or whatever it happens to be here.
We've got this special type here that every single thing derives from and it basically just has this place for how many variables refer to it.
And that works almost all the time.
We've got a couple of variables.
One of them goes away and the count goes down.
If that count ever reaches zero, immediately that thing is deleted.
And by deleted, we don't necessarily mean the memory is given back to the operating system.
Remember, we have the blocks, we have the pools, and we have the arenas.
What it means is the space in the block is now free to accept another object of that size, if it's small.
If it's large, it probably is immediately deleted and given back to the operating system.
Again, if you want to see this, bit.ly/cPythonobject.
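You can watch the count from Python with sys.getrefcount; note it always reports one extra, because passing the object as an argument temporarily adds a reference:

```python
import sys

data = [1, 2, 3]
print(sys.getrefcount(data))  # e.g. 2: `data` plus the temporary argument

alias = data                  # another variable refers to the same list
print(sys.getrefcount(data))  # one higher now

del alias                     # the count drops back down
print(sys.getrefcount(data))
# When the last reference goes away, the count hits zero and the object's
# space is freed immediately, typically back into its block for reuse.
```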
|
|
show
|
1:19 |
Previously, I said Python uses reference counting for almost all of the things, but it does not use it for all of the things because sometimes reference counting fundamentally is broken and just cannot be used.
And we talked about where that was.
That's cycles.
Even if it's not just one thing refers to another, which refers back, but a very long, complicated chain of links.
If any of those form a cycle, they'll never have their reference count go to zero, so that will never get collected.
That memory would be leaked.
So there's a generational garbage collector that lives in Python that goes around and looks for things that are in these cycles.
It only tracks container objects, classes, dictionaries and so on, and it has three generations.
Generation zero, one, and two.
Generation zero, by default, runs when the number of new objects minus recent deletions, since it last checked, exceeds 700.
It's going to do a gen zero collection, and then one out of 10 times it's gonna do a gen one and zero, and then finally, one out of 100 times, it'll do gen two, one, and zero, and check all the memory that's hanging around.
So Python has this combination of both reference counting with a backup garbage collector to catch the places where reference counting breaks, that's cycles.
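Here's a tiny demonstration of a cycle that reference counting can't reclaim, and the collector cleaning it up, using the stdlib gc and weakref modules:

```python
import gc
import weakref


class Node:
    pass


a = Node()
b = Node()
a.other = b   # a -> b
b.other = a   # b -> a: a reference cycle

probe = weakref.ref(a)   # lets us observe when `a` is truly gone
del a, b                 # refcounts never reach zero; the cycle holds them

print(probe() is None)   # almost certainly False: leaked, as far as refcounting knows
gc.collect()             # the cycle detector finds and frees the pair
print(probe() is None)   # True

print(gc.get_threshold())  # the generation thresholds, e.g. (700, 10, 10)
```

The threshold values can differ between Python versions, which is why the last line prints rather than hard-codes them.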
|
|
show
|
0:45 |
When you think of the data structure that's gonna hold all of your data, you have to remember they're not all created equal.
We saw that we could have a list.
We could have a typed array, in the case of numbers, we can have Pandas data frames, we can have series, we can have NumPy arrays, we can have a list of classes, all of these things have benefits and drawbacks for the way we use them in our program, but they also have very different memory costs.
For example, our simple little typed array was something like 10 times more efficient than storing those numbers in a list, which is still more efficient than using things like Pandas or classes.
You may want to choose one of these more advanced data types because it does amazing stuff for you, but just be aware that there is a memory price to be paid for it.
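You can reproduce the rough list-versus-typed-array comparison with the stdlib; note that sys.getsizeof on a list counts only its pointer array, so we add in the float objects it points to (exact numbers vary by Python version):

```python
import sys
from array import array

count = 100_000
values = [float(i) for i in range(count)]

# A list stores pointers; each float is a separate heap object (~24 bytes).
list_size = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A typed array stores raw 8-byte doubles inline, with no per-item objects.
arr = array('d', values)
array_size = sys.getsizeof(arr)

print(f'list of floats: {list_size:,} bytes')
print(f"array('d'):     {array_size:,} bytes")
print(f'ratio: {list_size / array_size:.1f}x')
```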
|
|
show
|
1:45 |
When we looked at functions, we saw two basic patterns that would really help us tame the memory usage.
We have this one here.
It looks really nice.
It's gonna load some data and create this data pipeline.
It's going to then scale it, pass it along, getting the next step back, and then it's gonna filter it with that middle step and then get the final step, the filtered data.
While this was good, and as a code style it's really easy to read, we saw that if we just did something silly, like reuse the variable name, like "here's the data at this step", we were able to get dramatically better memory performance.
So here you can see we're up over 100 megabytes on this one on the left.
The one on the right is 71.
It's 34 megabytes less just by using a different variable name.
That's kind of insane.
You can do this and have a lot better memory usage.
The reason is that original and scaled, those two, stick around for the entire length of the function, even though after the second step original is no longer needed. Reusing the name would allow us to free that memory up and then reuse it without getting more from the operating system.
So this is really cool.
And then what blew this out of the water, even, this technique, was to use generators.
Now generators limit what you can do with the data you get back.
But in this pipeline scenario, generators are so perfect.
So we saw that, even though going from 105 to 71 megabytes is pretty amazing, with generators, we went all the way down to 9 megabytes.
It was off the charts good.
So a little bit, tiny bit slower, CPU wise, computational wise, but in terms of memory, it was the clear winner.
We have this pattern of not letting the intermediate variables hang around too long, which is pretty awesome in a general thing, and then if it works, generators are really good for memory.
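The generator version of that pipeline can be sketched like this (the function names mirror the demo's load/scale/filter steps; the bodies are placeholders):

```python
import random


def load_data(count=100_000):
    for _ in range(count):
        yield random.randint(0, 1000)   # produce one item at a time


def scale_data(data, factor=2):
    for item in data:
        yield item * factor


def filter_data(data, limit=1_000):
    for item in data:
        if item <= limit:
            yield item


# Nothing is materialized until we consume the pipeline: memory stays flat
# at roughly one item in flight, instead of three full intermediate lists.
data = load_data()
data = scale_data(data)        # note: reusing the name, the other trick
data = filter_data(data)
print(sum(1 for _ in data))
```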
|
|
show
|
2:40 |
We saw a lot of times when we create classes, we're gonna compute some information based on the data provided to it.
Like in this example here we have an A and a B, but we also want people to be able to access C and D, which is the ratio and the multiplier of those two things.
So to make that super easy for them, we're just going to say "self.a equals a, and self.b equals b", and then we're going to compute those here, so later they can create a thing and say "thing.c or thing.d" and get that information right away.
Well, we've got to store all of those four things in memory, but if we found some other way to provide C and D in the same syntax for the consumer, we would save potentially half the memory.
And guess what?
That's properties.
I'm sure you know about properties, but it's easy to forget their relevance in the context of memory, right?
So here we're only storing A and B and not those other two variables, and the cost for having the property is a one-time thing, right?
It defines the code on the type object.
It's not paid per instance, every time we make one of these, so it's a huge efficiency gain.
Also, if you create one of these and only use property C but not property D, it can be faster, because we're not doing that computation for D every time we create one, even in the cases where we're not using it.
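A minimal sketch of that pattern (the class name here is made up; c and d, the ratio and the multiplier, are computed on access instead of stored per instance):

```python
class Computed:
    def __init__(self, a, b):
        self.a = a          # only a and b live on the instance
        self.b = b

    @property
    def c(self):
        return self.a / self.b   # ratio, computed on demand

    @property
    def d(self):
        return self.a * self.b   # multiplier, computed on demand


t = Computed(10, 4)
print(t.c)  # 2.5
print(t.d)  # 40
```

The property code lives once on the type object, so the consumer still writes "t.c" and "t.d", but nothing extra is stored per instance.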
Another thing that we compare together with properties, actually, is this idea of slots.
Python objects in general are dynamic objects that can grow, and things can be added to them, and new fields can be created or fields can even be taken away, all sorts of funky stuff.
Most of the time though, people just treat them as regular old objects doing traditional object-oriented programming where that is not the thing that happens.
If that's the case, we'll see that we can use something called "slots", and with slots, we just name the variables, name the fields, and it turns out that this is much, much more efficient in terms of memory.
Remember, we don't create that one-off dictionary every time we create an instance, we just have a space in memory where the variables go, so that's faster.
Down here, we create a regular thing, we can say "r.b" which gives us 2 in this case.
We can also add on "r.e", right?
This is that dynamic stuff.
Not sure it's a great idea, but technically it works.
If you adopt slots, you get the same regular behavior, but you can't add things on, you can't do anything dynamic.
So like, "s.e" is actually an error, "AttributeError: 'SlotThing' object has no attribute 'e'".
So you give up this dynamic nature, but usually, like I said, you're not using it, so this is a really good trade-off in terms of making your code much, much faster.
|
|
show
|
1:13 |
We saw that we can investigate the memory used by our program in different ways: as a picture over time, line by line, by peak memory usage, all those things.
And we looked at two profilers: memory_profiler and Fil.
Fil's the one that measures the peak stuff best.
And this one here, memory_profiler, lets us ask questions line by line.
So we add this decorator out of the library, and then when we run it, we just say "Python -m memory_profiler", give it the script, or the application to run, and the arguments, it's a little bit slower.
Well, it's a lot slower, but, you know, let it run and then out comes this really nice report of exactly how much memory was used by each line, what was gained, what was lost and so on.
So if you're worried about memory, take the time to stop and look.
Ask these tools "what's going on?" because there might be some part where you think "I need to optimize here", but it's actually some other line, like something really simple, like, "Oh, we're doing slicing" and it seems really minor, but that actually makes a copy a bunch of times and turns out to be real expensive or something along those lines.
Have a look with the tools that we showed you and investigate what your app is doing and then apply some of the techniques that we've talked about to make it better.
|
|
show
|
0:26 |
Finally, I just want to wish you the best of luck on your projects.
I'm sure you're trying to solve some memory problems or make your programs more efficient and I hope what you've learned here has given you a better understanding of how everything works in Python, some of the techniques and patterns you can apply and the tools that you can use to get there.
Thank you, thank you, thank you for taking my course.
It's been great to spend the time with you talking about Python memory.
See you online.
Bye.
|