Write Pythonic Code Like a Seasoned Developer Transcripts
Chapter: Dictionaries
Lecture: Hacking Python's memory

0:01 All right, this next tip or technique I want to show you might almost be considered anti-Pythonic.
0:07 Let's look at what the Zen of Python says. Recall that the Zen of Python says "special cases aren't special enough to break the rules",
0:14 and that leads to a really clean, simple, easy-to-understand language, often with one way to do things instead of three or four. That's awesome.
0:23 However, it also says "practicality beats purity", so let me give you a heavy dose of practicality involving dictionaries, and memory in Python.
0:33 So here is the memory usage of the web server processes for a company called oyster.com; they do hotel booking and that kind of stuff.
0:41 They wrote a nice blog post about it, which we'll look at in a second. They were caching a lot of data in memory using Python objects,
0:49 and they used a concept called slots, which we are going to talk about, to save 9 GB of RAM on their server,
0:56 and it literally just took one line of code. This will give us a chance to look inside at the backing store
1:01 for custom objects, which normally is a really good thing to have, but every now and then can be trouble as you can see here.
1:09 So we are over here in Windows 10, and this is oyster.com, you can see it's all about booking hotels that have been
1:15 checked out by real people, sounds cool. Here is their blog post talking about how they used slots on this image class that they cache heavily.
1:22 Let's go and look at this in a different example that I have created for you in PyCharm. We are over here in Windows 10 because the tools
1:30 to look at the process details and understand its memory usage and CPU usage are really great on Windows.
1:36 So we are going to work with four types holding the same information, we are going to have a tuple which holds four values, they are unnamed,
1:43 we are going to have a thing called an ImmutableThingTuple, which we create using namedtuple, with the values a, b, c and d,
1:50 we are going to have a regular class, a plain little class that has four values a, b, c and d,
1:56 storing those on its instance; and we are going to use slots with this one, which we are calling an immutable thing. That name is a little bit wrong:
2:04 it's immutable in that it can't have new attributes, but the values a, b, c and d can still change.
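The four types described here could look roughly like the sketch below. ImmutableThingTuple is the name used in the lecture; the class names MutableThing and ImmutableThing are assumed spellings of the "mutable thing" and "immutable thing" mentioned in the transcript, since the actual source file isn't shown.

```python
from collections import namedtuple

# Option 1: a plain tuple such as (1, 2, 3, 4); four unnamed values, no type needed.

# Option 2: a named tuple with fields a, b, c and d.
ImmutableThingTuple = namedtuple("ImmutableThingTuple", ["a", "b", "c", "d"])


# Option 3: a regular class; every instance gets its own __dict__.
class MutableThing:  # assumed name
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d


# Option 4: the same class with __slots__. "Immutable" only means no new
# attributes can be added; a, b, c and d can still be reassigned.
class ImmutableThing:  # assumed name
    __slots__ = ("a", "b", "c", "d")

    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d


plain = (1, 2, 3, 4)
named = ImmutableThingTuple(1, 2, 3, 4)
regular = MutableThing(1, 2, 3, 4)
slotted = ImmutableThing(1, 2, 3, 4)
slotted.a = 10  # allowed: "a" is one of the declared slots
```

All four hold the same data; only the slotted class and the regular class can carry methods in the usual way, which is the trade-off the rest of the lecture measures.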
2:10 By adding the slots here, what we are going to do is, remember, every normal object has a dictionary backing store,
2:18 and so if I looked at "self.__dict__", you would see that it has four entries, a, b, c and d,
2:26 and the values would be whatever values of a, b, c and d were passed in. Each instance of the mutable thing has its own separate copy
2:34 of the dictionary. (Since Python 3.3, CPython can share one key table among instances of the same class, but each instance still pays for its own dictionary storage.) On the other hand, when you define slots on this one, it says, look,
2:42 this type holds four and exactly four things with these names, so we can put the layout of those slots into the type,
2:51 which exists only once, instead of into the instances, of which there may be millions.
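To see where the storage actually lives, the sketch below inspects one instance of each class. It reuses the assumed class names from this lecture's example; the isinstance check against types.MemberDescriptorType reflects CPython, where each slot name becomes a member descriptor stored once on the class.

```python
import types


class MutableThing:  # assumed name
    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d


class ImmutableThing:  # assumed name
    __slots__ = ("a", "b", "c", "d")

    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d


m = MutableThing(1, 2, 3, 4)
s = ImmutableThing(1, 2, 3, 4)

# The regular instance stores its attributes in a per-instance dictionary.
print(m.__dict__)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# The slotted instance has no per-instance __dict__ at all.
print(hasattr(s, "__dict__"))  # False

# Instead, the class holds one descriptor per slot name (defined once, on the
# type), and each instance just stores its four values in fixed positions.
slot_a = ImmutableThing.__dict__["a"]
print(isinstance(slot_a, types.MemberDescriptorType))  # True in CPython
print(slot_a.__get__(s))  # 1
```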
2:57 So, let's look at the code below, where we create a bunch of these and look at the memory pressure and behavior of the different items.
3:04 So notice, we have one million items we are going to work with, we want to put them into this list.
3:09 What we are going to do is time every one of these operations the same way;
3:14 we are going to choose one of the four options, loop 1, loop 2, loop 3 or loop 4. Loop 1 is going to use straight regular tuples
3:21 and it's going to allocate inline a tuple with four values, n+1, n+2, n+3, n+4 and it's going to put it into our list.
3:33 So here we have in memory one million of these tuples, we are going to determine how long that's going to take,
3:40 here we see "Finished, waiting... done in" a certain amount of time and this is an input call so it's going to block.
3:44 The reason it blocks is that I want the process to stay alive so we can go look at its memory graph before it exits.
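The four timed loops might be sketched like this. It is an approximation, not the lecture's script: it assumes the same class names as before, uses list comprehensions instead of an explicit append loop, wraps the choice in a hypothetical build() helper, and leaves out the blocking input() call so it can run unattended.

```python
import time
from collections import namedtuple

ImmutableThingTuple = namedtuple("ImmutableThingTuple", ["a", "b", "c", "d"])


class MutableThing:  # assumed name
    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d


class ImmutableThing:  # assumed name
    __slots__ = ("a", "b", "c", "d")

    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d


def build(kind, count=1_000_000):
    """Hypothetical helper: build `count` four-value items; return (items, seconds)."""
    start = time.perf_counter()
    if kind == "tuple":  # loop 1: plain tuples allocated inline
        items = [(n + 1, n + 2, n + 3, n + 4) for n in range(count)]
    elif kind == "namedtuple":  # loop 2
        items = [ImmutableThingTuple(n + 1, n + 2, n + 3, n + 4) for n in range(count)]
    elif kind == "class":  # loop 3
        items = [MutableThing(n + 1, n + 2, n + 3, n + 4) for n in range(count)]
    elif kind == "slots":  # loop 4
        items = [ImmutableThing(n + 1, n + 2, n + 3, n + 4) for n in range(count)]
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return items, time.perf_counter() - start


if __name__ == "__main__":
    items, seconds = build("slots")
    print(f"Finished, waiting... done in {seconds:.3f} sec")
    # The lecture's script blocks on input() at this point so the process
    # stays alive for inspection in Process Explorer.
```

Keeping the returned list referenced is what keeps all the objects in memory while you look at the process.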
3:51 OK, so you just hit Enter to exit. First, we are going to run this for tuples. Here you can see it took about half a second;
3:59 let's paste that over here so that we can see the relative performance, and let's run Process Explorer, which lets us look at the process details down here.
4:08 So here is Python. If we open this up, we've got a performance graph; right now it's using 145.7 MB in private bytes.
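Process Explorer is a Windows tool and reports the whole process. As a rough, cross-platform alternative (not what the lecture uses), the standard-library tracemalloc module can report how many bytes of Python allocations are still alive after building the items. The FACTORIES table and traced_bytes() helper are hypothetical names; absolute numbers will differ from the lecture's private-bytes figures and across Python versions.

```python
import tracemalloc
from collections import namedtuple

ImmutableThingTuple = namedtuple("ImmutableThingTuple", ["a", "b", "c", "d"])


class MutableThing:  # assumed name
    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d


class ImmutableThing:  # assumed name
    __slots__ = ("a", "b", "c", "d")

    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d


# Hypothetical lookup table: each factory takes four values and returns one item.
FACTORIES = {
    "tuple": lambda a, b, c, d: (a, b, c, d),
    "namedtuple": ImmutableThingTuple,
    "class": MutableThing,
    "slots": ImmutableThing,
}


def traced_bytes(kind, count=100_000):
    """Bytes still allocated (as seen by tracemalloc) after building `count` items."""
    make = FACTORIES[kind]
    tracemalloc.start()
    try:
        items = [make(n + 1, n + 2, n + 3, n + 4) for n in range(count)]
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current


if __name__ == "__main__":
    for kind in FACTORIES:
        print(f"{kind:>10}: {traced_bytes(kind) / 1_000_000:.1f} MB")
```

Because the same integers are created for every kind, the differences between the rows come mostly from the per-object overhead of each type.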
4:19 So we'll note that here. Let's run named tuples. It's an interesting question to ask: these are absolute bare-minimum tuples that can't expand,
4:30 don't have names, things like that, so how do they compare, in both performance and memory, to the named tuple that we created up here above
4:40 using collections.namedtuple? All right, let's run it and see how it works. OK, well, that is slower; let's go ahead and copy that
4:51 and put it in our little document here. So it's about three times slower,
4:55 as you would expect, since it's doing quite a bit more work to construct those. Let's look at the memory here: the memory is about the same,
5:03 143.3 MB, so no big deal. Let's move down the line here and run it for a standard class.
5:11 This one is probably going to have the worst memory usage, because remember,
5:16 every instance could be modified dynamically at runtime to have new attributes and so on, so they all have their own dictionary.
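The per-instance dictionary is easy to observe directly. This sketch again assumes the MutableThing name; the size printed by sys.getsizeof depends on the CPython version, so no expected value is given for it.

```python
import sys


class MutableThing:  # assumed name
    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d


first = MutableThing(1, 2, 3, 4)
second = MutableThing(5, 6, 7, 8)

# Each instance has its own, separate attribute dictionary...
print(first.__dict__ is second.__dict__)  # False

# ...which is exactly what lets one instance grow a new attribute at runtime.
first.e = 5
print(first.__dict__)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
print(hasattr(second, "e"))  # False

# That flexibility costs memory on every instance; the exact number
# reported here varies between Python versions.
print(sys.getsizeof(first.__dict__))
```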
5:23 Let's give it a shot and find out what happens. In terms of speed, it's almost identical to named tuples;
5:31 that's cool. What about memory? Wow, memory is quite a bit higher, almost 100 MB more: 241 MB.
5:41 All right, here is where it's going to get interesting. If we run slots, we would expect it to take more time than plain tuples,
5:46 since it's doing work comparable to those last two. However, the memory story should be pretty interesting. Let's see
5:52 how close it is going to be to the class versus the named tuple versus the regular tuple. All right, let's let it go and see what happens.
6:05 Timewise, it's faster than named tuples; that's cool. Now let's go look at the memory. Look at that: 139.3 MB. That's pretty interesting.
6:21 Not only does this completely beat regular classes, it actually does better than named tuples,
6:30 and it even does better than regular tuples. So we get the best memory usage using slots here, and we save about 101 MB compared to the regular class, which is a huge improvement.
6:42 Now let's go back and look at our types here. Remember, regular tuples are very useful but they are very inflexible;
6:48 they would not be a good stand-in for a class, most of the time. Named tuples are better, at least they have names for their properties.
6:56 So if I had one here, it has a, b, c and d and so on; that's cool, but again, they can't easily have methods and whatnot. On the far other end of the spectrum,
7:08 we have regular custom classes that are extremely rich: properties, methods, inheritance, overloads and so on, but they pay a huge price in memory.
7:18 What is interesting is that we can get basically all those features except the ability to dynamically add fields to instances at runtime
7:25 after we create them. If we are willing to give that up, we can get huge memory improvements while still keeping
7:32 all of the flexibility and power of classes. Now, I want to take a moment and just say: do not do this by default.
7:37 This is not the Pythonic thing to do, and this is not how Python is meant to work; it's supposed to work in this nice, flexible, dynamic way.
7:45 But it pays to understand how the memory works in these types and to be able to take control over it and change it
7:53 when you need to. When you can say "I can put this one line here" and save 9 GB of RAM on our server,
8:00 so we can do way more processing or get by with fewer servers, that's a huge win, and it may be worth it.
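To illustrate the claim that a slotted class keeps most of what makes classes rich, here is a small sketch. Point, LabeledPoint and LoosePoint are hypothetical classes invented for this example; the behavior shown (properties and methods still work, subclasses need their own __slots__ to stay dict-free) is standard Python.

```python
import math


class Point:
    """Hypothetical example: a slotted class keeps methods, properties and inheritance."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def length(self):
        # Properties live on the class, so they work fine alongside __slots__.
        return math.hypot(self.x, self.y)

    def scaled(self, factor):
        return Point(self.x * factor, self.y * factor)


class LabeledPoint(Point):
    # A subclass lists only its *new* slot names; without its own __slots__
    # it would silently get a per-instance __dict__ again.
    __slots__ = ("label",)

    def __init__(self, x, y, label):
        super().__init__(x, y)
        self.label = label


class LoosePoint(Point):
    pass  # no __slots__ here, so instances regain a __dict__


p = LabeledPoint(3, 4, "corner")
print(p.length)  # 5.0
print(p.scaled(2).x)  # 6
print(hasattr(p, "__dict__"))  # False
print(hasattr(LoosePoint(1, 2), "__dict__"))  # True
```

The only thing given up is adding brand-new attributes to an instance after it is created.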
8:06 So, this may or may not be Pythonic; I'll leave that up to you. But I thought it was interesting to put it here in this dictionary section
8:12 because it makes a big difference. On one hand you might say, "what you are doing is kind of an abomination against the class";
8:18 on the other, instead of being forced to use regular tuples, you can actually use rich types and even get better memory usage.
8:26 So, you decide, but it's good to know about. Here is how you do it: you just set __slots__
8:31 to the names of the fields that you are going to use. Here we say a, b and c, and henceforth
8:37 the only fields that you can have on this immutable thing are a, b and c. If in __init__ you try to take a "d" and say "self.d =" something,
8:47 it would crash and say no, you cannot add a "d"; it only takes "a", "b" and "c". So it's very strict about the variables it can have, but once you do this,
8:55 it changes the way the memory works behind the scenes for the class. One more time, use this extremely sparingly
9:02 but it's a nice power to have if you need it.
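A minimal sketch of the three-slot version described above, plus a hypothetical BrokenThing class that tries to set "d" in __init__. Both raise AttributeError when an undeclared attribute is assigned; the exact wording of the error message varies between Python versions, so it is printed rather than spelled out.

```python
class ImmutableThing:  # assumed name
    # Instances may only ever have these three attributes.
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class BrokenThing:  # hypothetical: shows the failing __init__ from the lecture
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d  # "d" is not a declared slot, so this raises AttributeError


thing = ImmutableThing(1, 2, 3)
thing.a = 100  # fine: reassigning an existing slot

try:
    thing.d = 4  # adding a new attribute after creation is rejected too
except AttributeError as err:
    print("rejected:", err)

try:
    BrokenThing(1, 2, 3, 4)
except AttributeError as err:
    print("rejected in __init__:", err)
```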

