Write Pythonic Code Like a Seasoned Developer Transcripts
Lecture: Hacking Python's memory
0:01 All right, this next tip or technique I want to show you
0:03 might almost be considered anti-Pythonic.
0:06 Let's look at the Zen of Python and see what it says.
0:09 So, recall the Zen of Python says "special cases
0:11 aren't special enough to break the rules",
0:13 so that leads to a really clean and simple and easy to understand language,
0:18 often with one way to do things instead of three or four, that's awesome.
0:22 However, it also says "practicality beats purity",
0:26 so let me give you a heavy dose of practicality involving dictionaries,
0:29 and memory in Python.
0:32 So here is the server memory process for the web servers
0:35 of this company called oyster.com,
0:37 they do like hotel booking and that kind of stuff.
0:40 They wrote a nice blog post about it we'll look at it in a second.
0:43 They were storing a lot of stuff in memory, cached, using Python objects,
0:48 they used this concept of slots which we are going to talk about
0:51 to go and save 9GB of RAM on their server,
0:55 and it literally just took one line of code.
0:57 This will give us a chance to look inside at the backing store
1:00 for custom objects, which normally is a really good thing to have,
1:03 but every now and then can be trouble as you can see here.
1:08 So we are over here in Windows 10, and this is oyster.com,
1:11 you can see it's all about booking hotels that have been
1:14 checked out by real people, sounds cool.
1:16 Here is their blog post talking about
1:17 how they used slots on this image class that they cache heavily.
1:21 Let's go and look at this in a different example
1:24 that I have created for you in PyCharm.
1:27 We are over here in Windows 10 because the tools
1:29 to look at the process details and understand its memory usage and CPU usage
1:33 are really great on Windows.
1:35 So we are going to work with four types holding the same information,
1:38 we are going to have a tuple which holds four values, they are unnamed,
1:42 we are going to have a thing called an ImmutableThingTuple,
1:45 we are using this name tuple to create it, has values a, b, c and d,
1:49 we are going to have a regular class,
1:51 a plain little class that has four values a, b, c and d,
1:55 storing those on its instance, and we are going to use slots
1:58 with this what - we are calling an immutable thing, it's a little bit wrong,
2:03 it's immutable and that it can't have new attributes
2:05 but the values a, b, c and d, those can change.
2:09 By adding the slots here, what we are going to do is, remember,
2:12 every normal object has a dictionary backing store,
2:17 and so if I look at "self.__dict__", you would see
2:22 that it had four entries, a, b, c and d
2:25 and the values would be whatever the values of a, b, c and d were passed in.
2:30 Each instance of mutable thing has its own separate copy
2:33 of the dictionary, which means it has its own separate copy of the keys as well.
2:38 On the other hand, this one when you define slots, it says look,
2:41 this type holds four and exactly four things with this name
2:44 and we can put the storage of those slots into the type,
2:50 which is a singleton, instead of two the instance variables
2:53 which they are maybe millions.
2:56 So, let's look at the code below where we created a bunch of these
2:58 and we look at the memory pressure and behavior of the different items.
3:03 So notice, we have one million items we are going to work with,
3:06 we want to put them into this list.
3:08 What we are going to do, we are going to time every one of these operations the same,
3:13 we are going to choose one of the four options, loop 1, loop 2, loop 3 or loop 4.
3:17 Loop 1 is going to use straight regular tuples
3:20 and it's going to allocate inline a tuple with four values,
3:28 n+1, n+2, n+3, n+4
3:30 and it's going to put it into our list.
3:32 So here we have in memory one million of these tuples,
3:37 we are going to determine how long that's going to take,
3:39 here we see "Finished, waiting... done in" a certain amount of time
3:41 and this is an input call so it's going to block.
3:43 The reason it blocks is I want to process to stay alive
3:47 so we can go look at its memory graph before it exits.
3:50 OK, so you just hit Enter to exit; first, we are going to run this for tuples.
3:55 Here you can see it took about half a second,
3:58 paste it over here so that we can see what the relative performance was,
4:03 and let's run process explorer, which lets us look down here at the details.
4:07 So here is Python, you can see,
4:09 if we open this up we've got a performance graph,
4:12 right now it's using a 145.7 MB in private bytes.
4:18 So we'll note that here. Let's run named tuples.
4:21 So it's an interesting question to ask,
4:24 if these absolute bare minimum tuples that can't expand,
4:29 don't have names, things like that,
4:31 how do they compare both in performance and memory
4:34 of our named tuple that we created up here above,
4:39 like so, using our collections.name tuple.
4:44 All right, let's run and see how it works.
4:47 OK, well, that is slower, let's go ahead and copy that
4:50 and put it in our little document here, so it's about three times slower,
4:54 as you would expect, it's doing quite a bit more work to parse those and so on,
4:59 let's look at the memory here, the memory is about the same,
5:02 143.3 MB, so no big deal.
5:06 Let's move down the line here and run it for a standard class,
5:10 so this one is probably going to have the least good performance
5:13 from a memory perspective, because remember,
5:15 every instance could have been modified dynamically at runtime,
5:18 to have new attributes and so on, so they all have their own dictionary.
5:22 Let's give it a shot and find out what happens.
5:27 In terms of speed, it's almost identical to named tuples,
5:30 that's cool, what about memory- wow, memory is little more,
5:33 like almost a 100 MB more, so 241 MB.
5:40 All right, here is where it's going to get interesting, if we run slots,
5:43 we would expect it to take more time,
5:45 it's doing comparable stuff to what those two were taking.
5:48 However, the memory story should be pretty interesting, let's see what it is,
5:51 how close is it going to be say to the class versus the named tuple
5:55 versus the regular tuple.
6:00 All right, let's let it go and see what happens.
6:04 Timewise, faster than name tuples, that's cool,
6:11 now let's go look at the memory.
6:14 Look at that, 139.3 MB, that's pretty interesting, 139.3 MB,
6:20 not only does this completely, completely do better than regular classes,
6:27 it actually does better than named tuples
6:29 and it even does better than regular tuples
6:32 so we get the best memory usage using the slots here,
6:36 and we save 101 MB so that's a huge improvement,
6:41 now let's go look back our type here.
6:44 Remember, regular tuples are very useful but they are very inflexible,
6:47 they would not be a good stand-in for a class, most of the time.
6:52 Named tuples are better, at least they have names for their properties.
6:55 So like if I had one here, a, b, c and d and so on, that's cool, but again,
7:02 they can't have methods and whatnot,
7:05 on the far other end of the spectrum,
7:07 we have regular custom classes that are extremely rich,
7:11 properties, methods, inheritance, overloads and so on,
7:15 but this one pays a huge price,
7:17 so what is interesting is we can get basically
7:19 all those features except the ability to dynamically add fields to instances at runtime
7:24 after we create them.
7:26 if we are willing to give up that thing,
7:28 we can get huge memory improvements while still keeping
7:31 all of the flexibility in power of classes.
7:33 Now, I want to take a moment and just say do not do this by default,
7:36 this is not the Pythonic thing to do, this is not how Python is meant to work,
7:41 it's suppose to work in this nice flexible dynamic way,
7:44 but understanding how the memory works in these types,
7:48 and the ability to take control over that and change it,
7:52 when you need it, when you can say "I can put this one line here",
7:55 and we need to save 9 GB of RAM on our server,
7:59 we can do way more processing or manage for your servers,
8:01 that's a huge win, and it may be worth it.
8:05 So, this may or may not be Pythonic, I'll leave that up to you
8:08 but I thought it was interesting to put it in here into this dictionary section
8:11 because it makes a big difference
8:13 and on one hand you say "what you are doing
8:15 is kind of an abomination against the class",
8:17 on the other- instead of being forced to use regular tuples,
8:20 you can actually use rich types and even get better memory usage.
8:25 So, you decide, but it's good to know about.
8:27 Here is how you do it, you just set a __slots__
8:30 and set it to the name of the fields that you are going to use,
8:32 here we say a, b and c, and henceforth,
8:36 the only fields that you can have on this immutable thing are a, b and c
8:41 if you end it __init__ tying to take a "d" and say "self.d =" something else
8:46 they would crash and say no, you cannot add a "d", it takes "a, b" and "c".
8:50 So very strict about the variables it can have but once you do this,
8:54 it changes the way the memory works behind the scenes for the class.
8:57 One more time, use this extremely sparingly
9:01 but it's a nice power to have if you need it.