Python Web Apps that Fly with CDNs Transcripts
Chapter: Avoiding Stale Caches
Lecture: cache_id_builder

0:00 Back in PyCharm, and it's time to write some more code. It's a new chapter, so remember, we're moving from the old to the new.
0:08 That means we're unmarking that one as a sources root, marking this one as the sources root, and running this particular file here.
0:19 So just make sure you're running the right one. You can see I've already set up a chapter five run configuration for this.
0:25 Now there's been a file hiding in our code the whole time because I put it in the starter project.
0:31 I wanted to make sure that if you copied that and ran with it, you'd already have it to work with at this point. So over in the infrastructure section,
0:38 we have this thing called CacheBuster. Let's have a look at this file. It has one important method, build CacheID.
0:48 Recall we saw ?cacheID= followed by some value at the end of our static file URLs; here's where it comes from. What's going to happen in here is we give it a file name.
0:58 That's a static file path like /static/image/car.jpg. And when we set things up, we tell it the base folder where our web app is installed.
1:13 So if we say our app is in /applications or /webapps/whatever, it can resolve the full file path to that static file
1:27 in order to work with it. It has two other options: whether it's in dev mode, and whether it logs these hash operations.
1:36 Right now it just prints them. If you want to see that output, you can turn it on and say, yes, please show me that.
1:44 More likely, you'd plug this into your logging framework, like Logbook or Loguru or something like that. But I don't know which one you're using,
1:55 so you'll have to adapt that if you want that behavior. Dev mode means every single change you make to a file is picked up right away with a fresh cache ID.
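If you'd rather route those hash messages through a logging framework than print them, here is a minimal sketch using Python's standard library logging module (not Logbook or Loguru, which have their own APIs). The settings class and its field names (CacheBusterSettings, base_folder, dev_mode, log_hashes, log_hash) are hypothetical stand-ins for the options described above, not the course's actual code.

```python
import logging
from dataclasses import dataclass

# Module-level logger; configure handlers and levels once at app startup.
logger = logging.getLogger("cache_buster")


@dataclass
class CacheBusterSettings:
    # Hypothetical names for the three settings described in the lecture.
    base_folder: str          # where the web app is installed
    dev_mode: bool = False    # True: recompute the hash on every request
    log_hashes: bool = False  # True: report each hash computation

    def log_hash(self, file_path: str, cache_id: str) -> None:
        # A logging call instead of a bare print(), so the message goes
        # wherever your logging configuration sends it.
        if self.log_hashes:
            logger.info("Computed cache id %s for %s", cache_id, file_path)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    settings = CacheBusterSettings(base_folder="/apps/mysite", log_hashes=True)
    settings.log_hash("/apps/mysite/static/image/car.jpg", "abc123")
```

Leaving log_hashes at False keeps the call silent, which is what you'd usually want in production.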
2:05 That's less efficient, because every request has to re-hash all the requested files over and over again, which is not great.
2:13 So you definitely want that off in production, but it can make things easier in development if you're not doing performance testing.
2:19 Okay, so how does that work? We take this file name and turn it into a full path, based on where the web app is installed
2:27 plus the file's relative path within your web app, /something/something/something. If it doesn't exist for some reason,
2:38 we're not going to crash; we return a cache ID of "error missing file." That way, when you do a view source,
2:44 instead of seeing a normal cache ID, or nothing at all, and wondering why it's not working, you'll see the question mark followed by that error missing file text.
2:53 It also says, look, you gave me a directory; I can't hash a directory, I'm supposed to work with files. Then it keeps a dictionary.
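Here is a minimal sketch of that resolve-and-validate step, assuming the file name arrives URL-style with a leading slash. The function name resolve_static_file and both sentinel strings are hypothetical; the lecture only says the returned ID reads like "error missing file".

```python
from pathlib import Path
from typing import Union

# Hypothetical sentinel cache IDs; the exact strings are assumptions.
ERROR_MISSING_FILE = "error_missing_file"
ERROR_IS_DIRECTORY = "error_is_directory"


def resolve_static_file(base_folder: str, file_name: str) -> Union[Path, str]:
    """Map a URL-style path like /static/image/car.jpg to a file on disk.

    Returns the full Path on success, or a sentinel string that can go into
    the URL as the cache ID, so the problem shows up in "view source"
    instead of crashing the page.
    """
    # Strip the leading slash so the path is joined under base_folder;
    # Path("/a") / "/b" would otherwise discard base_folder entirely.
    full_path = Path(base_folder) / file_name.lstrip("/")

    if not full_path.exists():
        return ERROR_MISSING_FILE
    if full_path.is_dir():
        # A directory has no single set of bytes to hash.
        return ERROR_IS_DIRECTORY
    return full_path
```

Returning a string instead of raising keeps template rendering from failing over one bad static path.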
3:04 Now, this is a very, very simple in-memory version. You might think, "Oh, we need Redis, a database, or something like that." No.
3:12 It keeps this in memory, and when it needs to, it goes here and computes the file hash. This is just MD5.
3:22 Now, you might have heard, "Please don't use MD5, because it's not cryptographically safe." We're not doing passwords, we're not doing cryptography.
3:30 What we're doing is we're just trying to get something that will change every time the file contents change and we want it to be fast.
3:37 This is perfect here. So we use MD5, and we pull the result back in here. It's a short value.
3:46 Well, it's the whole digest, but that's still pretty short. So it opens up the contents of the file as
3:52 binary, turns that into hex, stores it in this dictionary, and then returns
3:57 that value. Up at the top, I skipped a section, right here on line 11 or line 12. It
4:05 says if we're not in development mode and we've already seen this file before, that is, we've
4:11 already computed the hash, just return the stored value. This part is what makes it super, super fast.
4:18 Now, the reason we don't need Redis and a bunch of scaling infrastructure is that when we deploy
4:24 our web app, we'll fan it out into multiple worker processes. Sure, the first request
4:29 for a particular file to a certain server or worker process will have to compute
4:35 the hash, but once it's up and running, it never has to compute it again. And this is a really fast operation anyway.
4:42 So instead of adding a bunch of overhead, we just keep file names and hash results in memory, and if they have to be recomputed, no problem.
4:51 In production, the way to trigger a recompute is to redeploy the website, or really, just restart it.
4:59 But that's typically how you put new static files up there anyway, so the two go hand in hand.
5:07 Anyway, that's how it works for us. I could have just said, call this function and magic happens, but I really want you to understand
5:14 that there's very little magic going on. For performance reasons, we keep a dictionary keyed by file name:
5:22 if I've seen it before, I know the answer. Otherwise, we read the bytes of the file and return a hash of them as hex, right?
5:32 That's how we get that cache ID. And because of the way MD5 and other hashes work, if the contents change, that value changes significantly,
5:42 which changes our URL and gets rid of this stale cache problem.
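To see why a content hash busts stale caches, here is a small sketch that takes file contents directly and builds the cache-busted URL. The helper name cache_busted_url and the query parameter name cache_id are assumptions; the lecture's URLs use a cacheID parameter whose exact spelling isn't shown here.

```python
import hashlib


def cache_busted_url(url_path: str, contents: bytes) -> str:
    # Hypothetical helper: append the MD5 of the file's contents as a query
    # string, the same shape as the ?cacheID=... URLs from earlier.
    cache_id = hashlib.md5(contents).hexdigest()
    return f"{url_path}?cache_id={cache_id}"


old = cache_busted_url("/static/css/site.css", b"body { color: red; }")
new = cache_busted_url("/static/css/site.css", b"body { color: blue; }")
print(old)
print(new)  # same path, completely different query string
```

Because the URL itself changes, a browser or CDN holding the old URL is simply never asked for it again; the new URL is a cache miss and gets fetched fresh.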

