MongoDB with Async Python Transcripts
Chapter: Performance Tuning
Lecture: Our First Python to MongoDB Index

0:00 All right, let's write some code. Over here, new chapter means new directory, so I'm going to call it chapter 10, like that.
0:12 And again, we want to see this blue, so we could right click, Mark Directory as
0:16 Sources Root, or you could just hit whatever hotkey your operating system has specified there, which is what I like to do.
0:23 Similarly, there's one to create one here as well. Now just like before, we want to take the code from the previous example and use it going forward.
0:32 So I'm just going to nab all of that and paste it over here for now. This time, instead of having FastAPI, we don't want to have that.
0:46 Instead we're going to have what we're going to call the speedy CLI. I'm just going to drop some code in here that we've already brought in.
0:56 All right, let's talk through this real quick, because what's important is not how we're
1:00 running the code, but how fast the answers come back from the code in
1:06 our MongoDB service, the Beanie service that we created.
1:11 So kind of like before, we can show the summary stats, we can search the database, we can find the most recently updated packages.
1:19 We have just a header that says this time it's version 1.1. But now when we have these summaries and things along those lines, we're doing it a little
1:29 bit different. For example, this time we have a timed async. And instead of just running the query once or these three queries once, what we're doing
1:39 is we're running it some number of times. And it looks like that is 100. So we're going to ask that question 300 times.
1:48 The idea here is anytime you're doing performance type of testing, just asking once doesn't make a lot of sense, right?
1:56 In the Motor package, there's a connection pool, maybe a DNS lookup and a network connection.
2:04 That's actually where you spend most of your time the first time, but then the next time it would just be talking to the database.
2:10 So this lets a lot of that warm-up stuff come out in the wash if we do it 100 times, and then we get the same output.
2:17 Similarly, we can get the recent packages, but instead, we're going to do it a whole bunch of times and then return them.
2:24 There's a bunch of functions in here about when I get a package, but let's get it 100 times, right? All of those things to make it go good and fast.
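The repeated-timing idea described above can be sketched roughly as follows. This is not the course's actual timed async code: the `timed_async` signature, the `TIMES` constant, and the `fake_query` stand-in are assumptions for illustration, and only the standard library is used.

```python
import asyncio
import time

TIMES = 100  # assumed repeat count, matching the 100 mentioned in the lecture


async def timed_async(func, *args, times: int = TIMES):
    """Await func(*args) `times` times; return (last result, total elapsed ms)."""
    start = time.perf_counter()
    result = None
    for _ in range(times):
        result = await func(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms


async def fake_query() -> int:
    """Hypothetical stand-in for a real Beanie query."""
    await asyncio.sleep(0)  # yield to the event loop, as a real query would
    return 42


async def main() -> None:
    result, total_ms = await timed_async(fake_query)
    print(f"Got {result}; {TIMES} calls ran in {total_ms:.1f} ms total")


if __name__ == "__main__":
    asyncio.run(main())
```

Because the one-time costs (connection pool setup, DNS lookup) are paid only on the first pass, spreading the total over many passes makes them a small share of the measurement.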
2:34 So let's go ahead and run the program now. Let's also make sure we're running the correct one. There we go.
2:47 So we can see our summary ran in 715 milliseconds. Again, every time you see these numbers, divide by our times up here, which in this case is 100.
2:58 Right, so we got that. Let's go over here and see the most recently updated packages. That's something we could do. So what do we get back?
3:11 It ran in one second. So in that case, let's go down here and double check. We're calling
3:21 our timed recent. Again, that does it how many times? 100 times, or number of times. So when
3:26 you look here, divide that by 100: 10 milliseconds, not terrible. One thing you do need to keep
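As a quick check of the arithmetic, the reported totals can be turned into per-call averages. The helper name `per_call_ms` is made up for this sketch; the totals (715 milliseconds and one second) and the repeat count of 100 come from the lecture.

```python
def per_call_ms(total_ms: float, times: int = 100) -> float:
    """Average time per pass, given the total for `times` repeated passes."""
    return total_ms / times


print(per_call_ms(715))   # summary stats: 7.15 ms per pass
print(per_call_ms(1000))  # recent packages: 10.0 ms per pass
```

Note that each pass of the summary runs three queries, so its 7.15 ms covers all three.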
3:32 in mind for this data is there's only a quarter million releases, there's only 5000 packages,
3:38 not a ton of data to work with. So we can make it better. If this was a real system
3:43 with millions of records that had no indexes, and then we add them, it'd be great. But nonetheless,
3:49 our goal is to make this a little bit faster. So let's go in here and see what's happening.
3:56 The first thing you want to consider when adding an index is: what am I querying on, and/or what am I sorting by?
4:09 So in this case, find all, nothing we can do to make that faster. That's just get them all to us for now, at least with regard to indexes.
4:17 But this sort by package last updated descending. Well, if that's something we're going to do often, let's go and add an index for it.
4:26 So when I go over to package, and recall, we have our settings class and down here is where we can put our indexes.
4:34 Now you can do things like just this, negative last updated, which would create an index, but it
4:40 doesn't give it a name, and it doesn't let you do more specialized things like composite indexes. This is nice and clean.
4:49 But indexes are so important, I think you should be a little more explicit about them.
4:53 So what we're going to do is we're going to use PyMongo, which I don't believe we've really used at all yet. I want to have an IndexModel.
5:03 The first thing that goes in here is the keys. And this is going to be a list of tuples, which is kind of funky.
5:10 So what goes in here is last_updated, and then pymongo.DESCENDING. Now this, I believe, would be enough.
5:24 But the other thing we want to set, as I said, is a name. So let's call it, I like to say, something involving the keys and the direction.
5:31 So I will call this last updated descending. Now if we go over to our Studio 3T, and we go to our package, make sure we refresh it just in case.
5:45 Notice there's just the default primary key index. And we run this by virtue of Beanie starting up and looking at this.
5:56 When we do the connection, we pass over the package, it looks at the settings, says, oh,
6:00 that index is not there by that name, we're going to try to create it. So we run it again. Now we haven't done anything yet, have we? Let's go see.
6:10 We refresh. And hey, hey, look at that. There's now this index here.
6:15 And if we go explore the index, it has the field, the direction, other options if we wanted to make it unique or have a time to live.
6:25 Those are for temporary tables that don't hold the data forever. All these different things.
6:30 If you want to do full text search or geospatial types of things, all of this business
6:37 here, this is what we can specify using the full PyMongo IndexModel rather than just
6:45 the field name. So you can sort of grow it and expand it as needed. You can also do things up here
6:51 and say that this is an indexed field directly, but I kind of like to put the indexes all in one place.
6:57 Let's go run this code again. What we asked before is the recently updated packages.
7:03 They go off the screen there. Look at that, quite a bit faster. Run it one more time.
7:11 500, you know, the database is getting used to using that index, we could go over there,
7:17 and we could actually verify that this index is being used. If we go back to packages,
7:22 if you want to, you could write out that same query. So find, dot sort. And then the sort
7:29 is going to be last_updated, minus one. And then I believe we had on there a limit of five.
7:40 But we got back, if I disappear for a minute down the right, you can see that ran in two
7:46 milliseconds in the bottom right there. But we can go to this and say explain. Explain yourself.
7:52 Explain asks MongoDB: what are you doing to run this query? Tell me more about it.
7:59 So it's going against pypi.packages. Cool. Look at this index scan. Awesome. What index is it using?
8:09 The one we just created, last updated descending. Well, that's the winning plan because it was super
8:16 fast. We're limiting by five. And we're doing this index scan to figure out the order. Excellent.
8:22 So you can see, we can understand how those changes we made over here in Beanie,
8:29 right, how this IndexModel we created, which then got pushed over to MongoDB,
8:35 we can understand whether or not MongoDB is actually using it, provided you're able to
8:40 write the same query up here as we're running in Python and Beanie. There's one index. We have more to go.
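The same check can be done from Python instead of a GUI. With PyMongo, calling `.explain()` on a cursor such as `collection.find().sort("last_updated", -1).limit(5)` returns a dictionary whose `queryPlanner` section contains the winning plan. The sketch below assumes the classic nested `inputStage` layout; the exact shape varies by MongoDB version, the helper name `find_index_used` is hypothetical, and the sample plan is hand-written for illustration, not captured output.

```python
from typing import Optional


def find_index_used(plan: dict) -> Optional[str]:
    """Follow inputStage links and return the index name of the first IXSCAN."""
    stage: Optional[dict] = plan
    while stage is not None:
        if stage.get("stage") == "IXSCAN":
            return stage.get("indexName")
        stage = stage.get("inputStage")  # None once the chain ends
    return None


# Hand-written illustration of a winning plan, not real explain output.
sample_winning_plan = {
    "stage": "LIMIT",
    "limitAmount": 5,
    "inputStage": {
        "stage": "FETCH",
        "inputStage": {
            "stage": "IXSCAN",
            "keyPattern": {"last_updated": -1},
            "indexName": "last_updated_descending",
        },
    },
}

print(find_index_used(sample_winning_plan))    # last_updated_descending
print(find_index_used({"stage": "COLLSCAN"}))  # None
```

A result of `None` would suggest the query fell back to a collection scan, which is the signal that the index is missing or not matching the query.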

