MongoDB with Async Python Transcripts
Chapter: Performance Tuning
Lecture: Projections for Packages
0:00
A quick note, I just switched this to the match statement that I was using in this example
0:06
here, rather than the if/else, just so you have it exactly the same. Nothing much going on there, just a minor update.
0:13
Okay, so we saw that some things are fast. For example, when we search the database, that was really, really fast.
0:23
Then getting the timed package: 300 milliseconds. What we did there is get FastAPI with all of its details: its description, which is
0:34
the whole readme, as well as its 154 releases. And one thing you might say is, well, you designed your documents poorly.
0:43
Here's a scenario, if we go see where it's getting used: we get the package back, and we just show the ID and the last updated, and we don't
0:53
necessarily have to show the releases. We're just going to do it like this, like that. It's still not going to change the time, about
1:02
300 milliseconds still, because regardless of whether we're using the description, regardless
1:07
of whether we're using the 154 releases, we're still pulling them back over and over and
1:14
over again. Not ideal. So what can we do? We can do a projection. We talked about projections
1:22
when we talked about the MongoDB query syntax in the native shell. But what about Beanie? What do we do here?
1:30
We go back to Pydantic, and we express a smaller class that we would like to project into, which is a pretty neat way to do it.
1:39
So what we're going to do is we're going to go here, and we have our regular package, which derives from beanie.Document.
1:46
But down here, or in a separate file, we can add a class. Package top level only is what I'm going to call this.
1:53
Call it whatever makes you happy. It's going to be a pydantic.BaseModel. And then you just go up here and you cherry-pick.
2:02
You're like, "All right, well, the ID is important. The updated date, not last updated, but summary." I'll just copy those and we can throw them away.
2:11
We don't need the defaults because we're not creating them. They're going to come out of the database, but we also don't need this.
2:18
Let's say those are the three things that we need. It's not quite enough, though. We've got to pass a little bit of extra information to say how
2:26
that projection is actually done from Mongo into these things because this could be called and created date if you are a monster.
2:37
So in here we're going to have a settings class as well, an inner class.
2:42
And instead of having things like what collection does it go to, we're going to talk about the projection.
2:47
So we're going to say we want the ID, and that's $_id. This is the MongoDB query syntax there.
2:55
We want the summary, which is summary. And let's just say we want the last updated, I guess. Like that.
3:06
Because that's the one we're using. Recall up here, we're not talking about when it was created, but when it was last updated.
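The projection class described above might look roughly like the following. This is a minimal sketch, not the course's exact code: the class name spelling PackageTopLevelOnly, the field types, and the use of "$summary" and "$last_updated" as projection values are assumptions. Only the three field names and the "$_id" mapping come from the discussion.

```python
import datetime
from typing import Optional

import pydantic


class PackageTopLevelOnly(pydantic.BaseModel):
    # Assumed types: the package name is used as the string id here.
    id: str
    summary: Optional[str] = None
    last_updated: datetime.datetime

    class Settings:
        # Maps each field on this class to where it comes from in the
        # MongoDB document. "$_id" is the document's primary key.
        projection = {
            "id": "$_id",
            "summary": "$summary",
            "last_updated": "$last_updated",
        }
```

Nothing here talks to the database. It is a plain Pydantic model that Beanie can be told to project into.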
3:13
Okay, so we want ID, last updated, and summary. And this has way, way less data. Recall over in a package:
3:27
the main amount of data is all these releases, right? For FastAPI, there's 154. That's a lot.
3:33
We're not getting any of that, nor the description itself, which is that readme, the other huge piece of data. So we're skipping all that.
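To get a feel for how much data the releases and the readme account for, here is a rough illustration using only the standard library. Everything in it is synthetic: the field names inside each release, the readme length, and the values are made up, and JSON length is only a rough proxy for what MongoDB actually sends over the wire.

```python
import json

# Synthetic stand-in for one package document; every value here is made up.
full_document = {
    "_id": "fastapi",
    "summary": "A short one-line summary.",
    "last_updated": "2024-01-01T00:00:00",
    "description": "README text. " * 2000,  # stands in for the whole readme
    "releases": [
        {"major_ver": 0, "minor_ver": n, "build_ver": 0, "comment": "release notes"}
        for n in range(154)
    ],
}

# Keep only the three top-level fields the projection asks for.
projected_document = {
    key: full_document[key] for key in ("_id", "summary", "last_updated")
}

full_size = len(json.dumps(full_document))
projected_size = len(json.dumps(projected_document))
print(f"full: {full_size:,} characters, projected: {projected_size:,} characters")
```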
3:41
What happens if we now go and change this get timed package, which means package by name, and let's add a keyword argument, summary only.
3:52
And in this case, we're going to set it to be true and we're going to have PyCharm add the summary only on there.
4:02
And if we go to the definition, now you can see summary only is true, but we really want this to be false by default.
4:09
We're just going to use it in that one case. There's a couple things we can do here.
4:13
We could write the query and expand on it, or we could just do two different things.
4:18
Your most natural instinct might be: if not summary only, return this; else, what? I'll get the key right.
4:37
What goes here? Something. Let's write it that way real quick, and then I'll show you a cool alternative. So onto
4:43
this, we can say dot project. And all we have to give it is that projection model. So what
4:49
was it? It was package top level only, and PyCharm does all its magic to import it. So let's run
4:57
this again. If we remember, the time to get FastAPI was 309 before. Look
5:10
at that: much faster, three times faster. I don't know if you'd call it much faster, but it's definitely
5:16
an improvement. Let's look again. 83. Oh, so that's almost four times faster, 3.7 times
5:26
faster. So that's way less stress that we're putting on to MongoDB itself. There's a lot
5:33
less data on the network, there's less disk access, potentially, if you have a ton
5:38
of data, all of these things. And all we had to do is say, we're going to project into
5:44
this set here. And it works because we weren't making any changes. Now if I go back and reset
5:51
this real quick. We run it again to have the packages back. No releases. We don't want
6:01
to pull those back and we can't leave the code the same. We had to make a little bit
6:05
of a trade-off there, right? I think it's fair. We're like, "All right, we don't really
6:10
need to see how many releases there were. What we're actually interested in is that." Just keep in mind you only have the data that comes back here.
6:18
Maybe one final thing in this. We said we're getting an optional package. It should probably say: or a package top level only. Right?
6:31
That should be one or the other. You might be able to convince me to use None right here instead of Optional.
6:37
But you're going to get this or possibly this. So you want to be careful that we're talking about what comes back accurately
6:44
in terms of the typing. Not a big deal, but just keep that in mind. Finally, this is the naive way,
6:50
and it's fine if like this is the code you're writing, it's super simple. If you had a complicated query, something like this,
6:58
you probably don't want that. You probably wanna be able to reuse as much of that as possible. So watch this, if we go over here
7:05
and we create a variable called query like that, we're doing the whole query and either we're executing it directly here or we can then
7:20
apply further things like to list and so on. It doesn't actually apply right there, but whatever additional things you would chain
7:31
on including potentially other filter queries, other filter aspects, you can just keep piling those on before you await it.
7:41
If you want to make sure you have a single copy of the query and sometimes you're going
7:46
to not project it and other times you will, this allows you to have one and only one definition to maintain. That may or may not be worth it.
7:55
Like I said, here it's questionable. Down here it's probably a good idea. Okay. Excellent. Let's just make sure it still works.
8:03
Sure enough, we found FastAPI with the same last updated date, still the same performance.
8:10
Let's go switch that back one more time just to see what the effect is. Here we go. Still back to 300.
8:21
So roughly three to four times faster by doing that projection.
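For reference, one way timings like these could be taken is sketched below, using only the standard library. The helper timed_ms and the fake_lookup coroutine are hypothetical stand-ins, not the course's timing code. The last two lines just redo the arithmetic on the two numbers from the run above, 309 and 83 milliseconds.

```python
import asyncio
import time


async def timed_ms(awaitable):
    # Await something and report how long it took in milliseconds.
    start = time.perf_counter()
    result = await awaitable
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms


async def fake_lookup(delay_seconds):
    # Hypothetical stand-in for a database call.
    await asyncio.sleep(delay_seconds)
    return "fastapi"


result, elapsed_ms = asyncio.run(timed_ms(fake_lookup(0.05)))
print(f"Got {result} in {elapsed_ms:.0f} ms")

speedup = 309 / 83
print(f"Projection speedup: {speedup:.1f}x")
```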
8:25
It's also worth noting that what we're doing is we're exchanging data with a local loopback MongoDB server.
8:33
If we were talking to a production version, probably MongoDB would be somewhere across
8:38
the network. So having extra data or less data go across the network will matter more.
8:45
And if you're doing some kind of distributed thing, or you're talking far away to some
8:49
cloud service, and that's where MongoDB lives, it's only going to be more true. So this is
8:55
the dev scenario, where the projection has the least effect it probably ever would. In production, or some
9:01
other production-like scenario, this would probably have a bigger effect still, because the network would get involved.