MongoDB with Async Python Transcripts
Chapter: PyPI Beanie
Lecture: Concurrency Safe Create Release
0:00
So we created our create_release function, and we actually forgot one thing
0:06
that we should probably add here: last_updated = datetime.datetime.now(). Import that. Okay. Now we've got this version, and it's pretty good.
0:22
However, it suffers from a few problems that any ORM or ODM would have. Here's the deal. We're going to check whether the package exists; that's fine.
0:32
But then we've pulled it back here. Imagine there's a lot of concurrency around writes to this document.
0:39
There isn't, because packages just don't get released to PyPI at a very, very rapid rate. But let's suppose it were something like that.
0:46
Maybe we're updating something like the last-viewed time on a page in a CMS, and it's just getting pounded, right? There might be concurrency issues there.
0:57
So we pull the package back, we do some work, do some more work, and then we save.
1:04
If somewhere during this period some other request concurrently changes the
1:11
package behind the scenes, say by adding a release at exactly the same time as this one is
1:16
in flight, then our save overwrites the entire document with our version of it. That's not ideal.
1:25
It means one of those pieces of data would be lost. We could use transactions, but they're just not necessary.
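To make the lost-update problem concrete, here is a minimal, self-contained sketch. It uses a plain Python dict as a hypothetical stand-in for the packages collection and asyncio.sleep(0) as a stand-in for the await points between reading and saving; no MongoDB or Beanie is involved, and all names are made up for illustration.

```python
import asyncio
import copy

# Hypothetical in-memory stand-in for the packages collection: _id -> document.
db = {"beanie": {"_id": "beanie", "releases": ["2.0.0"]}}


async def add_release_read_modify_write(package_id: str, version: str) -> None:
    # 1. Pull the whole document back as a private copy, like an ODM would.
    doc = copy.deepcopy(db[package_id])
    # 2. Yield to the event loop, standing in for other awaits / network latency.
    await asyncio.sleep(0)
    # 3. Change it in memory.
    doc["releases"].append(version)
    # 4. Save: replace the entire stored document with our copy.
    db[package_id] = doc


async def main() -> list[str]:
    # Two writers add different releases "at the same time".
    await asyncio.gather(
        add_release_read_modify_write("beanie", "2.0.1"),
        add_release_read_modify_write("beanie", "2.0.2"),
    )
    return db["beanie"]["releases"]


releases = asyncio.run(main())
print(releases)  # ['2.0.0', '2.0.2']: the concurrent 2.0.1 release was lost
```

Both writers start from the same snapshot, so whichever one saves last silently wins. A transaction could guard against this, but an update operator that changes the document in place avoids the read-modify-write cycle altogether.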
1:31
We just need a different way of thinking about this. The other issue is efficiency: how efficient is it to pull back the entire document with all of
1:41
its releases and potentially a megabyte's worth of text for its readme and all those
1:47
things, put this one little thing on there, and then push it back down into the database,
1:51
rewriting all of that data and replacing that potentially big document?
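A rough way to see the size difference: the sketch below serializes a hypothetical package document (a 1 MB readme plus a few hundred releases; all field names are assumptions) and compares a full read-and-write round trip with sending only a filter and a $push update. JSON length is used here as a crude proxy for the BSON size actually sent over the wire.

```python
import json

# Hypothetical package document: a ~1 MB readme plus many releases.
package = {
    "_id": "beanie",
    "readme": "x" * 1_000_000,
    "releases": [{"version": f"1.{i}.0", "size": 1000} for i in range(300)],
}
new_release = {"version": "2.0.1", "comment": "This is atomic", "size": 201}

# ODM style: read the whole document, append in memory, write the whole document back.
read_bytes = len(json.dumps(package))
package["releases"].append(new_release)
write_bytes = len(json.dumps(package))
round_trip_bytes = read_bytes + write_bytes

# Update-operator style: only a filter and a small $push document are sent.
update_bytes = len(json.dumps({"_id": "beanie"})) + len(
    json.dumps({"$push": {"releases": new_release}})
)

print(f"round trip: ~{round_trip_bytes:,} bytes, atomic update: ~{update_bytes} bytes")
```

The exact numbers depend on the document, but the round trip grows with the whole document while the update grows only with the new release.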
1:56
That's not ideal, right? So what we can do is use a different kind of query syntax that says:
2:03
take this release, send just it to the database, and stick it on the end. From a MongoDB
2:09
internals perspective, there's none of this back and forth, and it's done atomically. So this
2:15
works much better under concurrency, both in terms of speed and performance and in terms
2:19
of contention and possible data loss. So we're going to use a couple of other types; I'm going to import them up here.
2:29
We're going to use array, we're going to use Set, and we're going to use Inc, for increment.
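For reference, these correspond to MongoDB's $push, $set, and $inc update operators. In Beanie they appear to come from beanie.odm.operators.update.array (Push) and beanie.odm.operators.update.general (Set, Inc), though import paths may vary by version. The sketch below is a simplified, hypothetical in-memory model of what the server does with each operator on a single document; apply_update is an illustrative helper, not a library function, and real MongoDB also supports dotted paths, modifiers such as $each, and much more.

```python
import datetime
from typing import Any


def apply_update(doc: dict, update: dict[str, dict[str, Any]]) -> dict:
    """Hypothetical, simplified model of how MongoDB applies $push, $set, and $inc.

    Only top-level field names are handled (no dotted paths, no $each, etc.).
    """
    for op, fields in update.items():
        for field, value in fields.items():
            if op == "$push":
                # Append one element to an array field (creating the array if missing).
                doc.setdefault(field, []).append(value)
            elif op == "$set":
                # Overwrite (or create) a single field, leaving the rest untouched.
                doc[field] = value
            elif op == "$inc":
                # Add to a numeric field (a missing field starts at 0).
                doc[field] = doc.get(field, 0) + value
            else:
                raise ValueError(f"unsupported operator: {op}")
    return doc


package = {"_id": "beanie", "releases": ["2.0.0"]}
apply_update(package, {"$push": {"releases": "2.0.1"}})
apply_update(package, {"$set": {"last_updated": datetime.datetime.now()}})

analytics = apply_update({"total_releases": 811}, {"$inc": {"total_releases": 1}})
print(package["releases"], analytics["total_releases"])  # ['2.0.0', '2.0.1'] 812
```

Each operator touches only the fields it names, which is why the client never needs to send the rest of the document.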
2:37
So let's go down here. I think I'll make a copy of this, and at the top we'll
2:43
label it: fine, but full ODM style, less efficient. Then copy, paste, uncomment, format. There we go. Let's do this one differently.
2:58
Up at the top, we create the release, period. This check where we look the package up first is an extra database call.
3:05
We don't actually need to do that. Instead, we're just going to try to update the database, and
3:11
it will tell us how many documents were updated. If none were updated, that means it was the wrong package.
3:18
And so we can raise this exception then. I'll comment this out for a second.
3:25
So now we can go to our Package and say find_one, because we want to update a single package.
3:33
The filter is package.name equals the name of the package. Then on that we're going to do an update, which of course we'll await.
3:45
We want to make two changes, just like we did right here.
3:48
We want to push this release onto package.releases, and we want to set the last updated date to datetime.now(). It lost its import there.
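In Beanie, the lecture expresses these two changes as one update call carrying array.Push and Set. At the MongoDB level they end up in a single update document. The sketch below shows roughly what that document looks like as a plain dict; build_release_update is a hypothetical helper, and the field names releases and last_updated are assumptions based on the lecture's package model. With PyMongo or Motor, a document like this would be passed to update_one after a filter that matches the package.

```python
import datetime


def build_release_update(release: dict, now: datetime.datetime) -> dict:
    """Hypothetical helper: both changes in one MongoDB update document."""
    return {
        # Append the new release to the end of the releases array.
        "$push": {"releases": release},
        # Overwrite only the last_updated field; the rest of the document is untouched.
        "$set": {"last_updated": now},
    }


now = datetime.datetime.now(datetime.timezone.utc)
update = build_release_update({"version": "2.0.1", "comment": "This is atomic"}, now)
print(sorted(update))  # ['$push', '$set']
```

Because both operators travel in one command aimed at one document, and MongoDB writes to a single document atomically, no other writer can see the release pushed without the timestamp updated.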
4:05
So let's do the release first. We're going to say array.Push, and onto the list package.releases we want to
4:14
push the release object. So that's thing one. Oh, whoops, this needs to be written like so, as a
4:31
dictionary. Okay, so we're going to push this release
4:36
object onto that list in the database. And we also want to do a Set. Again, we give it a dictionary, mapping
4:42
package.last_updated, that's this part, to this value here. And all of that happens
4:56
directly in the database. We don't have to get anything back or save it, but we do
5:01
want to know about the result. I'll call this update_result. Here we want to make sure
5:09
that we actually made a change. Remember the test we did before: we went to the database
5:14
and got the package to see whether it was there. We can test that here as well. So instead of doing
5:18
that, we can use this update result. Now, this update result is actually a PyMongo object,
5:26
a PyMongo result. Let's type it out and import that. I guess we've got to import the whole thing up at the top.
5:41
UpdateResult. Now, I'm not so sure how much I appreciate having this huge wrapped line like
5:54
that, but we'll leave it like this. There we go. And now that it's typed, you can see it
6:01
has a couple of attributes, including matched_count and modified_count. What we want
6:07
to do is, I think we'll just go with the modified count. If it's not equal to one, or really,
6:13
if it's less than one, I'm going to raise an exception: no package
6:21
with this name, right? We tried to update it, nothing was updated, so here's the error.
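Here is a minimal sketch of the "just try the update, then check the result" pattern. FakeUpdateResult is a stand-in that mirrors the matched_count and modified_count attributes of PyMongo's pymongo.results.UpdateResult; the in-memory packages dict and the update_one function are hypothetical substitutes for the real collection, and raising ValueError is an assumption (the lecture raises its own exception).

```python
from dataclasses import dataclass


@dataclass
class FakeUpdateResult:
    """Stand-in mirroring two attributes of pymongo.results.UpdateResult."""
    matched_count: int
    modified_count: int


# Hypothetical in-memory packages collection, keyed by package name.
packages = {"beanie": {"_id": "beanie", "releases": []}}


def update_one(package_id: str, release: dict) -> FakeUpdateResult:
    """Hypothetical stand-in for an update that pushes one release."""
    doc = packages.get(package_id)
    if doc is None:
        # The filter matched nothing, so nothing was modified.
        return FakeUpdateResult(matched_count=0, modified_count=0)
    doc["releases"].append(release)
    return FakeUpdateResult(matched_count=1, modified_count=1)


def create_release(package_id: str, release: dict) -> None:
    # No separate "does the package exist?" query: attempt the update,
    # then let the result tell us whether a document was changed.
    result = update_one(package_id, release)
    if result.modified_count < 1:
        raise ValueError(f"No package with name {package_id}")


create_release("beanie", {"version": "2.0.1"})
try:
    create_release("not-a-package", {"version": "0.0.1"})
except ValueError as err:
    print(err)  # No package with name not-a-package
```

modified_count works here because a push always changes a matched document. For an update that can legitimately be a no-op, such as a $set to the value already stored, MongoDB reports the document as matched but not modified, so matched_count would be the more precise existence check.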
6:25
So that saves us one fairly useless database call. We don't have to check and
6:32
see if it exists; we just go there and try to update it. Most of the time, we
6:36
expect it to succeed. If it doesn't, fine, we just raise our exception afterwards,
6:42
just as we would have before. Again, we're not pulling and pushing all of that
6:48
data back and forth, so we can delete that part there. Same thing here: we're pulling
6:52
the release analytics back, making a change, and saving it back. This is much less
6:57
of a performance issue, because it's such a small document. However, it still has
7:02
that concurrency issue, theoretically, and there's more contention for this document than there
7:07
is for any single package. So we can do the same type of thing: we'll await a
7:13
find_one on release analytics, then do our update. Now, the operation that we're going to apply
7:19
is the increment. Did the formatter remove that import again? It did.
7:25
So apply Inc, and we put in a document here. And this is really cool,
7:32
because it's like telling the database to do a plus-equals one on that field. And if two requests
7:37
try to do that concurrently, MongoDB makes sure that both of those
7:42
increments are applied in the database. So what are we going to increment? The release analytics total releases field.
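To see why a server-side increment matters, the sketch below runs 100 concurrent increments two ways against a hypothetical in-memory counter: a read-modify-write with an await between the read and the save (like pulling release analytics back and saving it), and a single-step increment that models what {"$inc": {"total_releases": 1}} does inside MongoDB. This is an asyncio analogy, not MongoDB itself.

```python
import asyncio

# Hypothetical in-memory stand-in for the release analytics document.
analytics = {"total_releases": 0}


async def increment_read_modify_write() -> None:
    current = analytics["total_releases"]      # read the value back
    await asyncio.sleep(0)                     # stand-in for other awaits / network latency
    analytics["total_releases"] = current + 1  # save, possibly over a newer value


async def increment_in_place() -> None:
    await asyncio.sleep(0)                     # same latency, but before the update
    analytics["total_releases"] += 1           # one step, modeling {"$inc": {"total_releases": 1}}


async def run(worker, n: int) -> int:
    analytics["total_releases"] = 0
    await asyncio.gather(*(worker() for _ in range(n)))
    return analytics["total_releases"]


lost = asyncio.run(run(increment_read_modify_write, 100))
kept = asyncio.run(run(increment_in_place, 100))
print(lost, kept)  # 1 100
```

In the read-modify-write version every task reads 0 before any of them saves, so 99 increments vanish. Letting the database perform the addition where the data lives removes that window, which is the point of using Inc instead of loading and saving the analytics document.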
7:49
And how much do we want to increment or decrement it by? If you wanted to decrement, you'd use minus one;
7:54
here we're going to increment it by one. And all of these lines go away as well. So let's look back at
8:01
it. Just like before, we create our release, but instead of pulling back the package,
8:05
checking that it exists, changing it in memory, and saving it, we send two changes in
8:10
one command: append the release object to the end of the list, and update the last updated time on the package.
8:20
If that didn't succeed, we raise an exception. If it did, we do a thread-safe, concurrency-safe, high-performance increment
8:29
of the release analytics total releases by one. That's a lot of talking and a lot of thinking about it. Let's try it and make sure we got it right.
8:39
Which one do we have open? We've got Beanie, so we'll add a 2.0.1 to Beanie. So let's create a release: R. Beanie is the name. 2.0.1, this is awesome.
8:57
How many bytes? Exactly 201 bytes, you know it is. Release URL, don't care. Ah, package name... oh no. Of course, it's not name, it's ID.
9:14
Try again. So package ID is the name. Release. Gonna get it this time. Beanie. 2.0.1. This is atomic. 2.0.1. No URL. We added 2.0.1.
9:31
Let's go to our database and see what happened. There are a couple of things we should observe. Here we should see 2.0.1 pushed onto the end,
9:40
and way up at the top, we should see that last updated has changed as well. And over here, remember we had already updated that to 804,
9:53
so it should be... well, it's 812. Let's go back and see what the output was. So we had our 811.
10:02
We did this, and now if we ask for the summary, you should see that number: 812. Perfect. So that tells you this one worked.
10:12
Let's go look here and see what we got for Beanie. Oh, look at that. It now says June, so yes, indeed, that worked. And way at the bottom:
10:26
This is atomic, 2.0.1. Exactly the same behavior as we had before, but look how massive this document is. This thing is huge, okay?
10:38
A thousand lines, and a lot of those lines aren't even wrapped, for the whole readme and so on. Imagine pulling that back and forth just to make that minor change,
10:46
instead of just pushing that release document into Mongo and telling it to append it here. So much better. Plus, the concurrency is way better
10:55
in terms of contention and the potential data loss that comes from the way ORMs and ODMs usually work. Excellent, excellent stuff.