MongoDB with Async Python Transcripts
Chapter: PyPI Beanie
Lecture: Concurrency Safe Create Release

0:00 So we created our create release function, and we actually forgot one thing really quick
0:06 here that we should probably add: last_updated equals datetime.datetime.now(). Import that. Okay. Now, we've got this version and it's pretty good.
0:22 However, it suffers from a few problems that all ORMs or ODMs would. Here's the deal. We're gonna check if there's a package, that's fine.
0:32 But then we've pulled it back here. Imagine there's a lot of concurrency around writes to this document.
0:39 There's not, because we just don't release to PyPI at a very, very rapid rate. But let's suppose it was something like that.
0:46 Maybe we're adjusting like the last view time on a page in a CMS and it's just getting pounded, right? There might be concurrency issues there.
0:57 So if we go and we pull the package back, we do some work, do some more work, and then save.
1:04 If somewhere during this period, some other request or some other thing concurrently changes
1:11 the package behind the scenes, like they add a release at exactly the same time as this is
1:16 in flight, well, we're going to overwrite the entire document with our copy and wipe out their changes. That's not ideal.
1:25 That means one of those pieces of data would be lost. We could use transactions, but it's not necessary. It's just not necessary.
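The lost write described above can be reproduced without a database. The following is a minimal sketch, assuming an in-memory dict as a stand-in for the collection; the names `db` and `add_release_read_modify_write` are hypothetical and not from the course code.

```python
import asyncio
import copy

# Hypothetical in-memory stand-in for a MongoDB collection (illustration only).
db = {"beanie": {"releases": ["1.0.0"]}}


async def add_release_read_modify_write(name: str, version: str) -> None:
    # Step 1: pull the whole document back as a private copy, like an ODM object.
    package = copy.deepcopy(db[name])
    # Step 2: "do some work". Yielding here lets another task run in between.
    await asyncio.sleep(0)
    package["releases"].append(version)
    # Step 3: save, which replaces the entire stored document.
    db[name] = package


async def main() -> list:
    # Two writers add different releases to the same package concurrently.
    await asyncio.gather(
        add_release_read_modify_write("beanie", "2.0.0"),
        add_release_read_modify_write("beanie", "2.0.1"),
    )
    return db["beanie"]["releases"]


releases = asyncio.run(main())
print(releases)  # three entries were expected, but one write is lost
```

Both tasks read the same starting document, so whichever one saves last silently discards the other's release.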
1:31 We just need a different way of thinking about this. The other thing is, how efficient is it to pull the entire document back with all of
1:41 its releases, potentially a megabyte worth of text in terms of its readme and all those
1:47 things, put this one little thing on there, and then push it back down into the database,
1:51 rewriting all of that data and replacing that potentially big document?
1:56 Not ideal, right? So what we can do is we can use a different type of query syntax to say,
2:03 take this release, send it alone to the database, and just stick it on the end. From a MongoDB
2:09 internals perspective, don't do all this back and forth. That is done atomically. So it would
2:15 work much better in terms of concurrency, both from a speed and performance perspective and
2:19 from a contention and possible data loss perspective. So we're going to use a couple of other types, I'm going to put them up here.
2:29 We're going to use array and we're going to use set and we're going to use increment.
2:37 So let's go down here and, I think I'll make a copy of this, and at the top we'll
2:43 call it "fine, but full ODM style, less efficient." And let's copy, paste, uncomment, format. There we go. Let's do this differently.
2:58 Up at the top, we're going to create the release, period. This thing where we're checking, this is an extra database call we're doing.
3:05 We don't actually need to do that. So instead, what we're going to do is we're going to just try to update the database and
3:11 it will tell us how many things were updated. If none were updated, that means that that was the wrong package.
3:18 And so we can throw this exception again. You know, I'll comment this out for a second.
3:25 So now what we can do is we can go to our Package and we can say find_one because we want to update a single package.
3:33 And it'll be Package.name equals the name of the package. And then on this, I'm going to await this of course, we're going to do an update.
3:45 We want to make two changes, just like we did right here.
3:48 We want to put this release onto package.releases, and we want to set the last updated date to datetime.now. Lost its import there.
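At the MongoDB level, the two changes described here correspond to the `$push` and `$set` update operators, sent together in one update document. Below is a minimal sketch of what such a document looks like and the effect it has. The `apply_update` helper and the `package` dict are in-memory stand-ins invented for illustration (no database is involved), and the field names `releases` and `last_updated` are assumed from the lecture.

```python
import datetime


def apply_update(document: dict, update: dict) -> None:
    """Toy, in-memory imitation of how $push and $set change a document."""
    for field, value in update.get("$push", {}).items():
        document.setdefault(field, []).append(value)  # append to the array field
    for field, value in update.get("$set", {}).items():
        document[field] = value  # overwrite just this one field


# Hypothetical stored package document.
package = {
    "_id": "beanie",
    "releases": [{"version": "2.0.0"}],
    "last_updated": datetime.datetime(2023, 1, 1),
}

now = datetime.datetime.now()

# Only this small update document travels to the server, not the whole package.
update = {
    "$push": {"releases": {"version": "2.0.1", "comment": "This is atomic"}},
    "$set": {"last_updated": now},
}

apply_update(package, update)
print([r["version"] for r in package["releases"]])
```

The point of the sketch is the size and shape of `update`: it carries the new release and the timestamp, and nothing else from the package.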
4:05 So let's do the release first. So we're going to say array.Push, and onto the collection Package.releases, we want to
4:14 push the release object. So that's thing one. Oh, whoops, this needs to be like so, as a
4:31 dictionary. Okay, so we're going to push that to this collection, we're going to push this
4:36 object in the database. And we want to do a Set. Again, we give it a dictionary, say
4:42 Package.last_updated, that's this part, to this value here. And now that all happens
4:56 immediately in the database. We don't have to get a thing back or save it, but we do
5:01 want to know about the result. I'll call this update_result. So here we want to make sure
5:09 that we actually made a change. And remember this test that we did, where we go to the database,
5:14 get the package, and check if it's not there? We can test that here as well. So instead of doing
5:18 this, we can use this update result. Now, this update result is actually a PyMongo object,
5:26 say, "a PyMongo result." Let's do it like this. We'll type it out. Import that. I guess we've got to import the whole thing back at the top.
5:41 UpdateResult. Now, I'm not so sure how much I appreciate having this huge wrapped thing like
5:54 that, but we'll do it like this. There we go. And now if we type that, you can see it
6:01 has a couple of options or features: a matched count, a modified count. And so what we want
6:07 to do is, I think we'll just go with the modified count. If it's not equal to one, I'm going
6:13 to raise the exception, or maybe if it's less than one. I'm going to raise an exception: no package
6:21 with this name, right? We tried to update stuff, nothing was updated. Here's the error.
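The "update first, check the count afterwards" pattern can be sketched without a database. In this sketch, `FakeUpdateResult` is a hypothetical stand-in that only borrows the `matched_count` and `modified_count` attribute names from PyMongo's `UpdateResult`; the `packages` dict, `push_release`, and the choice of `ValueError` are likewise assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeUpdateResult:
    """Stand-in using the same attribute names as PyMongo's UpdateResult."""
    matched_count: int
    modified_count: int


# Hypothetical in-memory collection keyed by package id.
packages = {"beanie": {"releases": ["2.0.0"]}}


def push_release(package_id: str, version: str) -> FakeUpdateResult:
    """Try the update without reading first; report how many documents changed."""
    doc = packages.get(package_id)
    if doc is None:
        return FakeUpdateResult(matched_count=0, modified_count=0)
    doc["releases"].append(version)
    return FakeUpdateResult(matched_count=1, modified_count=1)


def create_release(package_id: str, version: str) -> None:
    result = push_release(package_id, version)
    # Nothing modified means no package matched, so report it after the fact.
    if result.modified_count < 1:
        raise ValueError(f"No package with id {package_id}")


create_release("beanie", "2.0.1")  # succeeds with a single "database" call

try:
    create_release("no-such-package", "1.0.0")
    missing_raised = False
except ValueError:
    missing_raised = True
```

The existence check and the write collapse into one round trip; the error path costs nothing extra because the count comes back with the update itself.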
6:25 So that saves us one kind of useless database call over there. We don't have to check and
6:32 see if it exists. We're just going to go there and try to update it. Most of the time, we
6:36 expect it to succeed. If it doesn't, whatever, we're just going to raise our exception afterwards
6:42 just as we would have before. Again, we're not pulling and pushing all of that
6:48 data back and forth, so we can delete that part there. Same thing here, we're pulling
6:52 that release analytics back, we're making a change, saving it back. This has much less
6:57 of a performance issue because it's such a small document. However, it does still have
7:02 that concurrency issue, theoretically, and there is more contention for this than there
7:07 is for a single package. So we can do the same type of thing. So we will await
7:13 ReleaseAnalytics.find_one, then we'll do our update. Now the operation that we're going to apply,
7:19 we're going to apply the increment, which, did I format that out again? I did.
7:25 Apply increment, and we're going to put in a document here. And this is really cool,
7:32 because it's like go to the database and do a plus equals one on that field. And if two things
7:37 are trying to do that concurrently, MongoDB will make sure that both of those
7:42 increments are applied in the database. So what are we going to increment? ReleaseAnalytics.total_releases.
7:49 And how much do you want to increment or decrement it by? If you wanted to decrement, it'd be minus one;
7:54 we're going to increment it by a single one. And all of these go away as well. So let's look back at
8:01 it here. So just like before, we created our release, but instead of pulling back the thing,
8:05 checking it exists, changing it in memory and saving it, we're going to send two changes in
8:10 one command to put the release object on the list at the end, and we're going to update the last updated time on the package.
8:20 If that didn't succeed, we're going to raise an exception. If it did, we're going to do a thread-safe, concurrency-safe, high-performance increment
8:29 of release analytics total releases by one. That's a lot of talking, a lot of thinking about it. Let's try it and make sure we got it right.
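To see why an in-place increment behaves differently from read, change, save, here is a small simulation. It is only a sketch under stated assumptions: the `analytics` dict stands in for the release analytics document, `increment_in_place` imitates the effect of MongoDB's `$inc` as a single indivisible step, and all names are hypothetical.

```python
import asyncio

# Hypothetical in-memory stand-in for the release analytics document.
analytics = {"total_releases": 0}


async def increment_read_modify_write() -> None:
    current = analytics["total_releases"]      # read the current value
    await asyncio.sleep(0)                     # other tasks get to run here
    analytics["total_releases"] = current + 1  # write back a possibly stale value


async def increment_in_place() -> None:
    await asyncio.sleep(0)
    # One indivisible step with no read/write gap, imitating $inc on the server.
    analytics["total_releases"] += 1


async def run(worker, n: int = 100) -> int:
    # Reset the counter, run n concurrent workers, and report the final total.
    analytics["total_releases"] = 0
    await asyncio.gather(*(worker() for _ in range(n)))
    return analytics["total_releases"]


lossy_total = asyncio.run(run(increment_read_modify_write))
atomic_total = asyncio.run(run(increment_in_place))
print(lossy_total, atomic_total)
```

With 100 concurrent workers, the in-place version counts every increment, while the read-then-write version loses updates because many workers write back a value computed from the same stale read.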
8:39 Which one do we have open? We've got Beanie, so we'll add a 2.0.1 to Beanie. So let's create a release, R. Beanie is the name. 2.0.1, this is awesome.
8:57 How many bytes? That's exactly 201 bytes. You know it is. Release URL, don't care. Ah, package name, oh no. Of course, it's not name. ID.
9:14 Try again. So package ID is the name. Release. Gonna get it this time. Beanie. 2.0.1. This is atomic. 2.0.1. No URL. We added 2.0.1.
9:31 Let's go to our database and see what happened. There's a couple of things we should observe. Here we should see that we get 2.0.1 pushed on the end
9:40 and way up at the top, we should see that last updated is going to change as well. And over here, remember we had already updated that to 804,
9:53 so it should be, well, it's 812. Let's go back and look and see what the output was. So we got our 811.
10:02 We did this thing, and now if we ask for the summary, you should see that number, 812, perfect. So that tells you this one worked.
10:12 Let's go look here and see what we got for Beanie. Oh, look at that. It is now June, so yes, indeed, that worked. Way at the bottom.
10:26 This is atomic, 2.0.1. Exactly the same behavior as we had before, but look how massive this document is. This thing is huge, okay?
10:38 A thousand lines, and a lot of those lines are not wrapped, for the whole readme and so on. Think of pulling that back and forth just to make that minor change,
10:46 instead of just saying push that document into Mongo and tell it to append it here. So much better, plus the concurrency is way better
10:55 in terms of contention and potential data loss from the way ORMs and ODMs work. Excellent, excellent stuff.

