Building Data-Driven Web Apps with Pyramid and SQLAlchemy Transcripts
Chapter: Using SQLAchemy
Lecture: Performance and lazy or eager loading

Login or purchase this course to watch this video and the rest of the course contents.
0:00 Now, to see what I'd like to show you in this one. It's time to finally look at this echo. So far it's been false, but let's set it to be true.
0:09 And rerun our code and we see a whole bunch of stuff shoot out. Look at all that. So, what we're doing is actually, SQLAlchemy
0:16 is checking for the existence of various tables like downloads, languages, releases and so on. And then our app finally starts.
0:24 Alright so this is when we initialize our database stuff. Now, let's go over here and load up this page.
0:32 Works great. But notice all the different statements. We have over here a begin, select package stuff from packages, where package ID is some sort of
0:44 limit and offset. Okay so this is getting you can see it's doing the query for the first awscli, right. So, offset is 1, sorry, offset is 0.
0:55 Limit is one. Okay the thing, that's first. Great. Then, we go back to the database again and we select release stuff from releases
1:06 where package ID equals some parameter. That parameter being the one we passed. And we order by these three things in that order.
1:14 Two operations. Not hugely bad here. But imagine, you got a list of these packages and you looped over those packages and interacted
1:24 with their releases. If you got 20 packages you're doing one query to get the 20 packages and then you're doing this query separately, 20 times.
1:33 So 21 operations. That's the so-called n+1 problem which is a common anti-pattern in terms of performance for ORMs
1:42 This lazy loader's nice, but if you know you're going to interact with releases, there's something we can do better.
1:48 So let's turn this into a single query that would solve the n+1 problem. Great, now we're getting one package so it's not terrible
1:54 like I said, but the same technique applies regards of how many you're getting back. Alright, let's have a look.
2:01 So we go over here, and we have this nice, clean bit of code on line 42. Give me a query, filter it like this, first boom.
2:08 However, for what we're about to do we need a little more space. So let's put some wrap in here and here
2:16 and we're going to come over and say we would like to go options, and we want to call this thing subqueryload. We're going to import that from the ORM.
2:27 And what we do here is we're going to put the relationship to be loaded. So package.releases. So what are we doing?
2:36 We're telling SQLAlchemy, go do this query and anything you return also go ahead and use a subquery to pull out all the related releases of this one.
2:46 You could also do a joinedload. I think subqueryload's a little newer and fancier but they both would accomplish more or less the same thing.
2:54 We have our filter again and first. So now, if we run the same thing again we should see one, more complicated but only one
3:02 contact to the database. There it is. And let's try it with joinedload. I think actually joinedload might be better here.
3:12 Try again. Clean that up. There we go now we're down to one interaction with the database. That's what I wanted after all.
3:22 So, we come over here and we say select the package stuff here from and then we're doing another, sort of left outer joined
3:31 on releases, to give us that. So one interaction with the database. It echos out to I think a standard error. And then, just standard out.
3:42 But now we've got it down to one database interaction and that's better. So you can either use subqueryload or joinedload and either will work.
3:49 Okay, so, let's look at performance. I guess we haven't really talked about performance. I'm going to go back and turn off echo.
3:56 'Cause we don't want to trace out a bunch of stuff. So let me do this query a bunch of times. Now, just to be fair, over here.
4:06 There's not a tremendous amount of data. I mean, there are 5000 releases but only 96 projects. It turns out, that even with tons of data
4:12 we can get really good performance. But let's pull this back up. Have a look. So all that stuff happening is 62 milliseconds. Pretty decent.
4:23 Let's see if I hit it a few times if I can get it better. Eh, 68, it looks like that's pretty much it. Check this out, we have our render.
4:34 And we actually have SQLAlchemy. Here's that query, holy moly. That is quite the query. I guess it would be interesting to try the joinedload
4:44 as well and see if we get any performance difference. Another thing, just while we have this pulled up here.
4:49 If we go turn on performance and go back to history. And then we refresh this a few more times you can actually pull up performance
4:59 if I click on the right part here. Pull up performance is slower because we're asking it to do more work, but look at this. Built-in profiling, so.
5:08 It turns out, where does it start to get slow. Find package by name. Takes about that long, okay. Apparently to do that query, it takes a little while.
5:20 Let's go back and just, you know since now we can measure let's go change a few things here. Let's try this subqueryload first of all.
5:36 Do that a few times, pull up the latest. Now, this number's slower than the original but it actually, that's only because the profiling was on
5:47 let's turn that off. And notice, over here we now have two SQLAlchemy interactions one for releases and one from packages.
5:55 Let's try again. 20 milliseconds, how 'about that? Subquery join is actually, subqueryload is actually faster
6:07 by quite a bit, that's three times faster. I guess the final thing to check would just be what if it weren't there?
6:13 Let's try it one more time, and just do the relationship. And look at that. 16 milliseconds. It's a little bit of a interesting case, right?
6:25 That we put that performance speed up in there and it actually made it slower. It just reminds you, always, always measure.
6:33 So, I'll put a little comment here so you have that, like so. Let's put it actually over here, above. The reason I think this is faster is
6:46 we're not suffering at all from the latency of talking to the database. In a real database, that probably would be slower
6:54 because you've got a few milliseconds between the interactions. And this whole n+1 problem, if you have a hundred
7:00 well obviously the join is going to be faster. But I guess just two independent queries is faster in this case.
7:05 So, you know, measure, measure, measure. See what's up.


Talk Python's Mastodon Michael Kennedy's Mastodon