Building data-driven web apps with Pyramid and SQLAlchemy Transcripts
Chapter: Using SQLAchemy
Lecture: Performance and lazy or eager loading
0:00 Now, to see what I'd like to show you in this one.
0:02 It's time to finally look at this echo.
0:04 So far it's been false, but let's set it to be true.
0:08 And rerun our code
0:09 and we see a whole bunch of stuff shoot out.
0:11 Look at all that.
0:12 So, what we're doing is actually, SQLAlchemy
0:15 is checking for the existence of various tables
0:17 like downloads, languages, releases and so on.
0:22 And then our app finally starts.
0:23 Alright so this is when we initialize our database stuff.
0:27 Now, let's go over here and load up this page.
0:31 Works great. But notice all the different statements.
0:34 We have over here a begin, select package stuff
0:40 from packages, where package ID is some sort of
0:43 limit and offset. Okay so this is getting
0:46 you can see it's doing the query for
0:49 the first awscli, right.
0:52 So, offset is 1, sorry, offset is 0.
0:54 Limit is one. Okay the thing, that's first. Great.
0:58 Then, we go back to the database again
1:02 and we select release stuff from releases
1:05 where package ID equals some parameter.
1:08 That parameter being the one we passed.
1:10 And we order by these three things in that order.
1:13 Two operations. Not hugely bad here.
1:17 But imagine, you got a list of these packages
1:21 and you looped over those packages and interacted
1:23 with their releases. If you got 20 packages
1:26 you're doing one query to get the 20 packages
1:28 and then you're doing this query separately, 20 times.
1:32 So 21 operations. That's the so-called n+1 problem
1:36 which is a common anti-pattern in terms of performance for ORMs
1:41 This lazy loader's nice, but if you know you're going to
1:44 interact with releases, there's something we can do better.
1:47 So let's turn this into a single query
1:50 that would solve the n+1 problem.
1:51 Great, now we're getting one package so it's not terrible
1:53 like I said, but the same technique applies regards
1:56 of how many you're getting back. Alright, let's have a look.
2:00 So we go over here, and we have this nice, clean bit of code
2:03 on line 42. Give me a query, filter it like this, first boom.
2:07 However, for what we're about to do
2:09 we need a little more space.
2:10 So let's put some wrap in here and here
2:15 and we're going to come over and say we would like to
2:19 go options, and we want to call this thing subqueryload.
2:24 We're going to import that from the ORM.
2:26 And what we do here
2:29 is we're going to put the relationship to be loaded.
2:32 So package.releases. So what are we doing?
2:35 We're telling SQLAlchemy, go do this query
2:38 and anything you return also go ahead and use a subquery
2:41 to pull out all the related releases of this one.
2:45 You could also do a joinedload.
2:47 I think subqueryload's a little newer and fancier
2:49 but they both would accomplish more or less the same thing.
2:53 We have our filter again and first.
2:55 So now, if we run the same thing again
2:57 we should see one, more complicated but only one
3:01 contact to the database. There it is.
3:05 And let's try it with joinedload.
3:09 I think actually joinedload might be better here.
3:11 Try again. Clean that up. There we go
3:18 now we're down to one interaction with the database.
3:20 That's what I wanted after all.
3:21 So, we come over here and we say
3:23 select the package stuff here from
3:26 and then we're doing another, sort of left outer joined
3:30 on releases, to give us that.
3:32 So one interaction with the database.
3:35 It echos out to I think a standard error.
3:38 And then, just standard out.
3:41 But now we've got it down to one database interaction
3:43 and that's better.
3:44 So you can either use subqueryload or joinedload
3:47 and either will work.
3:48 Okay, so, let's look at performance.
3:51 I guess we haven't really talked about performance.
3:53 I'm going to go back and turn off echo.
3:55 'Cause we don't want to trace out a bunch of stuff.
3:59 So let me do this query a bunch of times.
4:01 Now, just to be fair, over here.
4:05 There's not a tremendous amount of data.
4:06 I mean, there are 5000 releases but only 96 projects.
4:09 It turns out, that even with tons of data
4:11 we can get really good performance.
4:13 But let's pull this back up.
4:15 Have a look. So all that stuff happening is 62 milliseconds.
4:21 Pretty decent.
4:22 Let's see if I hit it a few times if I can get it better.
4:28 Eh, 68, it looks like that's pretty much it.
4:31 Check this out, we have our render.
4:33 And we actually have SQLAlchemy.
4:35 Here's that query, holy moly.
4:37 That is quite the query.
4:40 I guess it would be interesting to try the joinedload
4:43 as well and see if we get any performance difference.
4:45 Another thing, just while we have this pulled up here.
4:48 If we go turn on performance and go back to history.
4:52 And then we refresh this a few more times
4:56 you can actually pull up performance
4:58 if I click on the right part here.
5:00 Pull up performance is slower because we're asking it
5:02 to do more work, but look at this.
5:04 Built-in profiling, so.
5:07 It turns out, where does it start to get slow.
5:11 Find package by name.
5:15 Takes about that long, okay.
5:16 Apparently to do that query, it takes a little while.
5:19 Let's go back and just, you know
5:20 since now we can measure
5:22 let's go change a few things here.
5:24 Let's try this subqueryload first of all.
5:35 Do that a few times, pull up the latest.
5:40 Now, this number's slower than the original
5:43 but it actually, that's only because the profiling was on
5:46 let's turn that off. And notice, over here
5:48 we now have two SQLAlchemy interactions
5:51 one for releases and one from packages.
5:54 Let's try again. 20 milliseconds, how 'about that?
6:01 Subquery join is actually, subqueryload is actually faster
6:06 by quite a bit, that's three times faster.
6:08 I guess the final thing to check would just be
6:10 what if it weren't there?
6:12 Let's try it one more time, and just do the relationship.
6:19 And look at that. 16 milliseconds.
6:22 It's a little bit of a interesting case, right?
6:24 That we put that performance speed up in there
6:27 and it actually made it slower.
6:29 It just reminds you, always, always measure.
6:32 So, I'll put a little comment here so you have that, like so.
6:37 Let's put it actually over here, above.
6:41 The reason I think this is faster is
6:45 we're not suffering at all
6:47 from the latency of talking to the database.
6:50 In a real database, that probably would be slower
6:53 because you've got a few milliseconds
6:55 between the interactions.
6:57 And this whole n+1 problem, if you have a hundred
6:59 well obviously the join is going to be faster.
7:01 But I guess just two independent queries
7:03 is faster in this case.
7:04 So, you know, measure, measure, measure.
7:06 See what's up.