Building Data-Driven Web Apps with Pyramid and SQLAlchemy Transcripts
Chapter: Using SQLAchemy
Lecture: Performance and lazy or eager loading
Login or
purchase this course
to watch this video and the rest of the course contents.
0:00
Now, to see what I'd like to show you in this one.
0:02
It's time to finally look at this echo.
0:04
So far it's been false, but let's set it to be true.
0:08
And rerun our code
0:09
and we see a whole bunch of stuff shoot out.
0:11
Look at all that.
0:12
So, what we're doing is actually, SQLAlchemy
0:15
is checking for the existence of various tables
0:17
like downloads, languages, releases and so on.
0:22
And then our app finally starts.
0:23
Alright so this is when we initialize our database stuff.
0:27
Now, let's go over here and load up this page.
0:31
Works great. But notice all the different statements.
0:34
We have over here a begin, select package stuff
0:40
from packages, where package ID is some sort of
0:43
limit and offset. Okay so this is getting
0:46
you can see it's doing the query for
0:49
the first awscli, right.
0:52
So, offset is 1, sorry, offset is 0.
0:54
Limit is one. Okay the thing, that's first. Great.
0:58
Then, we go back to the database again
1:02
and we select release stuff from releases
1:05
where package ID equals some parameter.
1:08
That parameter being the one we passed.
1:10
And we order by these three things in that order.
1:13
Two operations. Not hugely bad here.
1:17
But imagine, you got a list of these packages
1:21
and you looped over those packages and interacted
1:23
with their releases. If you got 20 packages
1:26
you're doing one query to get the 20 packages
1:28
and then you're doing this query separately, 20 times.
1:32
So 21 operations. That's the so-called n+1 problem
1:36
which is a common anti-pattern in terms of performance for ORMs
1:41
This lazy loader's nice, but if you know you're going to
1:44
interact with releases, there's something we can do better.
1:47
So let's turn this into a single query
1:50
that would solve the n+1 problem.
1:51
Great, now we're getting one package so it's not terrible
1:53
like I said, but the same technique applies regards
1:56
of how many you're getting back. Alright, let's have a look.
2:00
So we go over here, and we have this nice, clean bit of code
2:03
on line 42. Give me a query, filter it like this, first boom.
2:07
However, for what we're about to do
2:09
we need a little more space.
2:10
So let's put some wrap in here and here
2:15
and we're going to come over and say we would like to
2:19
go options, and we want to call this thing subqueryload.
2:24
We're going to import that from the ORM.
2:26
And what we do here
2:29
is we're going to put the relationship to be loaded.
2:32
So package.releases. So what are we doing?
2:35
We're telling SQLAlchemy, go do this query
2:38
and anything you return also go ahead and use a subquery
2:41
to pull out all the related releases of this one.
2:45
You could also do a joinedload.
2:47
I think subqueryload's a little newer and fancier
2:49
but they both would accomplish more or less the same thing.
2:53
We have our filter again and first.
2:55
So now, if we run the same thing again
2:57
we should see one, more complicated but only one
3:01
contact to the database. There it is.
3:05
And let's try it with joinedload.
3:09
I think actually joinedload might be better here.
3:11
Try again. Clean that up. There we go
3:18
now we're down to one interaction with the database.
3:20
That's what I wanted after all.
3:21
So, we come over here and we say
3:23
select the package stuff here from
3:26
and then we're doing another, sort of left outer joined
3:30
on releases, to give us that.
3:32
So one interaction with the database.
3:35
It echos out to I think a standard error.
3:38
And then, just standard out.
3:41
But now we've got it down to one database interaction
3:43
and that's better.
3:44
So you can either use subqueryload or joinedload
3:47
and either will work.
3:48
Okay, so, let's look at performance.
3:51
I guess we haven't really talked about performance.
3:53
I'm going to go back and turn off echo.
3:55
'Cause we don't want to trace out a bunch of stuff.
3:59
So let me do this query a bunch of times.
4:01
Now, just to be fair, over here.
4:05
There's not a tremendous amount of data.
4:06
I mean, there are 5000 releases but only 96 projects.
4:09
It turns out, that even with tons of data
4:11
we can get really good performance.
4:13
But let's pull this back up.
4:15
Have a look. So all that stuff happening is 62 milliseconds.
4:21
Pretty decent.
4:22
Let's see if I hit it a few times if I can get it better.
4:28
Eh, 68, it looks like that's pretty much it.
4:31
Check this out, we have our render.
4:33
And we actually have SQLAlchemy.
4:35
Here's that query, holy moly.
4:37
That is quite the query.
4:40
I guess it would be interesting to try the joinedload
4:43
as well and see if we get any performance difference.
4:45
Another thing, just while we have this pulled up here.
4:48
If we go turn on performance and go back to history.
4:52
And then we refresh this a few more times
4:56
you can actually pull up performance
4:58
if I click on the right part here.
5:00
Pull up performance is slower because we're asking it
5:02
to do more work, but look at this.
5:04
Built-in profiling, so.
5:07
It turns out, where does it start to get slow.
5:11
Find package by name.
5:15
Takes about that long, okay.
5:16
Apparently to do that query, it takes a little while.
5:19
Let's go back and just, you know
5:20
since now we can measure
5:22
let's go change a few things here.
5:24
Let's try this subqueryload first of all.
5:35
Do that a few times, pull up the latest.
5:40
Now, this number's slower than the original
5:43
but it actually, that's only because the profiling was on
5:46
let's turn that off. And notice, over here
5:48
we now have two SQLAlchemy interactions
5:51
one for releases and one from packages.
5:54
Let's try again. 20 milliseconds, how 'about that?
6:01
Subquery join is actually, subqueryload is actually faster
6:06
by quite a bit, that's three times faster.
6:08
I guess the final thing to check would just be
6:10
what if it weren't there?
6:12
Let's try it one more time, and just do the relationship.
6:19
And look at that. 16 milliseconds.
6:22
It's a little bit of a interesting case, right?
6:24
That we put that performance speed up in there
6:27
and it actually made it slower.
6:29
It just reminds you, always, always measure.
6:32
So, I'll put a little comment here so you have that, like so.
6:37
Let's put it actually over here, above.
6:41
The reason I think this is faster is
6:45
we're not suffering at all
6:47
from the latency of talking to the database.
6:50
In a real database, that probably would be slower
6:53
because you've got a few milliseconds
6:55
between the interactions.
6:57
And this whole n+1 problem, if you have a hundred
6:59
well obviously the join is going to be faster.
7:01
But I guess just two independent queries
7:03
is faster in this case.
7:04
So, you know, measure, measure, measure.
7:06
See what's up.