MongoDB for Developers with Python Transcripts
Chapter: High-performance MongoDB
Lecture: Running the default configuration
0:01
Let's go ahead and run this code, you've seen the minor changes
0:04
like the addition of this concept of an owner,
0:06
and how we generated all this data, and how you can restore it.
0:09
Let's go ahead and run it, and see what's happening.
0:13
Let's look at this from two perspectives, let's begin over actually in Robomongo,
0:17
so we're going to ask the question, basically how many owners own a certain car
0:21
the idea is more or less we're going to call this function which goes right here,
0:25
really what we're looking for is this query,
0:28
find me all of the owners where this car id is in their car ids collection,
0:33
just generate and deserialize that.
0:37
The other one that we're going to focus on is
0:39
show me the cars with the expensive service history,
0:42
how many cars or what cars had some kind of service
0:46
that cost over 16800 dollars.
0:49
Let's begin by looking at those in Robomongo.
0:54
Here we have this concept, we could simplify this a little bit, but it doesn't matter,
0:57
cars here's the service history, let's go to the price
1:00
where that's greater than 16800, how many of them are there.
1:05
If I run this, notice, it took a while to come back,
1:08
run it again, here's the speed right there, 0.724 sec, 0.731, 0.733,
1:14
so it's pretty reliably taking around 700 milliseconds to answer that question.
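For reference, the count being timed here could look roughly like this from Python. This is a minimal sketch, not the course's code: the field path `service_history.price` and both helper names are assumptions, and `cars` is expected to be a pymongo collection (anything with a `count_documents` method will do).

```python
import time


def expensive_service_filter(min_price):
    """Filter for cars with at least one service costing more than min_price.

    The dotted field path is an assumption about how car documents are laid out.
    """
    return {"service_history.price": {"$gt": min_price}}


def timed_count(cars, min_price):
    """Return (matching count, elapsed milliseconds) for the filter above.

    `cars` is assumed to be a pymongo Collection.
    """
    started = time.perf_counter()
    count = cars.count_documents(expensive_service_filter(min_price))
    elapsed_ms = (time.perf_counter() - started) * 1000
    return count, elapsed_ms
```

Calling `timed_count(cars, 16800)` a few times in a row gives the same kind of repeated measurement as re-running the query in Robomongo.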
1:19
We're going to come back to this.
1:22
Here's a more interesting example, like go and randomly grab a car
1:25
somewhere deep in the list, in this case I put 61600,
1:30
grab that car and then find me all the owners,
1:33
where that car id appears in their id list, and then we'll just dump that out,
1:38
declaring it with var doesn't print it, but if you just state the name it will show up down here,
1:43
so make sure to deselect it and run this,
1:45
and this is actually surprisingly fast, given all the stuff that's going on here,
1:48
but it's taking still about 75, 80 milliseconds to run here,
1:53
which, I don't know, maybe in your database
1:55
going across a 100 thousand records 80 milliseconds seems decent,
1:59
I can tell you in MongoDB 80 milliseconds is terrible
2:02
you should really think about making anything that takes 80 milliseconds faster;
2:06
it's not always possible to do it,
2:08
but most of the queries as we'll see are possible.
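The two-step lookup described above (grab a car deep in the list, then find its owners) can be sketched in Python as follows. The collection names `cars` and `owners`, the `car_ids` field, and the function name are assumptions for illustration; `db` is expected to be a pymongo database.

```python
def owners_of_nth_car(db, position):
    """Find the car at `position` in the cars collection, then everyone who owns it.

    Assumptions: `db` is a pymongo Database with `cars` and `owners`
    collections, and each owner document keeps a `car_ids` list.
    """
    # skip/limit walks to a single car somewhere deep in the collection.
    car = next(iter(db.cars.find().skip(position).limit(1)), None)
    if car is None:
        return None, []
    # Matching a single value against an array field selects the documents
    # whose array contains that value.
    owners = list(db.owners.find({"car_ids": car["_id"]}))
    return car, owners
```

With the numbers from the lecture this would be called as `owners_of_nth_car(db, 61600)`.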
2:11
Let's take this one and just try to understand what's happening here
2:16
and then we're going to go look at it in Python,
2:19
but let's just explore it here in the shell for just a moment.
2:21
Why is this taking 700 milliseconds?
2:24
MongoDB has this way to basically ask how are you running this query,
2:29
and the way you do that is you say explain, like so,
2:35
so I can say this query instead of giving me a result tell me how you're running it,
2:38
if I unselect it, it just runs the selected stuff if there's something there,
2:42
so we can go and look at it in this mode,
2:44
so it says okay, here's what the query planner found for you,
2:47
we've parsed this query, and this is something
2:50
it's basically what went into the find,
2:52
it also might have something to the effect of like a sort
2:55
and other things that are happening, but this is a simple query.
2:58
Look down here, see this winning plan, stage COLLSCAN, a collection scan,
3:02
that is bad, that is really, really bad.
3:05
Also notice the rejected plan, so if there are multiple indexes
3:08
and other things that could have done
3:10
it might have attempted a bunch of them and said no, no, no this is the best,
3:13
let's see it doesn't seem to tell us any more about what it did there,
3:18
like sometimes it'll tell you how many records it scanned and things like this,
3:21
but it's just basically reading entirely in the forward direction
3:25
over this and just doing a comparison.
3:27
So that's why this was taking 700 milliseconds
3:32
as it was literally reading and comparing 100 thousand entries
3:36
or actually more, remember there are 1.2 million service histories
3:40
across those 250 thousand cars, so not 100 thousand,
3:43
1.2 million records it scanned over, that's bad, you don't want that.
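The same check can be made from Python by reading the winning plan out of the explain document. This is a sketch under assumptions: the helper name is hypothetical, the sample below is hand-written in the shape described above (it is not captured output), and the exact layout of the plan can differ between server versions.

```python
def winning_stages(explain_output):
    """List the stage names of the winning plan, outermost first."""
    stages = []
    node = explain_output["queryPlanner"]["winningPlan"]
    while node is not None:
        stages.append(node.get("stage"))
        node = node.get("inputStage")  # None once the innermost stage is reached
    return stages


# Hand-written sample shaped like an explain result for an unindexed query.
sample = {
    "queryPlanner": {
        "winningPlan": {"stage": "COLLSCAN", "direction": "forward"},
        "rejectedPlans": [],
    }
}
print(winning_stages(sample))  # ['COLLSCAN']
```

With pymongo, the input would come from calling `explain()` on a cursor, for example `cars.find(some_filter).explain()`.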
3:47
So what we can do is we can actually add an index,
3:51
now there's two ways to add an index,
3:54
but before I add the index, let's go over here
3:58
just explain is super, super valuable,
4:00
any time something is slow we're going to explain
4:03
there's actually a way to turn on profiling and say log all of the queries
4:07
that MongoDB sees that are slower than x,
4:11
you provide the threshold, say 10 milliseconds might be great,
4:14
show me all the queries that take more than 10 milliseconds
4:17
and then you can drop them in here, put an explain
4:19
and then start creating indexes to make them faster.
4:22
So just google mongodb profile enable slow queries
4:26
or something like this, it's pretty straightforward.
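As a rough sketch of what that search turns up, slow-operation profiling is switched on per database with the `profile` command, where level 1 records only operations slower than `slowms`. The function names here are hypothetical, and `db` is assumed to be a pymongo database.

```python
def slow_query_profile_command(threshold_ms):
    """Build the `profile` command document for slow-operation profiling."""
    # The command name has to be the first key; dicts keep insertion order.
    return {"profile": 1, "slowms": threshold_ms}


def enable_slow_query_profiling(db, threshold_ms=10):
    """Turn on slow-operation profiling for one database.

    `db` is assumed to be a pymongo Database; recorded operations land in
    its `system.profile` collection.
    """
    return db.command(slow_query_profile_command(threshold_ms))
```

After this, entries in `db["system.profile"]` are candidates to paste into the shell with an explain.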
4:29
Now let's run this code, we're asking a lot of questions
4:31
what we want to run is q and a, so we go over here and just right click and say run,
4:37
notice some of these things are taking time,
4:42
the database might be cold, it might have not loaded that stuff,
4:46
so let me run it one more time just to be fair,
4:49
there's a few things that are already really fast, and that's cool,
4:55
so let's go here and review, how many owners are there—
4:58
well, I can tell you it doesn't show the answer
5:01
it just sort of says this is the question I'm asking and here is how long it takes.
5:04
Three milliseconds, that is solid, how many cars— half a millisecond.
5:07
That's pretty solid, I don't think we can improve the count on the entire collection
5:11
but this one, find the 10 thousandth owner— not good,
5:14
so let's see how many cars are owned by that person—
5:19
this is pretty fast actually, this is surprisingly fast,
5:23
how many owners this car has: 66 milliseconds,
5:26
that's the one we were looking at in there.
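The question-and-timing output reviewed above can be produced with a small helper like the one below. The course's own script is not shown in this transcript, so this is a hypothetical stand-in: the helper name and the output format are assumptions.

```python
import time


def time_question(label, ask):
    """Run `ask()` once, print how long it took, and return the elapsed milliseconds."""
    started = time.perf_counter()
    ask()
    elapsed_ms = (time.perf_counter() - started) * 1000
    print(f"{label}: {elapsed_ms:,.3f} ms")
    return elapsed_ms
```

A call might look like `time_question("How many cars?", lambda: cars.count_documents({}))`, where `cars` is a pymongo collection.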
5:29
I'm going to take these numbers and put them over here,
5:32
let's say, this will be Without indexes
5:36
we're going to get this, we don't really care about the exit code, do we?
5:41
With indexes, and we're going to kind of iterate on this a little bit
5:45
so let's begin over here, and we're going to talk about
5:49
how we can add an index in MongoDB and then for the most part
5:55
do this in MongoEngine because it's really part of the way our application works,
6:00
what the indexes are, and it's better to make that part of our document
6:03
than kind of do a separate database setup step;
6:07
we could create a script in Javascript and run it,
6:09
it will do these things and that may be fine, but let's go over here and work on this.
6:14
Again we had the count, here's the almost 800 milliseconds,
6:19
let's go over here and just I'll take this, I'll make a copy,
6:28
so here is what we can do, instead of doing the find operation
6:31
we can say create index,
6:35
and then we have the thing that we're doing the query on,
6:38
most of the time this is one item but you can have composite indexes,
6:43
they are a little more nuanced so we'll talk about them later,
6:45
but let's just do this one, we want to be able to query by service history's price
6:52
Here we can put one of two things, one or minus one,
6:56
what do you want the default sort to be, ascending or descending?
6:59
A lot of times it doesn't really matter,
7:01
it can read from the back or it can read from the front, whatever,
7:04
you saw the forward direction on our collection scan for example.
7:06
So over here we could say one, this creates an index, there's no count;
7:09
the other thing we can do is we can give it a name
7:13
so we can come over here and say name is search by service history price,
7:24
so if we go look in this little indexes, we'll see the name here,
7:27
we can also say run in the background,
7:30
if I don't say that it's going to block the database until the index is generated,
7:33
if you're doing this in production, and you have tons and tons of data
7:36
maybe background is the way to go.
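The same index can be created from Python with pymongo's `create_index`. This is a sketch, assuming `cars` is a pymongo collection and the field path is `service_history.price`; the index name is an illustrative spelling of the one chosen above, and the function name is hypothetical.

```python
def create_price_index(cars, in_background=False):
    """Create an ascending index on the nested service price field.

    Assumptions: `cars` is a pymongo Collection and the documents store
    prices under `service_history.price`.
    """
    return cars.create_index(
        [("service_history.price", 1)],  # 1 = ascending, -1 = descending
        name="search_by_service_history_price",
        background=in_background,  # newer server versions may ignore this option
    )
```

`create_index` returns the name of the index, so the result can be checked against the name passed in.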
7:38
Okay, anyway let's go ahead and run this and see what happens.
7:41
Notice the pause, this is it actually computing the index,
7:44
right now the database is effectively down, now it's back,
7:47
what do we get, ok: createdCollectionAutomatically, no, it already existed,
7:51
numIndexesBefore was one, now we have two
7:54
and everything was a ok so if I refresh,
7:59
here's that index and I can actually edit this over here in Robomongo,
8:05
go for the advanced properties, here is the create index and background
8:09
whether it's sparse, how long it lives,
8:11
whether it's based on text search or whatever, but here's just the basic thing.
8:18
We've added this index, remember this took 800 milliseconds
8:21
ask the same question now, boom, 8 milliseconds.
8:24
Ask it one more time, 2, here we go, 2, 2, 2, 3, 2, 2,
8:28
right, the screen sharing is probably putting a pretty heavy load on the server
8:32
that's also the database server, right but still,
8:35
we're getting it down 350, 400 times faster by adding that.
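The 350 to 400 times figure follows directly from the timings quoted in this lecture: roughly 700 to 800 milliseconds before the index and about 2 milliseconds after. A quick check (the function name is just for illustration):

```python
def speedup(before_ms, after_ms):
    """How many times faster the query got."""
    return before_ms / after_ms


print(speedup(700, 2))  # 350.0
print(speedup(800, 2))  # 400.0
```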
8:39
Now if I go back and I ask that question explain
8:42
now we get something way better, winning plan is index scan
8:50
index name search by service history price, that is really awesome;
8:57
that means we're using our index which is so much faster.
9:02
There was no rejected plans, so it only found one index
9:06
it tried to use it, it found that it was awesome, it's very happy.
9:16
Go back to my count one more time,
9:21
boom 2 milliseconds, and that's a really good answer,
9:24
let's go run our Python code and see what answers we get now,
9:27
that was already faster, let's go over here
9:32
and load car name and ids with expensive prices and spark plugs,
9:38
20 milliseconds this is actually a pretty complicated query
9:43
we'll get into cars with expensive service, 1.9 milliseconds.
9:47
This is exactly what we saw in Robomongo,
9:51
so over here in MongoEngine, we're getting essentially the same results— how cool is that?
9:56
Very nice, we're going to go through and in Python from now on
10:02
we're going to add the necessary indexes to make
10:05
almost all of these run super fast; all of them will run fast,
10:09
some of them we can get incredibly fast, like one millisecond,
10:11
others not quite that fast, but we'll still do well on all of them.