MongoDB for Developers with Python Transcripts
Chapter: High-performance MongoDB
Lecture: Concept: Document design for performance
0:01
One of the most important things you can do for performance
0:04
in your database, and especially in these document databases,
0:06
is think about your document design,
0:08
should you embed stuff, should you not, what embeds where,
0:11
do you embed just ids, do you embed the whole thing;
0:14
all of these are really important questions
0:16
and it takes a little bit of experience to know what the right thing to do is.
0:20
It also really depends on your application's use case,
0:24
so one thing we should obviously consider
0:28
is the service history, which adds the most weight to these car objects;
0:34
we've got it as an embedded document list field,
0:38
so how often do we need these histories?
0:44
How many histories might a car have?
0:46
Should those maybe be in a separate collection
0:49
where each document has everything the service record class has,
0:52
plus a car id, or something to that effect?
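To make the two options concrete, here is a minimal plain-Python sketch that uses dicts in place of MongoEngine documents. The field names (`_id`, `model`, `service_history`, `car_id`, `description`, `price`) are hypothetical stand-ins, not the course's actual classes. It takes a car with embedded service records and splits it into a slimmer car document plus one standalone record per service, each tagged with the car's id, which is the shape you would store in a separate collection.

```python
from typing import Any

Doc = dict[str, Any]


def split_service_history(car: Doc) -> tuple[Doc, list[Doc]]:
    """Split a car with embedded service records into a car document and
    standalone service documents that point back at the car via car_id."""
    # The car document keeps every field except the embedded list.
    car_doc = {key: value for key, value in car.items() if key != "service_history"}
    # Each service record becomes its own document tagged with the parent id.
    records = [
        {**record, "car_id": car["_id"]}
        for record in car.get("service_history", [])
    ]
    return car_doc, records


if __name__ == "__main__":
    car = {
        "_id": 1,
        "model": "F40",
        "service_history": [
            {"description": "Oil change", "price": 250},
            {"description": "New tires", "price": 1200},
        ],
    }
    car_doc, records = split_service_history(car)
    print(car_doc)  # {'_id': 1, 'model': 'F40'}
    print(records)
```

In a real database you would typically index `car_id` in the separate collection so that loading one car's history stays cheap.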
0:56
So this is a really important question,
0:59
and it really depends on how we're using this car object, this car document
1:05
if almost all the time we want to work with the service history,
1:07
it's probably good to embed it right here,
1:10
unless these can be really large or something to that effect,
1:13
but if you don't need them often, you might consider putting them in their own collection;
1:16
there's a tension between simplicity and safety on one side and separation on the other:
1:20
keeping them separate gives you speed,
1:24
because you don't pull them back all the time;
1:26
you can also consider using the only() operator, or its counterpart exclude(), in MongoEngine
1:30
to say: if I don't need it, leave out the service history.
1:34
That adds a little bit of complexity, because you often don't know,
1:38
hey, is this a car that came with its service history,
1:40
or is it a car where that was excluded, things like that,
1:42
but you could use performance profiling and tuning
1:45
to figure out where you might use only.
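MongoEngine's `QuerySet.only()` and `QuerySet.exclude()` ask the server to send back only some fields; with the lecture's car model that might look like `Car.objects().exclude('service_history')` (the `Car` class and `service_history` field name are assumed from the example, not quoted from the course code). Since that needs a running database, the sketch below only mimics the projection on plain dicts. It also shows the ambiguity mentioned above: once the field is excluded, code that reads it with a default sees an empty history whether or not the car actually had one.

```python
from typing import Any, Iterable, Optional

Doc = dict[str, Any]


def project(doc: Doc, only: Optional[Iterable[str]] = None,
            exclude: Optional[Iterable[str]] = None) -> Doc:
    """Mimic a MongoDB-style field projection on a plain dict.

    `only` keeps the listed fields (plus _id); `exclude` drops them.
    This sketch allows one or the other, not both.
    """
    if only is not None and exclude is not None:
        raise ValueError("use either only or exclude, not both")
    if only is not None:
        keep = set(only) | {"_id"}
        return {k: v for k, v in doc.items() if k in keep}
    if exclude is not None:
        drop = set(exclude)
        return {k: v for k, v in doc.items() if k not in drop}
    return dict(doc)


if __name__ == "__main__":
    with_history = {"_id": 1, "model": "F40", "service_history": [{"price": 250}]}
    without_history = {"_id": 2, "model": "Enzo", "service_history": []}

    for car in (with_history, without_history):
        slim = project(car, exclude=["service_history"])
        # Both print [] here: once excluded, you can no longer tell the cars apart.
        print(slim["_id"], slim.get("service_history", []))
```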
1:48
Let's look at one more thing around document design.
1:50
You want to consider the size of the document,
1:52
remember MongoDB has a limit on how large these documents can be,
1:56
that's 16 MB per document, but that doesn't mean you should think
2:01
oh, it's only 10 MB, so everything is fine with my document design;
2:05
that might be terrible, because this is a hard upper bound:
2:07
MongoDB will refuse to store a document once it goes past 16 MB,
2:11
so you really want to think about what the right size is,
2:14
so let's look at a couple examples:
2:16
we can go to any collection and call .stats(),
2:18
and it will tell us about the size of the documents and things like that,
2:21
so here we ran db.cars.stats() in the MongoDB shell,
2:25
and we see that the average object size is about 700 bytes,
2:29
there is information about how many there are, and all that kind of stuff,
2:33
but really the most interesting thing for this discussion is
2:35
what is the average object size, 700 bytes
2:38
that seems like a pretty good size to me, it's not huge by any means,
2:42
and this is the cars that contain those service histories,
2:45
so this is probably fine for what we're doing.
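The same kind of numbers can be approximated client-side. Below is a minimal stdlib-only sketch that estimates a document's size from its compact JSON encoding, averages those sizes across documents (mirroring how `avgObjSize` is the collection's data size divided by its document count), and flags a document that passes a soft budget or MongoDB's 16 MB hard limit. JSON length is only a stand-in for the stored BSON size; with PyMongo installed, `len(bson.encode(doc))` gives the exact encoded size, and the `collStats` command (for example `db.command("collStats", "cars")`) returns the server's own figures, though newer server versions steer you toward the `$collStats` aggregation stage. The sample cars and the 16 KB `budget_bytes` default are made-up assumptions for illustration.

```python
import json
from typing import Any, Iterable

Doc = dict[str, Any]

# MongoDB rejects any single document larger than 16 MiB (maxBsonObjectSize).
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024


def approx_size_bytes(doc: Doc) -> int:
    """Rough document size: length of a compact UTF-8 JSON encoding.

    Only an approximation of the BSON size MongoDB actually stores.
    """
    encoded = json.dumps(doc, separators=(",", ":"), default=str)
    return len(encoded.encode("utf-8"))


def average_object_size(docs: Iterable[Doc]) -> float:
    """Total approximate size divided by document count, like avgObjSize."""
    sizes = [approx_size_bytes(doc) for doc in docs]
    return sum(sizes) / len(sizes) if sizes else 0.0


def size_report(doc: Doc, budget_bytes: int = 16 * 1024) -> dict[str, Any]:
    """Check one document against a self-chosen budget and the hard limit.

    budget_bytes is a hypothetical soft limit, not a MongoDB setting.
    """
    size = approx_size_bytes(doc)
    return {
        "bytes": size,
        "over_budget": size > budget_bytes,
        "over_hard_limit": size > MAX_BSON_DOCUMENT_BYTES,
    }


if __name__ == "__main__":
    cars = [
        {"_id": 1, "model": "F40",
         "service_history": [{"description": "Oil change", "price": 250}]},
        {"_id": 2, "model": "Enzo", "service_history": []},
    ]
    print(f"average car: ~{average_object_size(cars):.0f} bytes")

    # Roughly 10 MB: under the hard limit, far over any sensible budget.
    report = size_report({"_id": 3, "blob": "x" * (10 * 1024 * 1024)})
    print(report["over_budget"], report["over_hard_limit"])  # True False
```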
2:48
Let me give you a more realistic example.
2:50
Let's think about the Talk Python Training website,
2:52
and the courses and chapters, we talked about them before,
2:56
so here if we run that same thing, db.courses.stats(),
3:02
you can see that the average object size is 900 bytes for a course,
3:07
and remember the course has the description that shows on the page
3:10
and that's probably most of the size; it has a few other things as well,
3:13
like student testimonials and whatnot,
3:16
but basically it's the description and a few hyperlinks.
3:19
So again, I think this is a totally reasonable average object size.
3:23
Now one of the considerations was I could have taken the chapters
3:27
which themselves contain all the lectures,
3:29
and embedded those within the course,
3:32
would that have been a good idea?
3:34
I think I might have even had it created that way
3:36
in the very beginning, and it was a lot slower than I was hoping for,
3:38
so I redesigned the documents.
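Here is a minimal sketch of that kind of redesign, using plain dicts as in-memory stand-ins for two collections. The field names (`course_id`, `lectures`, `title`) and the sample data are hypothetical, not the site's real schema. Courses stay small so the common listing page only reads them, while each chapter is its own document that embeds its lectures, so loading one chapter brings its lectures along in a single lookup.

```python
from typing import Any, Optional

Doc = dict[str, Any]

# Hypothetical in-memory stand-ins for two collections, keyed by _id.
COURSES: dict[int, Doc] = {
    1: {"_id": 1, "title": "Course A", "description": "Short course description."},
}
CHAPTERS: dict[int, Doc] = {
    10: {"_id": 10, "course_id": 1, "title": "Chapter 1",
         "lectures": [{"title": "Lecture 1.1"}, {"title": "Lecture 1.2"}]},
    11: {"_id": 11, "course_id": 1, "title": "Chapter 2",
         "lectures": [{"title": "Lecture 2.1"}]},
}


def load_course_listing() -> list[Doc]:
    """The common listing page reads only the small course documents."""
    return list(COURSES.values())


def load_chapter(chapter_id: int) -> Optional[Doc]:
    """A single lookup returns one chapter with its lectures embedded."""
    return CHAPTERS.get(chapter_id)


def chapters_for_course(course_id: int) -> list[Doc]:
    """Rough equivalent of find({"course_id": course_id}) on the chapters collection."""
    return [ch for ch in CHAPTERS.values() if ch["course_id"] == course_id]


if __name__ == "__main__":
    print([c["title"] for c in load_course_listing()])
    chapter = load_chapter(10)
    if chapter is not None:
        print(chapter["title"], [lec["title"] for lec in chapter["lectures"]])
```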
3:40
If we run this on the chapters collection, you can see
3:43
that the average object size is 2.3 KB,
3:46
this is starting to get a little bit big, on its own it's fine,
3:50
but think about the fact that a course on average has like 10 to 20 chapters,
3:55
so if I embedded the chapters in the course
3:58
instead of putting them in separate documents like I do,
4:02
which is how it actually runs at the time of the recording,
4:04
then each of these course documents would be something like
4:07
24 up to maybe 50 KB of data,
4:12
think about what happens when you go to the courses page
4:15
and it shows you a big list of all the courses,
4:17
and there might be 10, or later 20, courses:
4:20
we'd be pulling back and deserializing up to about a megabyte of data
4:24
to render a really, really common page, and that is probably not OK,
4:28
so this is why I did not embed the chapters and lectures inside the course.
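The back-of-the-envelope estimate above is simple enough to check in a few lines of Python. The inputs are the averages quoted in this lecture (about 0.9 KB per course, 2.3 KB per chapter, 10 to 20 chapters per course, up to 20 courses on the listing page); the function names are just for illustration, and the model simply adds average sizes rather than computing exact BSON sizes.

```python
COURSE_KB = 0.9   # average course document, from db.courses.stats()
CHAPTER_KB = 2.3  # average chapter document, from the chapters collection stats


def embedded_course_kb(chapter_count: int) -> float:
    """Approximate size of one course if all its chapters were embedded."""
    return COURSE_KB + chapter_count * CHAPTER_KB


def listing_page_kb(course_count: int, chapter_count: int, embed: bool) -> float:
    """Approximate data pulled back to render a list of courses."""
    per_course = embedded_course_kb(chapter_count) if embed else COURSE_KB
    return course_count * per_course


if __name__ == "__main__":
    for chapters in (10, 20):
        print(f"{chapters} chapters embedded: ~{embedded_course_kb(chapters):.1f} KB per course")
    print(f"20 courses, embedded: ~{listing_page_kb(20, 20, embed=True):.0f} KB")
    print(f"20 courses, separate: ~{listing_page_kb(20, 20, embed=False):.0f} KB")
```

With 10 and 20 chapters this gives about 23.9 KB and 46.9 KB per course, in line with the 24 to 50 KB range above, and in the worst case of 20 courses with 20 chapters each, about 938 KB for the listing page versus about 18 KB when the chapters live in their own collection.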
4:34
I just said okay, this is the breaking point;
4:37
I looked at the object sizes, I looked at where the performance was,
4:41
and I said you know what, really it's not that common
4:44
that we actually want more than one chapter at a time,
4:46
but it is common that we want the lectures, so this is probably the right partitioning,
4:51
and if you build it one way, try it, and it doesn't work,
4:53
you just redesign your class structure, recreate the database, and try it again;
4:57
but you do want to think about the average object size,
5:00
and you can check it super easily with db.<collection name>.stats().