MongoDB for Developers with Python Transcripts
Chapter: Modeling and document design
Lecture: A real world example
0:01
So let's look inside the application that you're using right now to take this course as an example.
0:07
At the time of this recording, here's what the Talk Python Training website's database looks like for courses and users.
0:15
So, first let's focus on the course side of things. There are a couple of interesting ideas here.
0:20
First, we have an id which is not an ObjectId. Why not? Well, this data was originally migrated from a relational database;
0:28
that version used SQLAlchemy, and it was easier to keep this id as a number rather than switch to MongoDB's ObjectId.
0:37
It's also easier to refer to it in other places. For example, in the e-commerce system I can pass the id along,
0:43
but I don't have very much space in the message that can go into that system, based on their API,
0:50
so a small number is much easier to fit than a 24-character ObjectId string. So we're using a non-standard id that's generated in the app,
0:56
but for these types of things that is really no big deal. For the users, I think we might be using ObjectIds.
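To make the id trade-off concrete, here's a minimal sketch comparing a small app-generated integer `_id` with the 24-character hex string form of an ObjectId. The document, its field names, and the `commerce_reference` helper are hypothetical, and `secrets.token_hex(12)` only mimics an ObjectId's length (12 bytes, 24 hex characters) rather than creating a real one.

```python
import secrets

# Hypothetical course document with an app-generated integer _id.
# Field names are illustrative, not the site's actual schema.
course = {"_id": 1, "title": "Example course", "url": "example-course"}


def commerce_reference(doc: dict) -> str:
    """Hypothetical helper: the id string sent to a size-limited commerce API field."""
    return str(doc["_id"])


# A real ObjectId is 12 bytes and renders as 24 hex characters.
# secrets.token_hex(12) only mimics that shape so we can compare lengths.
object_id_like = secrets.token_hex(12)

print(commerce_reference(course))  # 1
print(len(object_id_like))         # 24
```

One design consequence: when you supply your own `_id`, MongoDB uses it instead of generating an ObjectId, so the application becomes responsible for keeping those numbers unique.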
1:02
Some of the fields here are fairly flat: the URL, the title, when it was published, things like that.
1:08
This is the Learn Python by Building Ten Apps Jumpstart Course, and you can see that these initial pieces of data
1:14
are totally straightforward; they would look exactly the same in a relational database.
1:20
However, there are two things that are very different that I want to draw your attention to.
1:25
The first is actually not the embedded stuff but this duration in seconds. When I created the MongoDB version of this web app,
1:32
I realized that one of the things I do all the time, on the home page, on the course listing page, and in many other places,
1:40
is show how long the course is: this course is 6.5 hours, I think this one is 7.1 hours, or something to that effect.
1:47
With some quick math, you can turn the duration in seconds into hours. But before I stored it, there was a pretty serious bottleneck,
1:52
where I'd have to pull back, in this case, 12 chapters, and then from the chapters get the lectures,
1:59
and from the lectures get how long each individual one was. I'd add that all up, and then I could print out that number.
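Here's a rough sketch of that slow path, assuming a plain dict stands in for the chapters collection (with lectures embedded in each chapter) and each dict lookup represents one database read. The `CHAPTERS` data, the field names, and `as_hours` are hypothetical; the durations are made-up sample values.

```python
# Hypothetical in-memory stand-in for a chapters collection; lectures are
# embedded in each chapter. Every chapter fetch stands for one database read.
CHAPTERS = {
    1001: {"_id": 1001, "lectures": [{"duration_in_seconds": 202},
                                     {"duration_in_seconds": 598}]},
    1002: {"_id": 1002, "lectures": [{"duration_in_seconds": 1000}]},
}


def course_duration_slow(course: dict) -> tuple[int, int]:
    """Sum lecture durations by loading every chapter; return (seconds, reads)."""
    total = 0
    reads = 0
    for chapter_id in course["chapters"]:
        chapter = CHAPTERS[chapter_id]  # one read per chapter
        reads += 1
        total += sum(lec["duration_in_seconds"] for lec in chapter["lectures"])
    return total, reads


def as_hours(seconds: int) -> str:
    """The 'quick math': convert seconds to hours for display."""
    return f"{seconds / 3600:.1f} hours"


course = {"_id": 1, "chapters": [1001, 1002]}
seconds, reads = course_duration_slow(course)
print(seconds, reads, as_hours(seconds))  # 1800 2 0.5 hours
```

On a catalog page showing ten courses, this loop runs once per course, so the number of chapter reads grows with every course on the page.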
2:06
And then I'd do that on, say, the course catalog page, where there were about ten courses, so I'd have to go through so many of these chapters
2:14
and their lectures, and that was a huge bottleneck. So what I decided to do in the application is this:
2:19
any time I save or update a course, which is extremely rare, I compute this value and stash it here.
2:27
So this is actually computed from the chapters, which are in turn computed from the lectures themselves.
2:32
This is data duplication, but you'll find that a little bit of data duplication,
2:37
usually just one or two little pieces like this in most apps, can unlock a lot of performance,
2:43
because actually computing this turns out to be really computationally expensive, but storing it here on this object made it super fast.
2:51
So this is one instance of data duplication, which I try to stay away from as much as I can, but here the trade-off was so worth it.
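A minimal sketch of the compute-on-save idea, assuming lectures are embedded in chapter documents and a plain dict stands in for the database. `recompute_durations`, `save_course`, `course_duration_fast`, and the field names are hypothetical, not the site's actual code; the point is that the summing happens on the rare write so that reads only touch one field.

```python
def recompute_durations(course: dict, chapters: list[dict]) -> None:
    """Refresh the duplicated totals: lectures -> chapter -> course."""
    for chapter in chapters:
        chapter["duration_in_seconds"] = sum(
            lecture["duration_in_seconds"] for lecture in chapter["lectures"]
        )
    course["duration_in_seconds"] = sum(c["duration_in_seconds"] for c in chapters)


def save_course(course: dict, chapters: list[dict], db: dict) -> None:
    """Hypothetical save hook: pay the summing cost on the rare write.

    `db` is a plain dict standing in for the courses and chapters collections.
    """
    recompute_durations(course, chapters)
    for chapter in chapters:
        db.setdefault("chapters", {})[chapter["_id"]] = chapter
    db.setdefault("courses", {})[course["_id"]] = course


def course_duration_fast(course: dict) -> int:
    """Reading the duration is now a single field lookup, no chapter reads."""
    return course["duration_in_seconds"]


db: dict = {}
chapters = [
    {"_id": 1001, "course_id": 1,
     "lectures": [{"duration_in_seconds": 202}, {"duration_in_seconds": 598}]},
    {"_id": 1002, "course_id": 1,
     "lectures": [{"duration_in_seconds": 1000}]},
]
course = {"_id": 1, "chapters": [1001, 1002]}
save_course(course, chapters, db)
print(course_duration_fast(db["courses"][1]))       # 1800
print(db["chapters"][1001]["duration_in_seconds"])  # 800
```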
2:58
Now, the other part we want to focus on is down here: I'd like to associate these chapter ids with a particular course.
3:04
If this were a relational database, I might have a course-to-chapter normalization table with the course id and the chapter id,
3:12
and I'd do some kind of join query on that. You almost never, ever see that in MongoDB and other document databases.
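To make the contrast concrete, here's a sketch of the relational join-table shape next to the document shape, where the course simply embeds its chapter ids and a MongoDB `$in` filter takes the place of the join. The `find` function is a tiny hypothetical in-memory stand-in, and the field names are assumptions; with pymongo you would pass the same filter dict to `collection.find(...)`.

```python
# Relational shape: a separate course-to-chapter normalization table.
course_chapter_rows = [(1, 1001), (1, 1002)]  # (course_id, chapter_id)

# Document shape: the course embeds its chapter ids directly.
course = {"_id": 1, "title": "Example course", "chapters": [1001, 1002]}


def chapters_filter(course: dict) -> dict:
    """MongoDB-style filter that takes the place of the join."""
    return {"_id": {"$in": course["chapters"]}}


def find(collection: list[dict], flt: dict) -> list[dict]:
    """Hypothetical in-memory stand-in that only handles {field: {"$in": [...]}}."""
    field, condition = next(iter(flt.items()))
    wanted = set(condition["$in"])
    return [doc for doc in collection if doc.get(field) in wanted]


chapters = [
    {"_id": 1001, "course_id": 1},
    {"_id": 1002, "course_id": 1},
    {"_id": 2001, "course_id": 2},
]
print([c["_id"] for c in find(chapters, chapters_filter(course))])  # [1001, 1002]
```

`$in` doesn't return documents in the order of the id list, so if chapter order matters, sort the results by their position in `course["chapters"]`.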
3:20
Usually, for a one-to-many relationship, the ids are embedded on at least one side. So here we have the course, the course has some chapters,
3:28
so we're just storing their ids here. Now, we also have the chapters themselves; you can see chapter 1001 right here,
3:36
and this one is a little more interesting. Again we've got our duration in seconds,
3:41
which is computed from the individual lectures: each of those has its own duration in seconds, and that's the real raw number.
3:49
So this is another duplication, because at many, many levels I need to show the time of a chapter,
3:54
and that was turning out to be computationally expensive everywhere. So again, in these two places, we have this one bit of duplicated data,
4:01
and you will see that this is more common in a document database than in a relational one.
4:06
So here we've got our chapter, which the course refers to softly by its id, and the chapter also stores the course id down below,
4:13
so it's kind of a bidirectional relationship. Then we have the lectures, and the reason they live here is that
4:19
almost every time we get hold of a chapter, we care about its lectures; we usually want to display them in a list.
4:28
Any time I get a lecture, which is the thing you're watching right now, an individual video let's say,
4:33
any time you have one of those, you almost always need the other ones,
4:37
at least the ones before and after it. If you look at this particular player,
4:41
you'll see there are forward and back buttons that let you skip ahead or skip back within the course; those are the other lectures.
4:49
So what I find is that grouping the chapter together with its lectures into one blob makes this super fast, since I almost always want the other lectures
4:58
when I have one lecture, and if I have the lecture, I usually need to display the chapter title and things like that.
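Because a chapter document carries its lectures, finding the previous and next lecture for the player needs only that one document. Here's a sketch, assuming each embedded lecture has a hypothetical `id` field and using made-up titles (only the 202-second exercises lecture comes from the example above). It ignores moving across chapter boundaries, which a real player would also need to handle.

```python
def lecture_with_neighbors(chapter: dict, lecture_id: int):
    """Return (previous, current, next) lectures from one chapter document.

    The lectures are embedded, so one chapter read gives us the neighbors
    and the chapter title for display. Missing neighbors come back as None.
    """
    lectures = chapter["lectures"]
    for i, lecture in enumerate(lectures):
        if lecture["id"] == lecture_id:
            prev_lec = lectures[i - 1] if i > 0 else None
            next_lec = lectures[i + 1] if i + 1 < len(lectures) else None
            return prev_lec, lecture, next_lec
    raise KeyError(lecture_id)


# Hypothetical chapter document with embedded lectures.
chapter = {
    "_id": 1001,
    "course_id": 1,
    "title": "Welcome to the course",
    "lectures": [
        {"id": 1, "title": "Introduction", "duration_in_seconds": 120},
        {"id": 2, "title": "Doing the exercises", "duration_in_seconds": 202},
        {"id": 3, "title": "Setting up", "duration_in_seconds": 300},
    ],
}

prev_lec, current, next_lec = lecture_with_neighbors(chapter, 2)
print(chapter["title"], "|", prev_lec["title"], "<-", current["title"], "->", next_lec["title"])
```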
5:03
So these are really well suited to be put together in this embedded style. I don't have a lectures table; I have courses
5:10
and I have chapters, and the chapters embed the lectures. We also saw that little bit of data duplication.
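The chapter's `course_id` and the course's list of chapter ids point at each other, and MongoDB doesn't enforce either side the way a foreign key would. Here's a hypothetical consistency check, the kind of thing you might run in a test or maintenance script; the field names and sample data are assumptions. Finding all of a course's chapters from the chapter side is then just a filter like `{"course_id": 1}`.

```python
def reference_problems(course: dict, chapters: list[dict]) -> list[str]:
    """Report places where the two sides of the course/chapter link disagree."""
    problems = []
    by_id = {c["_id"]: c for c in chapters}
    # Side 1: every chapter id on the course should exist and point back.
    for chapter_id in course["chapters"]:
        chapter = by_id.get(chapter_id)
        if chapter is None:
            problems.append(f"course {course['_id']} lists missing chapter {chapter_id}")
        elif chapter["course_id"] != course["_id"]:
            problems.append(f"chapter {chapter_id} points at course {chapter['course_id']}")
    # Side 2: every chapter claiming this course should be listed on it.
    for chapter in chapters:
        if chapter["course_id"] == course["_id"] and chapter["_id"] not in course["chapters"]:
            problems.append(f"chapter {chapter['_id']} is not listed on course {course['_id']}")
    return problems


course = {"_id": 1, "chapters": [1001, 1002]}
chapters = [
    {"_id": 1001, "course_id": 1},
    {"_id": 1002, "course_id": 1},
    {"_id": 1003, "course_id": 1},  # claims course 1 but isn't listed there
]
print(reference_problems(course, chapters))  # ['chapter 1003 is not listed on course 1']
```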
5:16
You can see down here an individual embedded lecture; this one talks about doing the exercises
5:21
in this course, and it's apparently 202 seconds long. I hope this look behind the scenes has helped you understand
5:29
how you might model this stuff. You can look at the course page and the player and think about some of the trade-offs.
5:34
I don't know that this is perfect, but it is absolutely working well for the web app. Let's look at one more thing.
5:40
Down here we have the users, and there are a couple of items we're going to focus on.
5:45
I've blurred some things out: we're using an ObjectId for the user id here, and I covered up the password and things like that,
5:50
but we've got some flat stuff like whether or not you're opting out of email, what your user name is, what your email address is, things like that.
5:56
Then I have this concept of an origin: if you come from some particular marketing source,
6:02
it might record, hey, this person created their account and originally came from Facebook,
6:07
or this person originally came from the podcast, something like that. We also have the courses that you are taking.
6:12
Right here, this particular person is me, so I gave myself basically all the courses; these are the ids of the courses that I'm a student in.
6:21
So again, there's no users table, courses table, and user-courses normalization table. It's very common that when I, as a user,
6:30
am loaded from the database, I very often need to know about my courses. Now, I can't easily embed the courses into the user;
6:37
that'd be insane levels of duplication. But the closest thing I can do is get this list and then go back and do another query:
6:43
give me all the courses where the course id is in this list of owned courses. So with basically two queries, I have everything I need.
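Here's a sketch of the two-query approach, using plain lists as hypothetical stand-ins for the users and courses collections. The `course_ids` and `origin` field names and the sample data are assumptions, and real user ids would be ObjectIds rather than strings. With pymongo, the same two steps would be roughly `db.users.find_one({"_id": user_id})` followed by `db.courses.find({"_id": {"$in": user["course_ids"]}})`.

```python
def load_user_with_courses(users: list[dict], courses: list[dict], user_id: str):
    """Two 'queries': fetch the user, then every course whose _id is in their list."""
    user = next(u for u in users if u["_id"] == user_id)          # query 1
    owned = set(user["course_ids"])
    user_courses = [c for c in courses if c["_id"] in owned]      # query 2 ($in)
    return user, user_courses


# Hypothetical sample data.
users = [{"_id": "u1", "email": "student@example.com",
          "origin": "podcast", "course_ids": [1, 3]}]
courses = [{"_id": 1, "title": "Course A"},
           {"_id": 2, "title": "Course B"},
           {"_id": 3, "title": "Course C"}]

user, owned = load_user_with_courses(users, courses, "u1")
print([c["title"] for c in owned])  # ['Course A', 'Course C']
```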
6:50
We also have the bundle id and some other things going on here. So that embedded course id field is actually a list.
6:56
One more thing to look at down here is the preferences. It has a somewhat short name, but these are the preferences for your player.
7:06
When you're in the video player, you can choose different qualities, and you can turn captions, which are basically subtitles or transcripts, on or off.
7:13
You can also choose a playback speed, anywhere from about 0.75x up to 2x or 3x or something crazy like that.
7:20
One of the primary actions a user does on this site is to go through a course, and each course might have 150 lectures.
7:29
So as a user, you come in, you look around a little bit, and then you go through those 150 lectures,
7:34
so this preferences object needs to be pulled back frequently. We have to get the user anyway, so embedding the preferences in the user means
7:40
basically instant access any time I'm in the player to figure out how to preconfigure it to render your video the way that you like it.
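Here's a minimal sketch of using the embedded preferences, assuming a hypothetical `preferences` field (the real one has a shorter name), hypothetical default values, and a `PlayerConfig` dataclass invented for illustration. Since the user document is loaded anyway, building the player settings costs no extra query.

```python
from dataclasses import dataclass


@dataclass
class PlayerConfig:
    quality: str
    captions: bool
    speed: float


# Hypothetical defaults for users who never changed a setting.
DEFAULTS = {"quality": "720p", "captions": False, "speed": 1.0}


def player_config(user: dict) -> PlayerConfig:
    """Build player settings from the preferences embedded in the user document."""
    prefs = {**DEFAULTS, **user.get("preferences", {})}  # stored values win
    return PlayerConfig(
        quality=prefs["quality"],
        captions=bool(prefs["captions"]),
        speed=float(prefs["speed"]),
    )


user = {"_id": "u1", "preferences": {"quality": "1080p", "captions": True, "speed": 1.5}}
new_user = {"_id": "u2"}
print(player_config(user))      # PlayerConfig(quality='1080p', captions=True, speed=1.5)
print(player_config(new_user))  # PlayerConfig(quality='720p', captions=False, speed=1.0)
```

Changing one setting stays cheap too: an update such as `{"$set": {"preferences.speed": 1.5}}` uses MongoDB's dot notation to modify a single field inside the embedded object.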
7:49
So this is an embedded item, but not an embedded list, just an embedded preferences object. So there you have it, a look inside Talk Python Training,
7:58
at least as it was when we recorded this. Hopefully this helps you think through some of the challenges of building a more realistic app.