MongoDB for Developers with Python Transcripts
Chapter: Modeling and document design
Lecture: A real world example

0:01 So let's look inside the application that you're using right now to take this course as an example.
0:07 So at the time of this recording, here's what the Talk Python training website database looks like for courses and users.
0:15 So, first let's focus on the course side of things, there's a couple of interesting ideas here,
0:20 one, we have an id which is not an object id, why is it not an object id, well, it was actually migrated from a relational database initially,
0:28 this was using SQLAlchemy, and it was easier to keep this id here as a number rather than switch to MongoDB's object id,
0:37 it's also easier to refer to it in other areas, like say in the e-commerce system, where I can put the id in directly;
0:43 I don't have very much space in the message that can go into the e-commerce system based on their API,
0:50 so a number like 1 is much easier than the 24 hex characters of an object id, so we're using the non-standard id which is generated in the app,
0:56 but for these types of things, that is really no big deal, for the users, I think we might be using object ids.
1:02 We have somewhat sort of flat things here, we have the url and the title and when it was published, things like that,
1:08 so this is the Learn Python by Building Ten Apps Jumpstart Course and you can see a lot of the initial ideas here,
1:14 and the initial pieces of data are totally straightforward and they would look exactly the same in a relational database.
1:20 However, there are two things that are very different that I want to draw your attention to;
1:25 first is not actually the embedded stuff, but is this duration in seconds, when I created the MongoDB version of this web app,
1:32 I realized one of the things I do all the time on the home page, on the course listing page, and many many places,
1:40 is I say how long is the course, this course is 6.5 hours, I think this one is 7.1 hours or something to that effect.
1:47 Using quick math you can figure that out from the duration in seconds. So there was actually a pretty serious bottleneck
1:52 where I'd have to go and in this case pull back 12 chapters and then from the chapters I could get the lectures
1:59 and from the lectures I could get how long each individual one was, I'd add that all up and then I could print out that number.
2:06 And then I would do that for say like on the course catalog page, there was like ten courses, I would have to go through so many of these chapters
2:14 and then their subsequent lectures, and that was a huge huge bottleneck. So what I decided to do was in the application,
2:19 any time I save or update the course, I'm going to compute this on save which is extremely rare, and then I'm going to stash this here,
2:27 so this is actually computed from the chapters which are computed from the lectures themselves,
2:32 and this is data duplication, but you'll find that a little bit of data duplication can pay off:
2:37 I find in most apps there are usually one or two little pieces like this that just unlock a lot of performance,
2:43 because actually computing this turns out to be really really computationally expensive, but storing it here on this object made it super fast.
2:51 So this is one thing, this data duplication which I try to stay away from as much as I can but the trade-off here was so worth it.
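A minimal sketch of the compute-on-save idea described above. The field names (`duration_in_sec`, `lectures`, `chapter_ids`) and the helper functions are assumptions for illustration, not the site's actual schema or code; documents are plain dicts, the shape PyMongo hands back, and lectures are assumed to be stored inside their chapter documents.

```python
def total_seconds(items):
    """Sum the duration_in_sec field over a list of documents (missing counts as 0)."""
    return sum(item.get("duration_in_sec", 0) for item in items)


def refresh_durations(course, chapters):
    """Recompute the stored (duplicated) durations just before saving a course."""
    for chapter in chapters:
        # The raw numbers live on the lectures; the chapter total is derived from them.
        chapter["duration_in_sec"] = total_seconds(chapter.get("lectures", []))
    # The course total is derived from the chapter totals.
    course["duration_in_sec"] = total_seconds(chapters)
    return course


def duration_in_hours(course):
    """Cheap read path: no chapters or lectures need to be loaded."""
    return round(course["duration_in_sec"] / 3600, 1)


# Hypothetical sample data.
chapters = [
    {"_id": 1001, "lectures": [{"duration_in_sec": 202}, {"duration_in_sec": 398}]},
    {"_id": 1002, "lectures": [{"duration_in_sec": 3000}]},
]
course = refresh_durations({"_id": 1, "chapter_ids": [1001, 1002]}, chapters)
print(course["duration_in_sec"], duration_in_hours(course))
```

The expensive walk over chapters and lectures happens only on the rare save; every listing page afterwards reads a single stored number from the course document.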
2:58 Now, the other part we want to focus on is down here, we said I'd like to associate these chapter ids with a particular course,
3:04 now if this was a relational database, I might have a course to chapter normalization table, right, it'd have the course id and the chapter id
3:12 and I'd do some query, some kind of join on that; you almost never ever, ever see that in MongoDB and document databases.
3:20 Usually, at least the ids are embedded on one side of that one-to-many relationship, so here we have the course, the course has some chapters,
3:28 so we're just storing the ids here. Now, we also have the chapters, you can see chapter 1001 goes right here
3:36 and this one is a little bit more interesting, we've got again our duration in seconds
3:41 which is another thing computed from if you look at the individual lectures they've got duration in seconds, and that's the real raw number.
3:49 So this is another duplication, because at many, many levels I need to show the time of a chapter,
3:54 and that was turning out to be computationally expensive at many levels, so again, these two places, this is the one bit of duplicated data
4:01 and you will see that this is more common in a document database than in a relational one.
4:06 So here we've got our chapter which has this soft relationship from the course over to the id, we also have the course id down there and below it,
4:13 so it's kind of this bidirectional relationship; then we have lectures, and lectures is interesting in that
4:19 almost every time that we get a hold of a chapter we care about its lectures, we usually want to display them in a list;
4:28 any time that I get a lecture, this is the thing like you're watching right now, this is the lecture, right, an individual video let's say,
4:33 any time you have one of those, you almost always need the other ones,
4:37 at least the ones before and after it, so like if you look in this particular player
4:41 you'll see there are forward and backward buttons within the course so that you can skip ahead or skip back, and those are the other lectures,
4:49 so what I find is that grouping the chapter along with the lectures into one blob makes it super fast, and I almost always want the other lectures
4:58 when I have one lecture, and if I have the lecture, I usually need to display the chapter title, and things like that.
5:03 Anyway, so these are really well suited to be put together in this embedded style, so I don't have a lectures table, I have courses
5:10 and I have chapters, and then in the chapters those are embedding the lectures, and we also saw that little bit of data duplication.
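A rough sketch of why embedding the lectures in the chapter helps the player. The document shape, the `id` and `title` field names, and both helper functions are hypothetical; the query uses MongoDB's dot notation to match a field of an embedded document in an array.

```python
def find_chapter_for_lecture(db, lecture_id):
    """One query returns the chapter, its title, and every sibling lecture.

    `db` is assumed to be a PyMongo database with a `chapters` collection.
    """
    return db.chapters.find_one({"lectures.id": lecture_id})


def lecture_with_neighbors(chapter, lecture_id):
    """Return (previous, current, next) lectures from an already loaded chapter."""
    lectures = chapter.get("lectures", [])
    for index, lecture in enumerate(lectures):
        if lecture["id"] == lecture_id:
            previous = lectures[index - 1] if index > 0 else None
            following = lectures[index + 1] if index + 1 < len(lectures) else None
            return previous, lecture, following
    return None, None, None


# Hypothetical chapter document with its lectures embedded.
chapter = {
    "_id": 1001,
    "course_id": 1,
    "title": "Welcome",
    "lectures": [
        {"id": 10101, "title": "Intro", "duration_in_sec": 95},
        {"id": 10102, "title": "Doing the exercises", "duration_in_sec": 202},
        {"id": 10103, "title": "Setup", "duration_in_sec": 310},
    ],
}
previous, current, following = lecture_with_neighbors(chapter, 10102)
print(previous["title"], "|", current["title"], "|", following["title"])
```

This sketch only looks within a single chapter; skipping across a chapter boundary would need the neighboring chapter as well.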
5:16 So you can see down here is an individual embedded lecture, here's one that talks about doing the exercises
5:21 in this course and it's apparently 202 seconds, so I hope this look behind the scenes has helped you understand
5:29 how you might model this stuff, you can look at the course page and the player and think about some of the trade-offs,
5:34 I don't know that this is perfect, but it is absolutely working well for the web app. Let's look at one more thing.
5:40 Down here we have the users, and we have a couple of items that we're going to focus on when we get to the users,
5:45 I have blurred some out; we're using object id now for the user id, and I covered the password and things like that,
5:50 but we've got some flat stuff like whether or not you're opting out of email, what your user name is, what your email address is, things like that.
5:56 And then, I have this concept of an origin, so if you come from like some particular marketing source
6:02 it might record like hey this person created their account and they originally came from Facebook,
6:07 this person originally came from the podcast or something like that, so that's pretty interesting, we also have the courses that you are taking,
6:12 so right here, this particular person, this is me, so I gave myself basically all the courses, these are the ids of the courses that I am a student in,
6:21 so again, there's not a users table, a courses table, and a user-courses sort of normalization table. It's very common that when I as a user
6:30 am loaded from the database, I very often need to know about the courses. Now I can't easily embed the course into the user, right,
6:37 that'd be like insane levels of duplication, but the closest thing I can do is get this list and then go back and do another query,
6:43 say give me all the courses where the course id is in this list of owned courses, so basically with two queries I have everything I need.
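The two-query pattern just described might look roughly like this with PyMongo. The collection names (`users`, `courses`) and the `course_ids` field are assumptions; `find_one`, `find`, and the `$in` operator are standard PyMongo and MongoDB features.

```python
def owned_courses_filter(user):
    """Build the filter for the second query from the ids stored on the user."""
    return {"_id": {"$in": list(user.get("course_ids", []))}}


def load_user_and_courses(db, user_id):
    """Two round trips: one for the user, one for all of their courses.

    `db` is assumed to be a PyMongo database object.
    """
    user = db.users.find_one({"_id": user_id})
    if user is None:
        return None, []
    courses = list(db.courses.find(owned_courses_filter(user)))
    return user, courses


# Hypothetical user document holding only the ids of the owned courses.
user = {"_id": "u1", "course_ids": [1, 2, 5]}
print(owned_courses_filter(user))
```

Only the ids are stored on the user, so a course is never duplicated, and the number of queries stays at two no matter how many courses the user owns.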
6:50 We also have the bundle id and some other things going on here. So that embedded course id, that's actually a list.
6:56 One more thing to look at down here is this preferences, so this is a short name, a somewhat shortened name, but these are the preferences for your player,
7:06 so when you're in the video player, you can choose different qualities, you can turn on captions or you can turn off captions,
7:13 subtitles, transcripts basically, and you can choose a playback speed, it could be like 0.75 up to two or three or something crazy like this.
7:20 One of the primary actions a user does on this site is to go through the course, each course might have 150 lectures
7:29 so as a user, you come in you look round a little bit and then you go through 150 lectures,
7:34 so this preferences thing needs to be pulled back frequently. And so we've got to get the user anyway, and embedding them together means
7:40 it's basically instant access any time I'm in the player to figure out how to preconfigure the player to render your video the way that you like it.
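As a sketch of how an embedded preferences object can be read and updated, assuming hypothetical field names (`preferences`, `quality`, `captions`, `speed`) and made-up default values. The update document uses MongoDB's `$set` with dot notation, which changes one embedded field without rewriting the rest of the user.

```python
# Hypothetical defaults for a user who has never changed any player setting.
DEFAULT_PREFERENCES = {"quality": "720p", "captions": False, "speed": 1.0}


def player_settings(user):
    """Merge the user's embedded preferences over the defaults; no extra query needed."""
    return {**DEFAULT_PREFERENCES, **user.get("preferences", {})}


def speed_update(speed):
    """Update document that changes only the embedded speed field."""
    return {"$set": {"preferences.speed": speed}}


def save_speed(db, user_id, speed):
    """Persist the change; `db` is assumed to be a PyMongo database object."""
    return db.users.update_one({"_id": user_id}, speed_update(speed))


# Hypothetical user document, already loaded for the request.
user = {"_id": "u1", "preferences": {"captions": True, "speed": 1.5}}
print(player_settings(user))
```

Because the user document has to be loaded for the request anyway, the player configuration costs no additional round trip.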
7:49 So this is an embedded item, but not an embedded list, just an embedded preferences object. So there you have it, a look inside Talk Python Training
7:58 at least as it was when we recorded this, so hopefully this helps you think through some of the challenges of building a more realistic app.

