MongoDB with Async Python Transcripts
Chapter: Modeling with Documents
Lecture: Modeling: Relational vs. Documents
0:00
Let's begin with a contrasting example from our PyPI data itself. So how might we model data with these document databases?
0:11
Well, how would we model it with a relational database? Here's one option. In our PyPI data, we have things like packages.
0:18
These are Beanie, FastAPI, PyMongo: the things on PyPI that you pip install. Each of those has multiple releases, one for each version.
0:30
Once those versions are published, they don't change.
0:34
So we're going to have a relationship from a package to its multiple releases, a one-to-many relationship.
0:42
Then we have users, who can either be regular users or publish and maintain packages.
0:49
Now, we also need to know which packages are published by which users. There can be multiple
0:56
users who manage or publish a package, and there are users who have multiple packages,
1:03
like SQLModel and FastAPI for Sebastián Ramírez. So to model this many-to-many
1:11
relationship, we need a relationship table called maintainer that has a composite primary
1:19
key of package ID and user ID. That sets up the many-to-many relationship for us. This is what we would do in something like Postgres or SQLite.
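To make the relational side concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`packages`, `releases`, `users`, `maintainers`) and the sample rows are assumptions made up for illustration, not the course's actual schema or real PyPI data. The part to notice is the `maintainers` table and its composite primary key on `(package_id, user_id)`.

```python
import sqlite3

# Hypothetical relational schema for the PyPI example (names are illustrative).
SCHEMA = """
CREATE TABLE packages (
    id TEXT PRIMARY KEY                                  -- e.g. 'beanie'
);
CREATE TABLE releases (
    id         INTEGER PRIMARY KEY,
    package_id TEXT NOT NULL REFERENCES packages(id),   -- one package, many releases
    version    TEXT NOT NULL
);
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- The relationship table for the many-to-many between packages and users.
CREATE TABLE maintainers (
    package_id TEXT    NOT NULL REFERENCES packages(id),
    user_id    INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (package_id, user_id)                    -- composite primary key
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO packages (id) VALUES (?)", [("beanie",), ("fastapi",)])
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "user one"), (20, "user 20")])
    conn.executemany(
        "INSERT INTO maintainers VALUES (?, ?)",
        [("beanie", 1), ("beanie", 20), ("fastapi", 1)],
    )
    conn.executemany(
        "INSERT INTO releases (package_id, version) VALUES (?, ?)",
        [("beanie", "1.0.0"), ("beanie", "1.1.0")],
    )
    return conn


def packages_for_user(conn: sqlite3.Connection, user_id: int) -> list[str]:
    # Walk the many-to-many from the user side via the relationship table.
    rows = conn.execute(
        "SELECT package_id FROM maintainers WHERE user_id = ? ORDER BY package_id",
        (user_id,),
    )
    return [package_id for (package_id,) in rows]


def maintainers_of(conn: sqlite3.Connection, package_id: str) -> list[str]:
    # Walk it from the package side; this needs a JOIN back to users.
    rows = conn.execute(
        """SELECT u.name FROM users AS u
           JOIN maintainers AS m ON m.user_id = u.id
           WHERE m.package_id = ? ORDER BY u.id""",
        (package_id,),
    )
    return [name for (name,) in rows]


def releases_of(conn: sqlite3.Connection, package_id: str) -> list[str]:
    # The one-to-many: releases live in their own table, keyed by package_id.
    rows = conn.execute(
        "SELECT version FROM releases WHERE package_id = ? ORDER BY id", (package_id,)
    )
    return [version for (version,) in rows]
```

The composite key is what stops the same user from being recorded twice as a maintainer of the same package. Inserting `("beanie", 1)` a second time raises `sqlite3.IntegrityError`.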
1:29
This is basically the third normal form for this data. If we look over at the Mongo side of things, what you'll usually see is fewer collections than there were tables.
1:40
Especially with those little relationship tables for many-to-many relationships, you'll often find much, much simpler data models
1:49
over on the MongoDB side. Here's how this might look in MongoDB for the same data we just discussed. We have our user object,
1:57
and that's basically standalone. The other really important thing is the package. Here's Beanie, its license, and who maintains it.
2:07
But notice that maintainers is a list of object IDs. So instead of having that many-to-many relationship table,
2:16
we can pick either the package or the user and store the many-to-many relationship there. For example, suppose you have user one.
2:26
In our package, we could have user one and user 20, but we could also have multiple packages where user one appears in that list.
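Here is a minimal sketch of that idea. Plain Python dicts stand in for MongoDB documents, and string IDs such as `"u1"` stand in for real ObjectId values. Both of those choices, and the sample data, are illustrative assumptions. The user documents are standalone, and each package carries a `maintainers` list of user IDs.

```python
# Plain dicts standing in for MongoDB documents. The string IDs are placeholders
# for ObjectId values, and the data is made up for illustration.

# Standalone user documents: nothing in here points at packages.
users = [
    {"_id": "u1", "name": "user one"},
    {"_id": "u20", "name": "user 20"},
]

# Package documents hold the many-to-many side as a list of user IDs.
packages = [
    {"_id": "beanie", "maintainers": ["u1", "u20"]},
    {"_id": "fastapi", "maintainers": ["u1"]},
]


def packages_maintained_by(user_id: str) -> list[str]:
    """Return the IDs of packages whose maintainers list contains user_id.

    This mirrors the MongoDB filter {"maintainers": user_id}. An equality
    match against an array field matches when any array element equals the value.
    """
    return [p["_id"] for p in packages if user_id in p["maintainers"]]
```

Against a real collection, the equivalent PyMongo call would be something like `db.packages.find({"maintainers": user_id})`, where `db` is an assumed database handle. No separate relationship table is needed. The other direction, from a package to its maintainers, is just the list already stored on the package.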
2:35
That's the many-to-many relationship. More significant than folding away that many-to-many table, though, are the releases.
2:46
We had a separate table for releases. Maybe, just maybe (and this is a big maybe), it makes sense to embed the release objects inside the package.
2:56
That way, any time we get the package, we already have its releases. So what are some of the differences and similarities?
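The following sketch shows what embedding buys you. It uses the same dict stand-in as before, and the version numbers and `comment` field are made-up assumptions. A single lookup of the package document returns its releases, with no join and no second query. Because published releases never change, new versions are only ever appended to the embedded list.

```python
# Illustrative package document with its releases embedded (made-up values).
# In MongoDB, this whole dict would be one document in a packages collection.
package_doc = {
    "_id": "beanie",
    "maintainers": ["u1", "u20"],        # references to user documents
    "releases": [                        # embedded release sub-documents
        {"version": "1.0.0", "comment": "first release"},
        {"version": "1.1.0", "comment": "bug fixes"},
    ],
}

# Stand-in for the collection, keyed by _id.
packages_by_id = {package_doc["_id"]: package_doc}


def release_versions(package_id: str) -> list[str]:
    # One lookup returns the package *and* its releases.
    pkg = packages_by_id[package_id]
    return [r["version"] for r in pkg["releases"]]


def add_release(package_id: str, version: str, comment: str = "") -> None:
    # Published releases don't change, so new ones are only appended.
    # In MongoDB this would roughly be an update_one with "$push" on "releases".
    packages_by_id[package_id]["releases"].append(
        {"version": version, "comment": comment}
    )
```

The "big maybe" is a real trade-off. An embedded list keeps growing with every release, so whether embedding fits depends on how many releases a package accumulates and how you query them.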
3:03
Sometimes we have these relationships. This is not an embedded user; it's the ID of the user. So that's really a relationship
3:12
that we're storing in there. Other times we might actually embed objects,
3:18
as with the releases. Here we have multiple objects, and inside that "..." are
3:23
all the details that would have been columns in the releases table.
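The contrast between a stored reference and an embedded object can also be written as types. This sketch uses standard-library dataclasses, which is an assumption for illustration: an ODM would declare these models its own way, and the class and function names are hypothetical. `maintainers` holds IDs that must be followed with a second lookup. `releases` holds the objects themselves.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    id: str
    name: str


@dataclass
class Release:
    version: str
    # ...plus whatever would have been columns in the relational releases table


@dataclass
class Package:
    id: str
    maintainers: list[str] = field(default_factory=list)   # IDs: a relationship
    releases: list[Release] = field(default_factory=list)  # objects: embedded


def resolve_maintainers(pkg: Package, users_by_id: dict[str, User]) -> list[User]:
    # A reference has to be followed with a second lookup
    # (in MongoDB, a second query against the users collection).
    return [users_by_id[uid] for uid in pkg.maintainers]


def versions(pkg: Package) -> list[str]:
    # Embedded releases are already part of the package; nothing to resolve.
    return [r.version for r in pkg.releases]
```

For example, given a `users_by_id` dict and `Package("beanie", ["u1", "u20"], [Release("1.0.0")])`, `resolve_maintainers` returns the two `User` objects, while `versions` returns `["1.0.0"]` directly.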