Building Data-Driven Web Apps with Flask and SQLAlchemy Transcripts
Chapter: Using SQLAlchemy
Lecture: Inserting real PyPI data
0:00
Now you've seen how to insert data with SQLAlchemy.
0:03
We're going to insert the actual real data, and it turns out that this data I
0:08
got is the actual PyPI data I got from
0:11
a couple of APIs I put together, and I have the top 100 packages
0:16
in all their details in a bunch of JSON files.
0:19
So what we're gonna do is load those JSON files,
0:21
pull them apart, do some type conversion and things like that and insert them all
0:26
into a database. That is super nitty-gritty, and it's really not worth going into.
0:30
So let's just skim quickly across that. First off, to run the program we're
0:34
going to use two new requirements: progressbar2, so we can have a cool progress bar
0:39
as we're doing our import, which is really,
0:41
really nice. And then python-dateutil,
0:44
which is a really, really nice way to parse dates much better than the built
0:48
in stuff. So I've already pip installed these,
0:50
so they're just in the requirements now.
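As a small illustration of why python-dateutil is handy here, this is a minimal sketch of parsing a timestamp string into a `datetime`. The helper name `parse_upload_time` and the sample string are assumptions made for the example; they are not taken from the lecture's script.

```python
import datetime

from dateutil import parser  # installed by the python-dateutil package


def parse_upload_time(text, default=None):
    """Parse a date string, returning `default` when it cannot be parsed.

    Hypothetical helper for illustration only.
    """
    try:
        return parser.parse(text)
    except (ValueError, TypeError, OverflowError):
        # dateutil raises ValueError (or a subclass) for unparseable text
        # and TypeError for non-string input such as None.
        return default


parsed = parse_upload_time("2018-09-26T02:35:02")
print(parsed.year, parsed.month, parsed.day)
```

Returning a default instead of raising keeps one bad value from stopping a long import, which is a design choice of this sketch rather than something the lecture states.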
0:52
So over here at the top of your repository,
0:55
I have the PyPI top 100. Each one of these is just a JSON
0:59
file. For example, let's look at Click, written a little while ago by
1:05
Armin Ronacher, originally at least, now managed by David Lord and the Pallets Project.
1:10
But for the data that we got,
1:12
this is what it says. And it has the license as BSD, but notice it doesn't just
1:16
say license is BSD; it has this,
1:18
like, sort of funky namespace style,
1:20
if you will. It talks about the language as Python, and then colon colon 3.
1:24
We're gonna parse that apart. Yeah,
1:25
the license, BSD. It works on Python 2 and Python 3 and so on. So
1:29
you can scroll through and see,
1:30
like, here's all of our releases, all the details about the releases and the dates,
1:34
and, well, it has a lot of stuff,
1:36
right? So we're gonna go and parse that apart and insert it into the database,
1:39
and that's gonna happen over here. Pretty straightforward.
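The "colon colon" namespace style mentioned above can be pulled apart with plain string splitting. The following is a rough sketch, not the lecture's actual code: the function names and the sample classifier list are assumptions chosen for illustration.

```python
from typing import List, Optional


def split_classifier(classifier: str) -> List[str]:
    """Split 'License :: OSI Approved :: BSD License' into its parts."""
    return [part.strip() for part in classifier.split("::")]


def extract_license(classifiers: List[str]) -> Optional[str]:
    """Return the last segment of the first 'License' classifier, if any."""
    for classifier in classifiers:
        parts = split_classifier(classifier)
        if parts[0] == "License":
            return parts[-1]
    return None


def extract_languages(classifiers: List[str]) -> List[str]:
    """Collect distinct 'Programming Language' entries, keeping their order."""
    languages: List[str] = []
    for classifier in classifiers:
        parts = split_classifier(classifier)
        if parts[0] == "Programming Language":
            name = " ".join(parts[1:])
            if name and name not in languages:
                languages.append(name)
    return languages


# Illustrative sample, not copied from the course's JSON files.
SAMPLE = [
    "License :: OSI Approved :: BSD License",
    "Programming Language :: Python",
    "Programming Language :: Python :: 2",
    "Programming Language :: Python :: 3",
]

print(extract_license(SAMPLE))
print(extract_languages(SAMPLE))
```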
1:42
What we're gonna do is we're going to just go and ask really quickly like,
1:45
Hey, is there any data in this database? And the way of checking is: are
1:50
there any users? We could look at all the tables, but we just ask:
1:53
are there any users? If so,
1:55
hey, we've probably already done this,
1:57
so don't reinsert duplicates. Just don't do anything.
2:00
Do a little summary there at the end.
2:02
But if it happens to be empty,
2:04
go load up those files, all of them.
2:06
all the JSON files. Skim across all of them,
2:09
find the distinct users, import them,
2:11
do all the packages and their releases and so on.
2:14
Pull out the languages and licenses, like that colon
2:17
colon BSD license thing we just saw.
2:20
And then finally do a little summary.
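The "load every JSON file, then find the distinct users" step described above might look roughly like this. It is a sketch under assumptions: the function names are made up, and it assumes each file follows the PyPI JSON layout with an `info` object holding `author`, `author_email`, `maintainer`, and `maintainer_email`.

```python
import glob
import json
import os
from typing import Dict, List


def load_package_files(folder: str) -> List[dict]:
    """Read every *.json file in `folder` into a list of dictionaries."""
    packages = []
    for path in sorted(glob.glob(os.path.join(folder, "*.json"))):
        with open(path, "r", encoding="utf-8") as fin:
            packages.append(json.load(fin))
    return packages


def find_distinct_users(packages: List[dict]) -> Dict[str, str]:
    """Map each distinct, lower-cased email address to the first name seen."""
    users: Dict[str, str] = {}
    pairs = (("author", "author_email"), ("maintainer", "maintainer_email"))
    for package in packages:
        info = package.get("info") or {}
        for name_key, email_key in pairs:
            email = (info.get(email_key) or "").strip().lower()
            if email:
                # setdefault keeps the first name recorded for an email.
                users.setdefault(email, (info.get(name_key) or "").strip())
    return users
```

Keying on the lower-cased email is one simple way to avoid inserting the same person twice when they author several packages.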
2:22
So we're just gonna run this, but first let's just look at import languages. It
2:27
goes and uses our progress bar,
2:29
which is pretty, pretty sweet.
2:30
It iterates over them and pulls out the language classification. The interesting database
2:35
part is that we're going to
2:39
just create a session, create a programming language,
2:41
set the details of it, add it to the session, call commit, and then update our
2:45
progress bar. Super straightforward, right?
2:47
This is what we did before.
2:48
It's just all this goo of juggling the JSON files.
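The create-session, add, commit pattern described here, together with the earlier "is there already data?" check, can be sketched as follows. This is not the course's code: the `ProgrammingLanguage` model, its columns, and the in-memory SQLite engine are assumptions for illustration, and the guard checks this sketch's own table where the lecture's script checks for users. The progress bar is left out to keep the example short.

```python
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class ProgrammingLanguage(Base):
    """Hypothetical model; the real course model may differ."""
    __tablename__ = "languages"

    id = Column(String, primary_key=True)
    description = Column(String)


def import_languages(session_factory, language_names):
    """Insert one row per language unless the table already has data."""
    session = session_factory()
    try:
        if session.query(ProgrammingLanguage).count() > 0:
            return 0  # Already imported; avoid inserting duplicates.
    finally:
        session.close()

    inserted = 0
    for name in language_names:
        session = session_factory()
        language = ProgrammingLanguage()
        language.id = name
        language.description = name
        session.add(language)
        session.commit()
        session.close()
        inserted += 1
    return inserted


# Throwaway in-memory database so the sketch runs on its own.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
factory = sessionmaker(bind=engine)

first_run = import_languages(factory, ["Python", "Python 3"])
second_run = import_languages(factory, ["Python", "Python 3"])
print(first_run, second_run)
```

Running the import twice shows the guard at work: the second call finds existing rows and inserts nothing.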
2:51
All right, so let's go and run this, and because it uses the progress bar,
2:54
it looks better outside of PyCharm.
2:57
It will run in here, no problem.
2:58
But let's just make it as nice as possible.
3:01
So I want to figure out where to activate my virtual environment.
3:03
That's a long enough directory, don't you think?
3:07
I will say that, slash activate.
3:11
And then I want to run python with
3:13
the name of this script here, the one that is gonna do the import.
3:18
One other thing: we also need a sys.path.append to add to our path
3:24
this folder right here, because we're importing pypi_org.
3:28
Now, in PyCharm this is gonna totally work smooth,
3:31
because guess what? PyCharm does that for us right there.
3:35
But if we try to run this outside without setting this up,
3:39
as a package or something like this, it's not gonna work so great.
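The path fix described above could be sketched like this. Which folder has to be added depends on the project layout, so the "two levels up" choice and the helper name `add_to_path` are assumptions, not the lecture's exact lines.

```python
import os
import sys


def add_to_path(folder: str) -> str:
    """Append the absolute form of `folder` to sys.path once and return it."""
    folder = os.path.abspath(folder)
    if folder not in sys.path:
        sys.path.append(folder)
    return folder


# In a real script __file__ is defined; fall back to the working directory
# when this is pasted into an interactive session.
if "__file__" in globals():
    script_dir = os.path.dirname(os.path.abspath(__file__))
else:
    script_dir = os.getcwd()

# Assumed layout: the importable package lives two folders above the script.
container = add_to_path(os.path.join(script_dir, "..", ".."))
print(container in sys.path)
```

This has to run before the project's own imports at the top of the script. `sys.path.append` puts the folder at the end of the search order; `sys.path.insert(0, ...)` would give it priority instead.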
3:42
Now we're ready to run our code here. We're gonna run python out of our virtual
3:46
environment, pointed at load_data. Here goes.
3:49
So here you can see, it's loading up all of the users,
3:52
all the projects. It found 96 packages. I said there were 100.
3:56
I think for some reason, some couldn't be downloaded.
3:58
So let's go with 96. Top 96.
4:00
Out of there, it went through,
4:01
and it found the users, found the packages and releases, the languages.
4:05
So in the end we found 84 users.
4:07
96 packages, 5400 releases embedded within those documents,
4:12
10 maintainers, 25 languages and 30 different licenses.
4:16
All right, well, that's it.
4:18
We should now have a whole lot more data over here.
4:20
And if we go look real quick,
4:21
let's just go to the packages and jump to the console.
4:25
Say, select * from packages.
4:27
We run it like that. We have a whole bunch of them.
4:29
Here's amqp, attrs, argparse,
4:33
flake8, and so on. If we run it for releases,
4:36
you see a whole bunch of stuff, and they're related to their various packages over here
4:40
on the right. Pretty awesome,
4:41
huh? So now when we run our app,
4:43
if we go over and run the actual app itself, click on it,
4:49
you can see, well, we're not quite using the data yet,
4:52
but we're going to be able to start using all that data we've just loaded up
4:55
and dropping it into these locations.
4:57
So that's gonna be really awesome.
4:59
We have true, accurate, realistic, or at least snapshot-in-time,
5:04
real data from PyPI to work with to finish building out and testing your app.