Building Data-Driven Web Apps with Flask and SQLAlchemy Transcripts
Chapter: Using SQLAlchemy
Lecture: Inserting real PyPI data
0:00 Now you've seen how to insert data with SQLAlchemy.
0:03 We're going to insert the actual real data, and it turns out that this data I
0:08 got is the actual PyPI data I got from,
0:11 ah, a couple of APIs I put together, and I have the top 100 packages
0:16 in all their details in a bunch of JSON files.
0:19 So what we're gonna do is load those JSON files,
0:21 pull them apart, do some type conversion and things like that, and insert them all
0:26 into a database. That is super nitty-gritty, and it's really not worth going into.
0:30 So let's just skim quickly across that. First off, to run the program we're
0:34 going to use two new requirements: progressbar2, so we can have a cool progress bar
0:39 as we're doing our import, which is really,
0:41 really nice. And then python-dateutil,
0:44 which is a really, really nice way to parse dates, much better than the built-in
0:48 stuff. So I've already pip installed these,
0:50 so they're just in the requirements now.
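As a rough illustration of the date parsing mentioned here, the following is a minimal sketch using python-dateutil's `parse` function. The sample strings are made up for illustration and are not taken from the course's JSON files.

```python
# Requires the third-party python-dateutil package (pip install python-dateutil).
from dateutil.parser import parse

# Hypothetical date strings in two different formats.
samples = [
    "2018-09-25T19:25:47",
    "Sep 25, 2018 7:25 PM",
]

# parse() works out the format of each string without a format specifier.
parsed = [parse(text) for text in samples]

for text, value in zip(samples, parsed):
    print(f"{text!r} -> {value.isoformat()}")
```

The appeal over the built-in `datetime.strptime` is that no explicit format string is needed for each input style.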
0:52 So over here at the top of your repository,
0:55 I have the PyPI top 100. Each one of these is just a JSON
0:59 file. For example, let's look at click, circa a little while ago, written by
1:05 Armin Ronacher originally, at least; now managed by David Lord and the Pallets Project.
1:10 But for the day that we got it,
1:12 this is what it says. And it has the license as BSD, but notice it doesn't just
1:16 say license is BSD; it has this,
1:18 like, sort of funky namespace style,
1:20 if you will. It talks about the language being Python, colon colon, 3.
1:24 We're gonna parse that apart. Yeah,
1:25 the license is BSD. It works on Python 2 and Python 3 and so on. So
1:29 you can scroll through and see,
1:30 like, here's all of our releases, all the details about the releases and the dates,
1:34 and whew, it has a lot of stuff,
1:36 right? So we're gonna go and parse that apart and insert it into the database,
1:39 and that's gonna happen over here. Pretty straightforward.
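To make the "namespace style" concrete, here is a minimal sketch of pulling such strings apart. The classifier values are illustrative examples of the `::`-separated format, and the helper name `split_classifier` is hypothetical, not the course's actual function.

```python
def split_classifier(classifier: str) -> list:
    """Split a '::'-separated classifier into its trimmed parts."""
    return [part.strip() for part in classifier.split("::")]


# Illustrative classifier strings in the style described above.
classifiers = [
    "License :: OSI Approved :: BSD License",
    "Programming Language :: Python :: 2",
    "Programming Language :: Python :: 3",
]

licenses = []
languages = []
for classifier in classifiers:
    parts = split_classifier(classifier)
    if parts[0] == "License":
        # Keep only the most specific segment, e.g. "BSD License".
        licenses.append(parts[-1])
    elif parts[0] == "Programming Language":
        # Join the remaining segments, e.g. "Python 3".
        languages.append(" ".join(parts[1:]))

print(licenses)   # ['BSD License']
print(languages)  # ['Python 2', 'Python 3']
```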
1:42 What we're gonna do is we're going to just go and ask really quickly, like,
1:45 hey, is there any data in this database? And the way of checking is: are
1:50 there any users? It could look at all the tables, but we just ask:
1:53 Are there any users? If so,
1:55 Hey, we've probably already done this,
1:57 so don't reinsert duplicates. Just don't do anything.
2:00 Do a little summary there at the end.
2:02 But if it happens to be empty,
2:04 go load up those files, all of them.
2:06 All the JSON files: skim across all of them,
2:09 find the distinct users, import them,
2:11 do all the packages and their releases and so on.
2:14 Pull out the languages and licenses, like that colon,
2:17 colon BSD license thing we just saw.
2:20 And then finally do a little summary.
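The flow just described (skip if users already exist, otherwise load every file and find the distinct users) might look roughly like the sketch below. It uses only the standard library and stands in for the database check with a plain count; the `author_email` field and all function names are assumptions for illustration, not the course's actual code or file layout.

```python
import json
import tempfile
from pathlib import Path


def load_package_files(folder: Path) -> list:
    """Load every *.json file in the folder into a list of dicts."""
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(folder.glob("*.json"))
    ]


def distinct_users(packages: list) -> set:
    """Collect unique author emails across all packages (hypothetical field)."""
    return {
        pkg["author_email"].strip().lower()
        for pkg in packages
        if pkg.get("author_email")
    }


def run_import(folder: Path, existing_user_count: int) -> set:
    """Skip the import when users already exist; otherwise load everything."""
    if existing_user_count > 0:
        return set()  # data is already there, so avoid inserting duplicates
    return distinct_users(load_package_files(folder))


# Demo with two made-up package files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.json").write_text(
        json.dumps({"name": "a", "author_email": "Dev@Example.com"}), encoding="utf-8"
    )
    (folder / "b.json").write_text(
        json.dumps({"name": "b", "author_email": "dev@example.com"}), encoding="utf-8"
    )
    fresh = run_import(folder, existing_user_count=0)
    repeat = run_import(folder, existing_user_count=1)

print(fresh)   # {'dev@example.com'}
print(repeat)  # set()
```

Checking a single table is a cheap way to make the script safe to run more than once, at the cost of not detecting a partially completed import.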
2:22 So we're just gonna run this through. Let's just look at the import languages, and
2:27 it goes and uses our progress bar,
2:29 which is pretty, pretty sweet.
2:30 And it iterates over them and pulls out the language classification. Then the interesting
2:35 database part is it says that we're going to
2:39 just create a session, create a programming language,
2:41 set the details of it, add it to the session, and call commit, and then update our
2:45 progress bar. Super straightforward, right?
2:47 This is what we did before.
2:48 It's just all this goo of juggling the JSON files.
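The create-session, create-object, add, commit sequence described above can be sketched as follows. This is a minimal example under stated assumptions: the `ProgrammingLanguage` model, its columns, and the in-memory SQLite database are hypothetical stand-ins, and the course's real model and session factory may differ. The progress bar update is left out.

```python
# Requires SQLAlchemy 1.4 or newer (pip install sqlalchemy).
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class ProgrammingLanguage(Base):
    # Hypothetical model; the course's real class may have other columns.
    __tablename__ = "languages"
    id = Column(String, primary_key=True)
    description = Column(String)


engine = create_engine("sqlite:///:memory:")  # in-memory database for the demo
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

for name in ["Python 2", "Python 3"]:
    session = Session()                # create a session
    language = ProgrammingLanguage()   # create a programming language
    language.id = name                 # set the details of it
    language.description = name
    session.add(language)              # add it to the session
    session.commit()                   # and call commit
    session.close()

# Verify that both rows were written.
check = Session()
language_count = check.query(ProgrammingLanguage).count()
check.close()
print(language_count)
```

Committing once per item keeps the loop simple and mirrors the narration; a single commit after the loop would usually be faster for large imports.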
2:51 All right, so let's go and run this, and because it uses the progress bar,
2:54 it looks better outside of PyCharm.
2:57 It will run in here. No problem.
2:58 But let's just make it as nice as possible.
3:01 So I want to figure out where to activate my virtual environment.
3:03 That's a long enough directory, don't you think?
3:07 I will say that, slash activate.
3:11 And then I want to run, say, python,
3:13 the name of this script here, where that one is gonna do the import.
3:18 One other thing we also need to add: sys.path.append, to add to our path
3:24 this folder right here, because we're importing pypi_org.
3:28 Now, in PyCharm this is gonna totally work smooth,
3:31 because guess what? PyCharm does that for us right there.
3:35 But if we try to run this outside without setting this up
3:39 as a package or something like this, it's not gonna work so great.
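A minimal sketch of the path fix described above follows. The folder layout (script one level below the folder that holds the package) and the helper name `add_to_path` are assumptions for illustration; adjust the relative path to match the actual project.

```python
import os
import sys


def add_to_path(folder: str) -> str:
    """Append the folder to sys.path once so packages inside it can be imported."""
    folder = os.path.abspath(folder)
    if folder not in sys.path:
        sys.path.append(folder)
    return folder


# Directory of this script; fall back to the working directory when
# __file__ is not defined (for example in an interactive session).
if "__file__" in globals():
    here = os.path.dirname(os.path.abspath(__file__))
else:
    here = os.getcwd()

# Assumed layout: the package folder's parent is one level up from this script.
project_root = add_to_path(os.path.join(here, ".."))

print(project_root in sys.path)  # True

# After this, an import such as `import pypi_org` would resolve, provided
# that package really lives directly under project_root.
```

PyCharm adds the content root to the path automatically, which is why the script runs there without this step but fails from a plain terminal.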
3:42 Now we're ready to run our code here. So we're gonna run python out of our virtual
3:46 environment, pointed at load data. Here it goes.
3:49 So here you can see it's loading up all of the users,
3:52 all the projects. It found 96 packages. I said there were 100;
3:56 I think for some reason, some couldn't be downloaded.
3:58 So let's go with 96, top 96.
4:00 Out of there, it went through,
4:01 and it found the users, found the packages and releases, the languages.
4:05 So in the end we found 84 users.
4:07 96 packages, 5400 releases embedded within those documents,
4:12 10 maintainers, 25 languages and 30 different licenses.
4:16 All right, well, that's it.
4:18 We should now have a whole lot more data over here.
4:20 And if we go look real quick,
4:21 let's just go to the packages and jump to the console.
4:25 Say, select star from packages.
4:27 We run it like that. We have a whole bunch of them.
4:29 Here's amqp, attrs, argparse,
4:33 flake8 and so on. If we run it for releases,
4:36 you see a whole bunch of stuff, and they're related to their various packages over here
4:40 on the right. Pretty awesome,
4:41 huh? So now when we run our app,
4:43 if we go over and run the actual app itself and click on it,
4:49 you can see, well, we're not quite using the data yet,
4:52 but we're going to be able to start using all that data we've just loaded up
4:55 and dropping it into these locations.
4:57 So that's gonna be really awesome.
4:59 We have true, accurate, realistic, or at least a snapshot in time,
5:04 real data from PyPI to work with to finish building out and testing your app.