Building Data-Driven Web Apps with Pyramid and SQLAlchemy Transcripts
Chapter: Using SQLAchemy
Lecture: Inserting the actual data
0:00 You saw how to insert data, now let's go through the nitty gritty details of actually inserting all the data from real sources.
0:08 So there is an API for PyPI I've used that API to get most of the top 100 packages. So for example, here's click. Let's look at that real quick.
0:20 And I've downloaded this and it's from Armin Ronacher, creator of Flask. Here you can see the licenses and the languages in this funky format.
0:29 The author info embedded in there. You see the license, it's not set here but it's set up here, so it's kind of a kind of confusing and
0:40 tricky to make this work, but that's fine. So here's a home page The maintainers right now, there's no other maintainers set.
0:47 The full summary and then here are a bunch of releases for it, right? Each release has comments and so on.
0:54 Now, this data is not super useful to us in this format. There's not much description but the other ones...
1:02 Let me just throw one in here so you can see. It's sterling, park, parse or an alternative there. Here you can see the description has a lot more
1:10 a lot more detail, it goes over to column 1,614 so that's pretty long. Right, so what we're going to do is we're going to import this
1:19 data so you have actual packages, releases, users, etc. to work with, how are you're going to do that?
1:25 Well, I'm not going to leave that up to you 'cause that's pretty messy but what I am going to do is I'm giving you
1:30 these PyPI top 100 JSON files right here. They're going to be just at the top of your repository right there.
1:41 Now, over here we're writing basically the same code. Initialize the data base, get a session and what it's going to
1:47 do is it's going to go through and check if and only if there are no users, it's going to load those files and it goes
1:55 through and parses out all the users and then it actually saves them to the data base and then it goes
2:01 through and finds all the packages, saves them to the data base, along with their releases and languages and
2:06 licenses and so on. And it prints out a little summary. So right now there's no data and I actually deleted those other two releases I put in there.
2:14 So let's run this and I'm actually going to run it outside 'Cause it has a cool little progress bar.
2:27 So, let's go over here and say I'm going to activate our virtual environment 'cause we need the same packages and everything there.
2:35 And then we're going to say "Python," copy that path and we're going to run that right here. Let's see how we do. Beautiful!
2:47 It's got some extra junk that it printed out apparently but you could take that out. Here's the final numbers it found
2:52 84 users, 96 packages- from those packages we have 5,400 releases, 10 maintainers, 25 languages and 30 licenses.
3:03 Cool, now let's just go over here and do a quick query to make sure that's in there. So if we run that again, now you can see there's tons of releases.
3:11 This is all AMPQ, they go in order basically AsyncIO view packages there's all the packages. Right, here's Tornado and so on.
3:21 These are the popular ones. Really nice, so now we have actual data to work with. You're welcome to look through this load data.
3:29 It's pretty interesting, how to take these actual real world data sets and import them using SQLAlchemy.
3:36 So, we've got all this JSON data now we're putting it in our data base, it's a lot of yucky details about making it
3:42 work so I'm not going to go through it. But, if it's useful for you feel free to grab it we now have data in our database.
3:49 And all the inserts and everything was done just the way you saw in the previous video.