Building data-driven web apps with Pyramid and SQLAlchemy Transcripts
Chapter: Using SQLAchemy
Lecture: Inserting the actual data
0:00 You saw how to insert data, now let's go through the
0:02 nitty gritty details of actually inserting all the data
0:05 from real sources.
0:07 So there is an API for PyPI
0:11 I've used that API to get most of the top 100
0:15 packages. So for example, here's click.
0:17 Let's look at that real quick.
0:19 And I've downloaded this and it's from
0:21 Armin Ronacher, creator of Flask.
0:23 Here you can see the licenses and the languages in
0:26 this funky format.
0:28 The author info embedded in there.
0:32 You see the license, it's not set here but it's set
0:35 up here, so it's kind of a kind of confusing and
0:39 tricky to make this work, but that's fine.
0:40 So here's a home page
0:42 The maintainers right now, there's no other maintainers set.
0:46 The full summary and then here are a bunch of releases for it, right?
0:50 Each release has comments and so on.
0:53 Now, this data is not super useful to us in this format.
0:59 There's not much description but the other ones...
1:01 Let me just throw one in here so you can see.
1:03 It's sterling, park, parse or an alternative there.
1:06 Here you can see the description has a lot more
1:09 a lot more detail, it goes over to column 1,614
1:13 so that's pretty long.
1:15 Right, so what we're going to do is we're going to import this
1:18 data so you have actual packages, releases, users, etc.
1:22 to work with, how are you're going to do that?
1:24 Well, I'm not going to leave that up to you 'cause that's
1:26 pretty messy but what I am going to do is I'm giving you
1:29 these PyPI top 100 JSON files right here.
1:33 They're going to be just at the top of your repository right there.
1:40 Now, over here we're writing basically the same code.
1:43 Initialize the data base, get a session and what it's going to
1:46 do is it's going to go through and check if and only if there
1:51 are no users, it's going to load those files and it goes
1:54 through and parses out all the users and then
1:58 it actually saves them to the data base and then it goes
2:00 through and finds all the packages, saves them to the
2:02 data base, along with their releases and languages and
2:05 licenses and so on. And it prints out a little summary.
2:09 So right now there's no data and I actually deleted those
2:11 other two releases I put in there.
2:13 So let's run this and I'm actually going to
2:16 run it outside 'Cause it has a cool little progress bar.
2:26 So, let's go over here and say I'm going to activate our
2:30 virtual environment 'cause we need the same packages and
2:33 everything there.
2:34 And then we're going to say "python," copy that path
2:38 and we're going to run that right here.
2:41 Let's see how we do. Beautiful!
2:46 It's got some extra junk that it printed out apparently
2:49 but you could take that out.
2:50 Here's the final numbers it found
2:51 84 users, 96 packages- from those packages we have
2:55 5,400 releases, 10 maintainers, 25 languages and
3:00 30 licenses.
3:02 Cool, now let's just go over here and do a quick query to
3:04 make sure that's in there.
3:06 So if we run that again, now you can see there's tons
3:08 of releases.
3:10 This is all AMPQ, they go in order basically AsyncIO
3:13 view packages there's all the packages.
3:19 Right, here's Tornado and so on.
3:20 These are the popular ones.
3:22 Really nice, so now we have actual data to work with.
3:25 You're welcome to look through this load data.
3:28 It's pretty interesting, how to take these actual real
3:32 world data sets and import them using SQLAlchemy.
3:35 So, we've got all this JSON data now we're putting it in
3:37 our data base, it's a lot of yucky details about making it
3:41 work so I'm not going to go through it.
3:43 But, if it's useful for you feel free to grab it
3:46 we now have data in our database.
3:48 And all the inserts and everything was done
3:50 just the way you saw in the previous video.