#100DaysOfCode in Python Transcripts
Chapter: Days 52-54: Parsing RSS feeds with Feedparser
Lecture: Parsing XML with Feedparser
0:00 Okay, we have our xml feed downloaded
0:03 and saved, and now we're going to
0:05 actually parse it with feedparser.
0:08 So, what we need to do is import feedparser.
0:13 Okay, and that's half the work done.
0:16 No, I'm just kidding.
0:18 What we want to do now is we need to actually
0:22 tell feedparser what file it's going to be parsing.
0:25 Now, you could actually use the url, okay.
0:29 You could parse it directly from the url,
0:32 but the problem with that
0:34 is that it requires an internet connection,
0:36 and if it can't get that, it's going to fail.
0:39 So, it's actually better from an application standpoint
0:43 to download the feed first,
0:48 and then parse it using a separate script.
0:50 That's why we've done it in two different scripts.
0:53 We could have done it in one just by
0:55 telling feedparser to point directly to the url,
0:58 but we're going to do it parsing a local file.
1:01 Okay, so feedfile
1:06 is newreleases.xml.
1:08 So, the same file that we just downloaded and saved.
1:12 Okay, now what do we want to do?
1:14 We want to actually parse that file, alright?
1:17 And that's an argument for feedparser.
1:21 So, we're going to parse it
1:23 and store it in the feed variable.
1:26 Okay, so feedparser.parse.
1:29 Now, it's as the name implies, right?
1:32 It's going to just parse over the file,
1:34 and store all of those contents inside the feed variable.
1:42 Now, if we bring up our xml file.
1:46 Where have you gone?
1:47 Let's open it here, open it again in Internet Explorer.
1:52 Don't hate me, let's make it a little bit bigger.
1:55 Okay, so what we notice here is that,
2:00 we can see it's staggered out, it's xml.
2:01 We've all seen that before, but it's pretty much,
2:04 once it's loaded into feed,
2:06 it becomes a sort of dictionary, okay?
2:09 With your title, your link, your description,
2:13 and these are the sort of keys that we want to pull out.
2:16 Okay, and to do that,
2:19 we actually use these tags.
2:23 Okay, and that's what feedparser does.
2:25 It parses, and lets you pull data based on these tags.
2:28 Alright, so you'll see that.
2:30 Here's what we'll do, so this is the...
2:33 what I think we should try getting is the title.
2:37 So, we want the title of the feed, the feed entry.
2:42 So, Midweek Madness, Dragon's Dogma, blah blah blah.
2:45 That's what we want to pull.
2:47 We also want the published date.
2:50 Okay, now most xml feeds,
2:53 or most rss feeds should have this.
2:57 Some don't, but we'll get to that in a minute.
2:59 So, we want the publication date, alright?
3:01 So we know what date.
3:02 And then I think we should get the link as well,
3:04 because we would like to get the url for this deal,
3:08 or this new game launch, or whatever it is.
3:11 Alright, so we go back to our file here,
3:14 to our script, and we will go.
3:17 We're going to use a for loop
3:18 to parse over this data, okay?
3:21 So, we're going to go for entry in feed,
3:25 that's this, in feed.entries.
3:30 Okay, that's why I've said for entry.
3:32 So, for every entry within all of the entries within feed.
3:37 What are we going to do? Well, we're going to print something.
3:40 So, this is just for the sake of this script.
3:42 So, we're going to print entry.published,
3:46 and I will point this out,
3:49 because this one got me at the start.
3:51 Even though it says pubDate here,
3:55 that is not actually what feedparser gets.
3:58 That's actually called published
3:59 from the feedparser standpoint.
4:01 So, entry.published.
4:03 Just going to use some standard string stuff here,
4:07 so bear with me, then we're going to choose the title.
4:10 So, imagine this as you're writing it.
4:12 The date, and then the title, alright?
4:16 So, for entry.title, throw in a colon there,
4:21 and what's next?
4:24 So, all we're doing is we're printing out the date,
4:29 the title, and then the link for that, and that's it.
4:34 Okay, nice and simple, and you won't believe me,
4:38 but that's it.
4:41 Okay, so it's pretty simple.
4:44 So, we'll go python parser.py,
4:49 and that's it.
4:50 Let's maximize this, so look at that.
4:53 We have the date, Friday the 5th of January, 2018.
4:57 At that time, watch live on Steam,
4:59 Paladins World Championships, and then,
5:02 we have the url.
5:04 How simple is that?
5:06 We've parsed this entire feed,
5:11 ignored all the extra stuff in there that doesn't matter,
5:15 and we've taken just these titles,
5:19 with the date, and the url.
5:22 Very, very...