#100DaysOfCode in Python Transcripts
Chapter: Days 52-54: Parsing RSS feeds with Feedparser
Lecture: Parsing XML with Feedparser
0:00 Okay, we have our xml feed downloaded and saved, and now we're going to actually parse it with feedparser. So, what we need to do is import feedparser.
0:14 Okay, and that's half the work done. No, I'm just kidding. What we want to do now is we need to actually
0:23 tell feedparser what file it's going to be parsing. Now, you could actually use the URL, okay. You could use the URL directly,
0:33 but the problem with that is that it requires an internet connection, and if it can't get one, it's going to fail.
0:40 So, it's actually better from an application standpoint to download the feed
0:49 and then parse it using a separate script. That's why we've done it in two different scripts. We could have done it in one just by
0:56 telling feedparser to point directly to the URL, but we're going to do it by parsing a local file. Okay, so the feed file is newreleases.xml.
1:09 So, the same file that we just downloaded and saved. Okay, now what do we want to do? We want to actually parse that file, alright?
1:18 And that's an argument for feedparser. So, we're going to parse it and store it in the feed variable. Okay, so feedparser.parse.
1:30 Now, it's as the name implies, right? It's going to just parse over the file, and store all of those contents inside the feed variable. Okay?
1:43 Now, if we bring up our XML file. Where have you gone? Let's open it here, open it again in Internet Explorer.
1:53 Don't hate me, let's make it a little bit bigger. Okay, so what we notice here is that, we can see it's staggered out, it's xml.
2:02 We've all seen that before, but it's pretty much, once it's loaded into feed, it becomes a sort of dictionary, okay?
2:10 With your title, your link, your description, and these are the sort of keys that we want to pull out. Okay, and to do that,
2:20 we actually use these tags. Okay, and that's what feedparser does. It parses, and lets you pull data based on these tags.
2:29 Alright, so you'll see that. Here's what we'll do, so this is the... what I think we should try getting is the title.
2:38 So, we want the title of the feed, the feed entry. So, Midweek Madness, Dragon's Dogma, blah blah blah. That's what we want to pull.
2:48 We also want the published date. Okay, now most XML feeds, or most RSS feeds, should have this. Some don't, but we'll get to that in a minute.
3:00 So, we want the publication date, alright? So we know what date. And then I think we should get the link as well,
3:05 because we would like to get the url for this deal, or this new game launch, or whatever it is. Alright, so we go back to our file here,
3:15 to our script, and we will go. We're going to use a for loop to go over this data, okay? So, we're going to go for entry in feed,
3:26 that's this, in feed.entries. Okay, that's why I've said for entry. So, for every entry within all of the entries within feed.
3:38 What are we going to do? Well, we're going to print something. So, this is just for the sake of this script. So, we're going to print entry.published,
3:47 and I will point this out, because this one got me at the start. Even though it says pubDate here, that is not actually what feedparser calls it.
3:59 That's actually called published from the feedparser standpoint. So, entry.published. Just going to use some standard string stuff here,
4:08 so bear with me, then we're going to choose the title. So, imagine this as you're writing it. The date, and then the title, alright?
4:17 So, entry.title, throw in a colon there, and what's next? entry.link. So, all we're doing is we're printing out the date,
4:30 the title, and then the link for that, and that's it. Okay, nice and simple, and you won't believe me, but that's it. Okay, so it's pretty simple.
4:45 So, we'll go python parser.py, and that's it. Let's maximize this, so look at that. We have the date, Friday the 5th of January, 2018.
4:58 At that time, watch live on Steam, Paladins World Championships, and then, we have the url. How simple is that? We've parsed this entire feed,
5:12 ignored all the extra stuff in there that doesn't matter, and we've taken just these titles, with the date, and the url. Very, very...