#100DaysOfCode in Python Transcripts
Chapter: Days 52-54: Parsing RSS feeds with Feedparser
Lecture: Pulling the feed with Requests
0:00
Okay, we have our xml feed downloaded and saved, and now we're going to actually parse it with feedparser. So, what we need to do is import feedparser.
0:14
Okay, and that's half the work done. No, I'm just kidding. What we want to do now is we need to actually
0:23
tell feedparser what file it's going to be parsing. Now, you could actually use the url, okay. You could use the url directly,
0:33
but the problem with that is that it requires an internet connection, and if it can't get that, it's going to fail.
0:40
So, it's actually better from an application standpoint to download the feed,
0:49
and then parse it using a separate script. That's why we've done it in two different scripts. We could have done it in one just by
0:56
telling feedparser to point directly to the url, but we're going to do it by parsing a local file. Okay, so feedfile is newreleases.xml.
1:09
So, the same file that we just downloaded and saved. Okay, now what do we want to do? We want to actually parse that file, alright?
1:18
And that's an argument for feedparser. So, we're going to parse it and store it in the feed variable. Okay, so feedparser.parse.
1:30
Now, it's as the name implies, right? It's going to just parse over the file, and store all of those contents inside the feed variable. Okay?
1:43
Now, if we bring up our xml file. Where have you gone? Let's open it here, open it again in Internet Explorer.
1:53
Don't hate me, let's make it a little bit bigger. Okay, so what we notice here is that, we can see it's staggered out, it's xml.
2:02
We've all seen that before, but it's pretty much, once it's loaded into feed, it becomes a sort of dictionary, okay?
2:10
With your title, your link, your description, and these are the sort of keys that we want to pull out. Okay, and by doing that,
2:20
so to do that we actually use these tags. Okay, and that's what feedparser does. It parses, and lets you pull data based on these tags.
2:29
Alright, so you'll see that. Here's what we'll do, so this is the... what I think we should try getting is the title.
2:38
So, we want the title of the feed, the feed entry. So, Midweek Madness, Dragon's Dogma, blah blah blah. That's what we want to pull.
2:48
We also want the published date. Okay, now most xml feeds, or most rss feeds should have this. Some don't, but we'll get to that in a minute.
3:00
So, we want the publication date, alright? So we know what date. And then I think we should get the link as well,
3:05
because we would like to get the url for this deal, or this new game launch, or whatever it is. Alright, so we go back to our file here,
3:15
to our script, and we will go. We're going to use a for loop to parse over this data, okay? So, we're going to go for entry in feed,
3:26
that's this, in feed.entries. Okay, that's why I've said for entry. So, for every entry within all of the entries within feed.
3:38
What are we going to do? Well, we're going to print something. So, this is just for the sake of this script. So, we're going to print entry.published,
3:47
and I will point this out, because this one got me at the start. Even though it says pubDate here, that is not the name feedparser uses.
3:59
That's actually called published from the feedparser standpoint. So, entry.published. Just going to use some standard string stuff here,
4:08
so bear with me, then we're going to choose the title. So, imagine this as you're writing it. The date, and then the title, alright?
4:17
So, for entry.title, throw in a colon there, and what's next? entry.link. So, all we're doing is we're printing out the date,
4:30
the title, and then the link for that, and that's it. Okay, nice and simple, and you won't believe me, but that's it. Okay, so it's pretty simple.
4:45
So, we'll go python parser.py, and that's it. Let's maximize this, so look at that. We have the date, Friday the 5th of January, 2018.
4:58
At that time, watch live on Steam, Paladins World Championships, and then, we have the url. How simple is that? We've parsed this entire feed,
5:12
ignored all the extra stuff in there that doesn't matter, and we've taken just these titles, with the date, and the url. Very, very...