Consuming HTTP Services in Python Transcripts
Chapter: XML services with requests
Lecture: Consuming XML data from an HTTP service
0:02
Alright let's put this xml concept together with requests and work with some real live xml data on the internet.
0:09
The canonical use case for xml these days is rss, so we are going to go and get the rss feed from talkpython.fm,
0:16
pull it back, and answer a few questions about the state of the podcast. Back over here in PyCharm I've got a new file, consume_xml_service,
0:26
and we are going to start by finding a way to get the data off the internet; you can bet that that is requests.
0:33
We are also going to need to work with xml itself so we'll do a from xml.etree import ElementTree,
0:42
now we are also going to do some interesting work with dates, and I am going to need to parse dates, and parsing dates is always super not fun,
0:48
but we are going to use this package called python-dateutil, so check this out, I can come down here to the terminal,
0:55
and notice that it already has the virtual environment activated, that is super cool, so then we can say pip install python-dateutil
1:04
and that is going to install it for us and from that, we can say from dateutil.parser import parse
1:12
and this is going to allow us to parse some dates in a little bit. Okay, everything is set up, let's add a little structure,
1:18
we'll have a little main method for now, we'll figure out what to do with that,
1:21
and like so, there we go, so first we are going to start by getting the xml, and let's write a function this time, so we'll call it dom,
1:30
say get_xml_dom and we are going to give it url, so let's put that up here, and this is just going to be https://talkpython.fm/rss,
1:40
okay, let's write this function rather than have PyCharm write it, so here we are going to go and do a get,
1:46
we will get a response, resp = requests.get(url), nothing fancy here, and we probably should check the response code,
1:53
so we'll say if resp.status_code != 200, return None, maybe you want to log it or something but for now we are just going to do that,
2:00
and now we are going to say dom = ElementTree.fromstring, and we are going to give it resp.text
2:09
and we are just going to return the dom, okay so this little function here, we can put it below,
2:13
I like having the main method at the top, it kind of orchestrates everything, so we are going to get that, we could even inline this, like so,
2:20
okay, so we want to get the dom here and then we would like to do some kind of query,
2:25
and to be honest, what we have so far, up to this point right here, is
2:30
really the whole service story, and the rest of it becomes an exercise in straight xml. Let's go ahead and run this: an rss element, excellent, okay,
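The download-and-parse step described up to this point might look roughly like the sketch below. The names get_xml_dom and the feed URL come from the lecture; the optional http_get parameter is a hypothetical addition, not part of the lecture's code, included only so the function can be exercised without a network connection.

```python
from xml.etree import ElementTree

FEED_URL = "https://talkpython.fm/rss"  # the feed URL named in the lecture


def get_xml_dom(url, http_get=None):
    """Download url and parse the body as XML; return None on a non-200 status."""
    if http_get is None:
        # Default to requests.get (third-party: pip install requests).
        # http_get is a hypothetical hook that lets callers swap in a fake.
        import requests
        http_get = requests.get

    resp = http_get(url)
    if resp.status_code != 200:
        return None  # a real program might log the failure here

    # fromstring returns the root element, which for a feed is <rss>
    return ElementTree.fromstring(resp.text)
```

Calling get_xml_dom(FEED_URL) and printing the result should show an Element whose tag is rss, which is the output the lecture reacts to.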
2:40
so the next thing we want to do is define an episode, and we are going to do that with my friend collections.namedtuple.
2:03
okay so I'll just throw in the fields title, link and pubdate, and then here we'll have episodes, we'll say get_episodes and we'll give it the dom,
3:04
alright so this function we are also going to write and what it's going to do is given a dom,
3:08
it's going to go find all the episodes, so the way this works, we have rss and then inside there we have this thing called the channel
3:15
and inside there we have an item for each episode, so in xpath you don't name the top element, and we don't need the star,
3:23
we are just going to get more than one back, so we'll get item_nodes, it's going to be dom.findall, with some parentheses, with some quotes,
3:32
and then let's just print out how many items we have, I think if this comes back correctly,
3:38
we should have 97, let's see, do we have 97? The server says 97, fantastic.
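As a small illustration of that query, the sketch below runs findall against a tiny hand-written feed instead of the live one. The sample XML is invented for this example, and the path channel/item is inferred from the rss, channel, item layout described above.

```python
from xml.etree import ElementTree

# Invented stand-in for the real feed: <rss> wraps <channel>, which holds
# one <item> per episode.
SAMPLE_RSS = """\
<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item><title>Second show</title></item>
    <item><title>First show</title></item>
  </channel>
</rss>
"""

dom = ElementTree.fromstring(SAMPLE_RSS)  # dom is the <rss> root element

# The path is relative to the root, so <rss> itself is not named in it.
item_nodes = dom.findall("channel/item")
print(len(item_nodes))  # 2 for this sample feed
```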
3:43
It looks like this is working, so now what we can do is we can just return a list of episodes,
3:49
so we'll say Episode of something, for something, we'll say for n in item_nodes, okay great,
3:57
so now we are just going to need to do some queries, we can say n.find('title'),
4:04
we have link, and I think it's pubDate, it's the way they say it there and that looks good,
4:10
let's go back and get our episodes and we'll just print episodes. Now because these are named tuples we should see the data, bam,
4:18
actually close but no cigar, so we actually found all these, but what we gave as field values are the actual nodes; we just want the text from these.
4:29
Okay, very cool, look, the publish date looks reasonable, the link looks reasonable, the title looks reasonable, super.
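Putting the named tuple and the per-item queries together, a version of get_episodes along the lines described could be sketched as follows. The field names and the pubDate tag follow the lecture; the sample item and its values are made up so the snippet runs on its own. The important detail is the .text on each find result, which is what turns nodes into strings.

```python
import collections
from xml.etree import ElementTree

Episode = collections.namedtuple("Episode", "title, link, pubdate")


def get_episodes(dom):
    """Build one Episode per <item> under <channel>, using the node text."""
    item_nodes = dom.findall("channel/item")
    return [
        Episode(
            n.find("title").text,    # .text gives the string, not the node
            n.find("link").text,
            n.find("pubDate").text,  # note the capital D in the tag name
        )
        for n in item_nodes
    ]


# Made-up feed fragment, only for demonstration.
sample = ElementTree.fromstring(
    "<rss><channel><item>"
    "<title>Example episode</title>"
    "<link>https://example.com/episodes/0</link>"
    "<pubDate>Fri, 20 Mar 2015 00:00:00 -0800</pubDate>"
    "</item></channel></rss>"
)
episodes = get_episodes(sample)
print(episodes)
```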
4:35
So there is another thing we'd like to do: if we look at the publish date,
4:38
this is just a string in a certain format that is required in rss, it's not really a time, so that is where our parser is going to come in,
4:46
now normally parsing dates is super hard, because of all the various formats, there are like over 700 different ways to represent dates, it's insane,
4:55
but this python-dateutil will try to parse them and it knows quite a few of them, so let's just give it a parse and see if it will take it.
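A minimal sketch of that parse call on a single date string is shown below. The date value is made up but follows the format rss uses for pubDate. The except branch is an assumption of this sketch and not something from the lecture: it falls back to the standard library so the snippet still runs where python-dateutil is not installed.

```python
try:
    from dateutil.parser import parse  # third-party: pip install python-dateutil
except ImportError:
    # Assumption of this sketch: the standard library can read the
    # RFC 822 style dates that rss uses, though far fewer formats overall.
    from email.utils import parsedate_to_datetime as parse

raw = "Fri, 20 Mar 2015 00:00:00 -0800"  # made-up value in pubDate format
published = parse(raw)

print(type(published).__name__)  # datetime
print(published.isoformat())     # 2015-03-20T00:00:00-08:00
```

In get_episodes this would mean wrapping the pubDate text in parse(...) before storing it on the Episode.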
5:03
Oh sweet, it's a datetime, look at that, so we got the parsing of the date working, and yeah, that looks right to me,
5:09
let's check the next one, yeah, these are good. Okay, so here we have our episodes and technically,
5:15
we have these episodes in reverse chronological order, if we wanted them in another order, we could now sort on this
5:23
suppose we want them oldest to newest, in increasing order, so I'll say episodes = this and then we could return sorted(episodes),
5:31
and then we want to give it the key, it's going to be a function, a lambda function, that takes an episode and returns the pubdate,
5:39
and that will sort it the way we want it, so we should see the first one right here at the top: here we go, episode zero, episode one, episode two, perfect.
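The sort described here can be sketched with a few made-up episodes whose pubdate is already a datetime. The second half shows, under the same invented dates, why sorting the raw strings would not give chronological order.

```python
import collections
from datetime import datetime

Episode = collections.namedtuple("Episode", "title, link, pubdate")

# Made-up episodes, newest first, the way the feed lists them.
episodes = [
    Episode("Third show", "https://example.com/2", datetime(2015, 4, 7)),
    Episode("Second show", "https://example.com/1", datetime(2015, 3, 31)),
    Episode("First show", "https://example.com/0", datetime(2015, 3, 20)),
]

# The key function takes an episode and returns the value to sort on.
oldest_first = sorted(episodes, key=lambda e: e.pubdate)
print([e.title for e in oldest_first])

# The same dates as rss-style strings sort alphabetically by weekday name,
# which puts 07 Apr ahead of 31 Mar.
raw_dates = ["Tue, 07 Apr 2015", "Tue, 31 Mar 2015", "Fri, 20 Mar 2015"]
print(sorted(raw_dates))
```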
5:50
Because we were able to turn these into real dates easily, we can actually run sort algorithms and not just do string sorting
5:57
which wouldn't really help us, okay, let's put that character down here, do a little cleanup, okay, so now we have our episodes,
6:04
let's just print out the first three episodes, so for e in episodes, and we are going to print out, let's print the number
6:13
and let's print out the episode title. Now, let's do an enumerate here so we can get the index
6:22
and while normally the index would be off by one and we'd want to add one to it, my show numbers also start at zero because it's a zero-based podcast,
6:30
come on, so the index itself will work perfectly and here we can say title, and this should give us some kind of report on all of them,
6:37
let's just do the first five so we can see them on the screen. Boom, notice, number zero, introducing the show,
6:45
number one, Eve, number two, Python and Mongo, and so on, and so on, how cool is that? So, here we are, pulling live xml off of the site,
6:54
and processing it in real time using these xml techniques.
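The final report loop might look something like the sketch below. The helper name first_lines and the placeholder episodes are hypothetical; the point is that enumerate starts at zero, which happens to match the show numbering, so the index is used as-is.

```python
import collections

Episode = collections.namedtuple("Episode", "title, link, pubdate")


def first_lines(episodes, limit=5):
    """Format the first `limit` episodes as 'index. title' lines."""
    # enumerate counts from 0, matching a show whose numbering starts at 0
    return [f"{idx}. {e.title}" for idx, e in enumerate(episodes[:limit])]


# Placeholder data standing in for the sorted episodes from the feed.
episodes = [Episode(f"Placeholder show {i}", None, None) for i in range(7)]

for line in first_lines(episodes):
    print(line)
```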