Consuming HTTP Services in Python Transcripts
Chapter: XML services with requests
Lecture: Consuming XML data from an HTTP service
Login or
purchase this course
to watch this video and the rest of the course contents.
0:02
Alright let's put this xml concept together with requests
0:05
and work with some real live xml data on the internet.
0:08
Real canonical case for xml these days is rss,
0:11
so we are going to go and get the rss feed from talkpython.fm,
0:15
pull back and answer a few questions about the state of the podcast.
0:20
Back over here in PyCharm I've got a new file consume_xml_service
0:25
and we are going to start by finding a way to get the data off the internet,
0:29
you can bet that that is request.
0:32
We are also going to need to work with xml itself
0:35
so we'll do a from xml.etree import ElementTree,
0:41
now we are also going to do some interesting work with dates,
0:43
and I am going to need to parse dates, and parsing dates is always super not fun,
0:47
but, we are going to use this package called python dateutil,
0:51
so check this out, I can come down here at the terminal,
0:54
and notice, that it already has the virtual environment activated, that is super cool,
0:58
so then we can say pip install python dateutil
1:03
and that is going to install it for us and from that,
1:05
we can say from dateutil.parser import parse
1:11
and this is going to allow us to parse some dates in a little bit.
1:14
Okay, everything is set up, let's add a little structure,
1:17
we'll have a little main method for now, we'll figure out what to do with that,
1:20
and like so, there we go, so first we are going to start by getting the xml,
1:25
and let's write a function this time, so we'll call it dom,
1:29
say get_xml_dom and we are going to give it url, so let's put that up here,
1:34
and this is just going to be https://talkpython.fm/rss,
1:39
okay, let's write this function rather than have PyCharm write it,
1:43
so here we are going to go and do a get,
1:45
we will get a response=request.get_url nothing fancy here,
1:49
and we probably should check the response code,
1:52
so we'll say if resp.status_code ! = 200, return none,
1:56
maybe you want to log it or something but for now we are just going to do that,
1:59
and now we are going to say dom=ElementTree.fromstring,
2:04
and we are going to give it the response.text
2:08
and we are just going to return the dom, okay so this little function here, we can put it below,
2:12
I like having the main method at the top, it kind of orchestrates everything,
2:16
so we are going to get that, we could even inline this, like so,
2:19
okay, so we want to get the dom here and then we would like to do some kind of query,
2:24
and to be honest, this really so far actually this point right here is
2:29
really whole service story and the rest of it becomes like an exercise in straight xml,
2:35
let's go ahead and run this, rss element, excellent, okay,
2:39
so the next thing we want to do is let's define an episode
2:42
and we are going to do that with my friend collection.namedtuple.
2:02
okay so I'll just throw in fields title, link and pubdate,
2:57
and then here we'll have an episode, we'll say get episodes and we'll give it the dom,
3:03
alright so this function we are also going to write and what it's going to do is given a dom,
3:07
it's going to go find all he episodes, so the way this works,
3:11
we have rss and then inside there we have this thing called the channel
3:14
and inside there we have an item for each episode,
3:18
so in xpath you don't name the top element, and we don't need the star,
3:22
we just going to get more than one back, so we'll get item nodes
3:26
it's going to be dom a find all, with some parenthesis, with some quotes,
3:31
and then let's just print out how many items we have, I think if this comes back correctly,
3:37
we should have 97, let's see, do we have 97- server says 97, fantastic.
3:42
It looks like this is working, so now what we can do is we can just return a list of episodes,
3:48
so we'll say episode of and for something, we'll say for n in item_nodes okay great,
3:56
so now we are just going to need to do some queries, we can say n.find_title,
4:03
we have link, and I think it's pubDate, it's the way they say it there and that looks good,
4:09
let's go back and get our episodes and we'll just print episodes.
4:13
Now because these are named tuples we should see the data, bam,
4:17
actually close but no cigar, so we actually found all these
4:20
but what we gave as field values is the actual nodes,
4:26
we just want the text from these.
4:28
Okay, very cool, look publish date looks reasonable, the length looks reasonable,
4:31
the title looks reasonable, super.
4:34
So there is another thing we'd like to so is like if we look at the publish date,
4:37
this is just a string in a certain format that is required in rss, it's not really time,
4:42
so that is where our parser is going to come in,
4:45
now normally parsing dates is super hard, because all of the various formats,
4:50
there is like over 700 different ways to represent dates, it's insane,
4:54
but this Python dateutils we'll try to parse them and it knows quite a bit of them,
4:58
so let's just give it a parse and see if it will take it.
5:02
Oh sweet, it's a datetime, look at that,
5:05
so we got the parsing of the date working, and yeah, that looks right to me,
5:08
let's check the next one, yeah, these are good.
5:11
Okay, so here we have our episodes and technically,
5:14
we have these episodes in reverse chronological order,
5:18
if we wanted them in another order, we could now sort on this
5:22
like suppose we want them oldest to newest, like in increasing order,
5:25
so I'll say episodes = this and then we could return sorted episodes
5:30
and then we want to give it the key it's going to be a function, a lambda function,
5:34
that takes an episode and it return the pubDate
5:38
and that will sort it the way we want it, so we should see one right here at the top,
5:44
episode zero, here we go over, episode one, episode two, perfect.
5:49
Because we were able to turn these into real dates easily,
5:52
we can actually run sort algorithms and not just do string sorting
5:56
which wouldn't really help us, okay, let's put that character down here,
6:00
do a little cleanup, okay, so now we have our episodes,
6:03
let's just print out the first three episodes, so for e in episodes,
6:08
and we are going to print out, let's print the number
6:12
and let's print out the episode title.
6:16
Now, let's do an enumerate here so we can get the index
6:21
and while normally the index would be off by one, we do want to add one to it,
6:25
my show numbers also start at zero because it's a zero based podcast,
6:29
come on, so the index itself will work perfectly and here we can say title,
6:33
and this should give us some kind of report on all of them,
6:36
let's just do the first five so we can see them on the screen.
6:40
Boom, number one, notice, number zero introducing the show,
6:44
number one- Eve, number two- Python and Mongo
6:46
and so on, and so on, how cool is that?
6:49
So, here we are, pulling live xml off of the site,
6:53
and processing it in real time using these xml techniques.