Consuming HTTP Services in Python Transcripts
Chapter: XML services with requests
Lecture: Working with XML from Python

0:02 Let's work with some XML data from Python. Here at the University of Washington, they've got some XML data that we can go grab,
0:10 and they happen to have something from the course catalogue at Reed College, which is a fancy university here in Portland, Oregon;
0:18 it happens to be where Steve Jobs went. So we are going to import this XML file and do something interesting:
0:24 we are going to answer some interesting questions about these courses using XML processing in Python.
0:30 Now, in the beginning, we are just going to read the file; we are not going to call any sort of web service,
0:35 although technically we could point directly at that XML file. Later we are going to come back and do this sort of processing with Requests,
0:41 calling services that actually return XML. Alright, so let's go add a file here, process_xml, and I happen to have already downloaded that Reed file,
0:51 so here is the XML data; this is what it looks like.
0:56 So we are just going to work with this locally. In order to get started with XML in Python,
1:00 we are going to work with the ElementTree module, which comes from xml.etree,
1:07 and we are also going to need to work with the file system a little bit to get to our file. Let's set up a little structure. Alright,
1:15 so the first thing that we need to do is actually find that file. Let's create a variable called folder,
1:22 and this is going to be os.path.basename(__file__), for wherever this particular file is.
1:26 Okay, dunder file (__file__) gives the full path of this executing Python module, process_xml.py, and this will give us the folder where that lives,
1:35 and then we can reach into the xml data folder and load up reed.xml. So we'll say file = os.path.join, and we want to give it the folder, the xml data folder, and reed.xml.
1:47 This works in a nice, platform-independent way: for example, the separators will be forward slashes (/) on OS X and on Linux,
1:55 but on Windows they would be backslashes. Perfect. Okay, so then what we need to do is just load this up, so we'll say xml_text;
2:02 let's put this in a with block, with open(file) as fin:, and then we'll say xml_text = fin.read(), which is going to read all of the text,
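A minimal sketch of the locate-and-read step, assuming the XML lives in a folder next to the script. The helper name read_xml_text and the folder name xml_data are hypothetical (the lecture only says "xml data"). It uses os.path.dirname, since that returns the containing folder, whereas os.path.basename returns just the file name.

```python
import os


def read_xml_text(module_file: str, *parts: str) -> str:
    """Read a file that lives relative to the folder of `module_file`.

    Hypothetical helper: `parts` are path segments such as
    ("xml_data", "reed.xml"); that folder layout is an assumption.
    """
    # dirname gives the folder that contains the module;
    # basename would give only the file name, e.g. "process_xml.py".
    folder = os.path.dirname(os.path.abspath(module_file))
    # join inserts the platform's separator: "/" on macOS/Linux, "\\" on Windows.
    file = os.path.join(folder, *parts)
    # The with block closes the file even if reading raises an error.
    with open(file, encoding="utf-8") as fin:
        return fin.read()


# Inside the lecture's script this would be called roughly as:
# xml_text = read_xml_text(__file__, "xml_data", "reed.xml")
```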
2:15 so now that we have that loaded up, we want to start working with the XML. Just like we used JSON before,
2:20 we are going to use the XML ElementTree, so we are going to type ElementTree. and, now, there is a parse,
2:26 but, again, I don't know what the deal is with these file format modules,
2:30 but they're kind of bad at names. parse actually loads from a file-type thing, so what we want is ElementTree.fromstring.
2:40 It seems to me like there could have been a clearer way to name these things, but it doesn't matter; xml_text is what we want to pass in,
2:46 and then we can just print out the DOM to make sure that this works. Notice we are running the GitHub one from before,
2:51 so let's go ahead and run this one now. "Not a directory"; yes, oh, I said basename, I meant dirname. Let's try that again.
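To make the parse versus fromstring naming concrete, here is a small sketch using a made-up XML string (not copied from reed.xml) with the same outer shape the lecture shows: a root element holding course elements. fromstring works on text already in memory and returns the root Element; parse expects a filename or file object and returns an ElementTree wrapper, whose getroot() yields the root Element.

```python
import io
from xml.etree import ElementTree

# Made-up document; only the element names mirror the lecture.
xml_text = "<root><course><title>Sample Course A</title></course></root>"

# fromstring: parse in-memory text, get the root Element back directly.
dom = ElementTree.fromstring(xml_text)
print(dom)  # something like <Element 'root' at 0x...>

# parse: read from a filename or file object (here a StringIO stands in
# for an open file), get an ElementTree, then ask it for the root.
tree = ElementTree.parse(io.StringIO(xml_text))
root = tree.getroot()
print(root.tag)  # root
```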
3:05 There we go: Element 'root' at some address. If we look, you'll see that's this root element here. What we want to do is
3:13 actually get these individual courses here; you see there is a course, more courses, lots of these courses.
3:19 So we are going to read in these courses, and we want to answer questions like what course is running in Eliot, in this room, at a given time,
3:26 or something like that. Okay, you can see this is not a huge file, but it's decent;
3:31 it's 13 thousand lines, so it's got a lot of data about these courses. We've already loaded these up,
3:36 and the way that we can access these course pieces is with what is called an XPath expression. So what I want to do is say courses
3:45 equals dom.findall('course'). Now we could find, let's say, the course titles, just to show you something more interesting,
3:52 so we have course, and then we can navigate down to title, so I could do course/title, and then I could do something like this:
3:58 for c in courses: print(c.text). Let's try this. Alright, so there are all the course titles. We've done a search,
4:07 but we've lost some of the information, like what room that course was in.
4:11 I don't know; its title is Genetics and Molecular Biology, but that is really all I know. So we are going to take a step back
4:16 and actually get that entire course node here, like this little bit right there in the little pop-up, and then we'll be able
4:22 to answer questions about it. Okay, so this is going to give us our courses, and it's cool that this is an XML thing; I'll call this course_nodes,
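A sketch of the two findall queries described above, run against a small made-up sample that reuses the lecture's element names (course, title, place/building, place/room); the values are illustrative, not taken from reed.xml. ElementTree supports only a limited subset of XPath, but simple child paths like these work. Note that the "course/title" query returns title elements, so the room and building are no longer attached to the results.

```python
from xml.etree import ElementTree

# Illustrative sample; element names follow the lecture, values are made up.
SAMPLE_XML = """<root>
  <course>
    <title>Sample Course A</title>
    <place><building>Eliot</building><room>414</room></place>
  </course>
  <course>
    <title>Sample Course B</title>
    <place><building>Eliot</building><room>234</room></place>
  </course>
</root>"""

dom = ElementTree.fromstring(SAMPLE_XML)

# "course" matches the direct <course> children of the root element.
course_nodes = dom.findall("course")

# "course/title" goes one level deeper and returns only <title> elements.
titles = [t.text for t in dom.findall("course/title")]
for title in titles:
    print(title)
```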
4:31 something to that effect, but I want a richer container for this, so I am going to import something else up here:
4:37 we are going to import the collections module, and here I am going to create a thing.
4:40 Actually, I'll do it outside this method, although it's only called once, so it doesn't really matter. I am going to create a thing called Course,
4:45 and I'll create that as a collections.namedtuple. Normally what we get back is just an XML node,
4:48 and we could maybe stick it in a dictionary, but a named tuple is much nicer. So we have Course, and then we just list the fields,
4:55 so let's say we want the title, the room and the building, those three things.
4:59 If we do that and come down here, we can do something more interesting:
5:03 I can say the courses are going to be equal to... now, we could do this as a loop;
5:06 I'll write it as a loop first, and I'll do a list comprehension in a minute. So I'll say for n in course_nodes:, and then I am going to create a course,
5:13 I want to pass some stuff to it, and then I'll say courses.append(course). Now we can just come down here and print out the courses.
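A short sketch of the Course named tuple, with the fields in the order the lecture lists them (title, room, building). Compared with a dictionary, a named tuple gives attribute access, a readable repr, and immutability. The values below are made up for illustration.

```python
import collections

# Field order follows the lecture: title, then room, then building.
Course = collections.namedtuple("Course", "title room building")

c = Course(title="Sample Course A", room="414", building="Eliot")
print(c.title, c.room, c.building)  # Sample Course A 414 Eliot
print(c)  # Course(title='Sample Course A', room='414', building='Eliot')
```

Because a named tuple is still a tuple, it compares equal to a plain tuple with the same values and can be unpacked like one.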
5:23 Okay, so what goes in here? Well, first the title, then the room, then the building. So, the title, like this; back over here, we are going to need...
5:32 we are working with an element at this level, so we are going to need to do another query to get each of these:
5:37 one to go to the place and find the building, and one to go to the place and find the room.
5:41 So those are the three things that we need. The first thing is the title; that's easy: n.find('title').text,
5:48 and let's just print that out and see if that works. Oh, it's missing its arguments; okay, no worries, we'll do that next.
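A sketch of the loop described here: one findall for the course nodes, then a relative find on each node for title, place/room and place/building, with each result packed into a Course tuple. The sample XML is made up and only mirrors the lecture's element names.

```python
import collections
from xml.etree import ElementTree

Course = collections.namedtuple("Course", "title room building")

# Illustrative sample; element names follow the lecture, values are made up.
SAMPLE_XML = """<root>
  <course>
    <title>Sample Course A</title>
    <place><building>Eliot</building><room>414</room></place>
  </course>
  <course>
    <title>Sample Course B</title>
    <place><building>Eliot</building><room>234</room></place>
  </course>
</root>"""

dom = ElementTree.fromstring(SAMPLE_XML)
course_nodes = dom.findall("course")

courses = []
for n in course_nodes:
    # Each find() is a second query, relative to this one <course> node.
    course = Course(
        n.find("title").text,
        n.find("place/room").text,
        n.find("place/building").text,
    )
    courses.append(course)

for c in courses:
    print(c)
```

If a course were missing a place element, n.find(...) would return None and .text would raise AttributeError. Element.findtext("place/room") returns None (or a supplied default) in that case instead.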
5:54 The next thing we need to do is find the place, then the room, and then the last one is the building:
6:03 let's see if they call it room... building, room, they do. So here you go, this is working perfectly.
6:10 We've loaded up the XML using ElementTree.fromstring, and we've done an XPath query, a very simple one, but you saw
6:18 that we could do more interesting ones, like going through the place element. We got a bunch of nodes back, and for each node we did a little bit of work
6:25 to transform it from just an XML node, which we could still work with,
6:30 into almost a class, one of these named tuples. Afterwards, we can do interesting things, like answer the question
6:38 of which classes are running in the Eliot building. Okay, so this is pretty cool,
6:43 but we still have a little bit more work to do. Let's go over here; I'll just print it out,
6:47 and it seems like it's working. Let's do one final thing: let's say building =
6:53 and we'll take input from the user, asking what building are you in. Maybe somebody's sitting there going, wait,
7:01 what class is running in this room right now? Then we'll ask what room they are looking for, what room are you next to,
7:08 and then I can do some kind of query here, so I'll say room_courses
7:12 and we'll just write this as a simple list comprehension, so I'll say c.title for...
7:18 so we are going to say for c in courses, and we are going to pull out the title, but only for the c where c.building == building
7:28 (notice the autocompletion here, that's beautiful) and c.room == room. Okay, so these are the ones we are looking for,
7:39 so then we'll say for c in room_courses: print, let's do a little star like this,
7:47 and c.title; actually, we only put in the title, so we'll just go like that. Alright, so let's run this and see if it works.
7:57 First of all, I'm in Eliot, I'm in room 414, boom, those are the courses in that room; not that many, just ten or so. Let's just double check:
8:08 go to the XML file here and make sure that if I find this course, it's actually in that room.
8:14 And there are courses in other rooms; here, this one is 234. Let's try this again: I'm in room 234, boom,
8:23 Topics in French Enlightenment, First Year Russian, very cool, right.
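The room lookup can be sketched as a small function wrapped around the same list comprehension. The function name titles_in_room is hypothetical; it takes the building and room as parameters instead of calling input(), so it can be exercised without a prompt. The course data is made up. Note the comparisons use ==, not =.

```python
import collections

Course = collections.namedtuple("Course", "title room building")

# Hypothetical in-memory data standing in for the parsed courses.
courses = [
    Course("Sample Course A", "414", "Eliot"),
    Course("Sample Course B", "234", "Eliot"),
    Course("Sample Course C", "234", "Eliot"),
    Course("Sample Course D", "414", "Library"),
]


def titles_in_room(courses, building, room):
    # Keep the title only when both the building and the room match exactly.
    return [c.title for c in courses if c.building == building and c.room == room]


for title in titles_in_room(courses, "Eliot", "234"):
    print("*", title)
```

The match is exact and case-sensitive, so "eliot" would not find "Eliot"; normalizing both sides (for example with .strip().casefold()) is one option if user input is messy.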
8:28 So that's all there is to it. We are just going to somehow magically get hold of the text;
8:34 later we are going to do this off of the web, but right now we just got it off the file system.
8:38 We are going to create an element tree by parsing it from a string,
8:42 then we are going to run a series of XPath expressions: one to find the course nodes, and then, once we have all the course nodes, others
8:50 to pull the individual pieces of data out of them. And finally, once you have the data, from here on down, this is pure Python;
8:57 once you have that loaded as a bunch of named tuples, well, it's just a matter of writing code against them.
9:03 I did say we could improve this a little bit, so let's work on this. I can take that and say, okay, for n in the nodes,
9:09 what I want to create and send back is one of these Course tuples, and that's that,
9:16 so that simplifies it a little bit. Let's just see that it still works: Eliot, 414, boom, still working.
9:25 Okay, so maybe that is cleaner, maybe it's not; it's up to you, but here are just two nice little list comprehensions
9:32 working across an in-memory XML DOM. Next up, we are going to apply this to some web services.
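A sketch of the refactor: the append loop collapsed into a list comprehension that builds Course tuples, followed by the filtering comprehension. The sample XML is made up and only reuses the lecture's element names; the building and room values are illustrative.

```python
import collections
from xml.etree import ElementTree

Course = collections.namedtuple("Course", "title room building")

# Illustrative sample (made-up values) with the lecture's element names.
SAMPLE_XML = (
    "<root>"
    "<course><title>Sample Course A</title>"
    "<place><building>Eliot</building><room>414</room></place></course>"
    "<course><title>Sample Course B</title>"
    "<place><building>Eliot</building><room>234</room></place></course>"
    "</root>"
)

course_nodes = ElementTree.fromstring(SAMPLE_XML).findall("course")

# First comprehension: one Course tuple per <course> node.
courses = [
    Course(
        n.find("title").text,
        n.find("place/room").text,
        n.find("place/building").text,
    )
    for n in course_nodes
]

# Second comprehension: titles for a single building and room.
room_courses = [c.title for c in courses if c.building == "Eliot" and c.room == "414"]
print(room_courses)  # ['Sample Course A']
```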

