Consuming HTTP Services in Python Transcripts
Chapter: XML services with requests
Lecture: Working with XML from Python
0:02 Let's work with some xml data from Python.
0:05 So here at the University of Washington,
0:07 they've got some xml data that we can go grab,
0:09 and they happen to have something from the course catalogue at Reed College
0:14 which is fancy university here in Portland Oregon,
0:17 it happens to be where Steve Jobs went;
0:19 so we are going to import this xml file and we are going to do something interesting,
0:23 we are going to be able to answer some interesting questions
0:26 about these courses using xml processing in Python.
0:29 Now, in the beginning, we are just going to read the file,
0:32 we are not going to do any sort of web service,
0:34 although technically we could sort of point directly at that xml file,
0:36 later we are going to come back and do this sort of processing from Requests,
0:40 calling services that actually return xml.
0:44 Alright, so let's go add a file here, process xml,
0:47 and I happen to have already downloaded that Reed file,
0:50 this xml data that we can look at it here, it's what it looks like.
0:55 So, we are going to just work with this locally, in order to get started with xml in Python,
0:59 we are going to work with a particular class called Elementary,
1:02 and it's going to come from xml.etree,
1:06 okay and we are also going to need to work with the file system a little bit,
1:10 to get to our file, let's get a little structure, alright,
1:14 so the first thing that we need to do is actually find that file,
1:17 so let's go and create a variable called folder,
1:21 and this is going to be os.path.basename(__file__) of wherever this particular file is.
1:25 Okay, dunder file will say the name of this executing Python module,
1:29 so process.xml is a full path, this will give us the folder where that lives,
1:34 and then we can reach and sign xml data and lad up reed
1:37 so we'll say file=os.path.join and we want to give it the folder, the xml data and reed.xml,
1:46 so this works in a nice platform independent way for example
1:49 this is going to be like /xml/ on OS 10 and on Linux,
1:54 but on Windows it would be back slashes, perfect, okay,
1:57 so then what we need to do is just load this up so we'll say xml text,
2:01 let's go and just put this a width block, width open(file) as fin:
2:05 and then we'll just say xml_text = fin.read() that is going to read all of the text,
2:14 so now that we have that loaded up, we are going to want to load,
2:16 start working with the xml, just like we used Json before,
2:19 we are going to use the xml element tree,
2:22 so we are going to say ElementTree. now there is a parse,
2:25 but, again, I don't know what the deal is with these sort of file format modules
2:29 but they kind of suck in names, again, so parse actually loads like a file type thing,
2:35 so what we want is ElementTree.fromstring,
2:39 so it seems to me like there could have been a more clear way to name these things
2:42 but it doesn't matter, xml text is what we want
2:45 and then we could just print out the DOM just to make sure that this works,
2:48 and notice we are running the github one from before
2:50 so let's go ahead and run this one now.
2:54 Not a directory, yes, oh I said base name, I meant dir name, let's try that again.
3:04 There we go, element at route and so whatever this is element route
3:08 if we look you'll see that's this, so what we want to do is
3:12 we actually want to get these little individual courses here,
3:15 so you see there is a course, more courses, lots of these courses.
3:18 So, we are going to read in these courses, and we want to answer questions
3:21 like what course is running and Eliot, in this room at a given time
3:25 or something like that okay, you can see this is not a huge file but it's decent,
3:30 it's 13 thousand lines, so it's got a lot of data about these courses,
3:33 so we've already loaded these up
3:35 and the way that we can access these course pieces here
3:37 is we can use what is called an xpath expression,
3:41 so what I want to do is actually say courses,
3:44 we can say dom.findall and we can just say course.
3:48 Now we could find, let's find the course titles, just to show you more interesting,
3:51 so we have course and then we can navigate down to title,
3:54 so I could do course/title and then I could do something like this,
3:57 for c in courses: print (c.text), let's try this.
4:04 Alright, so there is all the course titles, we've done a search
4:06 and we've lost some of the information like what room was that course in,
4:10 I don't know, its title is Genetics and Molecular Biology, but that is really all I know,
4:13 so we are going to take a step back
4:15 and we are going to actually get that entire course node here
4:18 like this little bit right there, little pop up and then we'll be able
4:21 to answer questions about it, okay, so this is going to give our courses,
4:25 and now it's cool that this is an xml thing, I'll call this course_nods,
4:30 something to that effect but I want a richer container for this,
4:33 so I am going to import something else up here,
4:36 we are going to import the collections module and here I am going to create a thing,
4:39 actually I'll do it outside this method call, although it's called once
4:41 so it didn't really matter, I am going to create a thing called a course,
4:44 and I'll create that as a collection.named collection, so normally what we get back is just an xml node
4:47 and we could maybe stick in the dictionary but a named tuple is much nicer,
4:52 so we have course and then we just say the variable,
4:54 so let's say we want the title, the room and the building, those three things,
4:58 so if we do that, and we come down here we can do something more interesting,
5:02 I can say the courses are going to be equal to, now we can do this as a loop,
5:05 I'll write it as a loop first, and I'll do a list comprehension in a minute
5:09 so I'll say for n in course_nodes, and then I am going to create a course,
5:12 I want to pass some stuff to it and then I'll say courses.append,
5:17 of course, now we can just come down here print out the course is,
5:22 okay, so what goes in here, well first the title, then a room, then a building,
5:27 so the title like this, back over here we are going to need
5:31 we are kind of working with an element at this level
5:33 we are going to need to do another query to get this,
5:36 one to go to the place and find the building one to go to place to find the room.
5:40 So those are the three things that we need, first thing is the title, that's easy,
5:43 so n.find('title') and we can say text,
5:47 and let's just print that out, and see if that works,
5:50 oh it's missing its arguments, okay, no worries, we'll do that next.
5:53 The next thing we need to do is we need to find the place, then the room,
5:56 and then the next one, the last one is building,
6:02 let's see if they call it room, building room, they do.
6:07 So here you go, this is working perfectly,
6:09 we've loaded up the xml, using elementary.from string,
6:14 we've done an xpath query, a very simple one but you saw
6:17 that we could do more interesting ones, like for example the place
6:20 and then we got a bunch of nodes back and for each node we did a little bit of work
6:24 to transform that node from just a bunch of xml nodes that we could still work with,
6:29 down to actual almost a class to one of these named tuples and then afterwards,
6:34 we can do interesting things like we can answer the question
6:37 like what are the classes running in the building Eliot, okay, so this is pretty cool,
6:42 but we still have a little bit more work to do, let's go over here and I'll just print it out,
6:46 it seems like it's working, let's do one final thing, let's go and say building=
6:52 and we'll do an input from the user, we'll say what building are you in,
6:57 so maybe somebody sit in there like wait,
7:00 what class is running in this room right now,
7:02 and then we'll ask what room are they looking for, what room are you next to
7:07 and then I could do some kind of query here so I could say room courses,
7:11 and we'll just write this as a simple list comprehension, so I'll say c.title for
7:17 so we are going to say for c in courses, and we are going to pull out the title,
7:24 but only the c where the building, c.building=building,
7:27 notice the incompletion here, that's beautiful, and c.room=room.
7:32 Okay, so these are the ones we are looking for,
7:38 so then we'll say for c in room courses, print, let's do a little star like this,
7:46 see that title, actually we just put in the title, so we'll go like that, alright,
7:52 so let's go and run this and see if this works.
7:56 First of all, I'm in Eliot, I'm in room 414, boom, those are the rooms where this is,
8:03 not that many, just ten or so, let's just double check,
8:07 go to xml file here and just make sure if I go find this course
8:11 that it's actually in that room.
8:13 And that there are courses that are in other rooms, so here,
8:16 so those will be 234, let's try this again, I'm in that room, 234, boom,
8:22 Topics in French Enlightenment, First Year Russian, very cool, right.
8:27 So that's all there is to it, we are just going to somehow magically get a hold of the text,
8:33 later we are going to do this off of the web but right now we just got off the file system,
8:37 we are going to create an element tree, we are going to parse it from a string,
8:41 then we are going to run possibly as sequence or series of xpath expressions,
8:45 one to find the course nodes and then once we have all the course nodes,
8:49 to pull the individual pieces of data out of them, right, and then finally,
8:52 once you have the data like from here on down, this is pure Python,
8:56 once you have that loaded as a bunch of named tuples,
8:59 well, it's just a matter or writing code against them.
9:02 I did say we can improve this a little bit here, so let's work on this,
9:05 so I can take that and say okay for n in the nodes,
9:08 what I want to create and send back is one of these, and that's that, right,
9:15 so that will simplify that a little bit there, just see that it still works,
9:19 Eliot, 414 boom- still working.
9:24 Okay, so maybe that is cleaner, maybe it's not, I don't know,
9:27 it's up to you but here is just two nice little list comprehensions
9:31 working across and in memory xml dom.
9:34 Next up we are going to apply this to some web services.