Consuming HTTP Services in Python Transcripts
Chapter: Binary data from services (and elsewhere)
Lecture: A podcast MP3 downloader (binary downloader)
0:01 In this demo, we are going to go and access a binary resource off of the internet using requests. So, I am going to do a get request,
0:10 but this time to some kind of binary resource, and what we'll get back is a bunch of bytes,
0:15 we want to bring this back to our app and while we could work with them in memory, we are actually going to write them down to disk.
0:21 Okay, so let's see what's involved in making requests do this whole round trip. Now you've seen me create a number of these little starter file projects
0:28 throughout this class so far and I decided we are just going to start from something that is kind of already structured,
0:34 so we are going to have a function called get_episode_files it's going to take a random rss url, this could work for any podcast,
0:40 I just happen to be picking on my own and we are going to say for file in these files here, maybe file url or something like that,
0:48 we are going to call this function download file, so there is going to be two parts to this, we are going to go and work with xml, here, right,
0:57 this is standard xml so we are going to need to add an import at the top,
1:00 so we'll have our ElementTree, and of course we are going to use requests to download it,
1:08 ElementTree to parse it, and then we are going to use os and some other modules we are going to talk about in a moment to actually put it on disk.
1:15 Okay, great, let's do it, let's write the download part, now this is not going to download the files, it's going to download the xml
1:23 which will tell us where the files live, so we'll begin like this, so we'll again do a get request against the main rss url
1:33 and we should check the status code but you guys know how that goes, we've done it a lot, so we are going to assume that the xml text is response.text
1:42 and then we are going to load it up into a dom, ElementTree.fromstring(xml_text) and then we just need to write some kind of list comprehension here
1:52 that is going to say given a dom get me all of the links, so remember, it went channel/item and in the item we had a link,
2:02 we can just use that as our xpath query, so for link let's say link_node.text
2:10 for link node in dom.findall('channel/item/link') like that and that should do it,
2:15 let's just run this bit real quick and we can go ahead and print the file,
2:20 so run, boom, so that is actually not the right xpath query; this, while decent, is going to show us the actual episode page,
2:31 what we want is the actual episode mp3. So, I need a different one, let's look at the xml real quick.
2:39 So what we actually want is we want this thing called an enclosure, not a link, so then in there you can see we have the url which is an attribute,
2:48 so we haven't really talked about that, so we are going to do this, enclosure and let's call this enclosure_node here and instead of getting the text,
2:57 we are going to get the attribute here, which is url. So we'll do url, let's run this again, see if we got it right this time.
3:06 Oh, oops, this has to be attrib, so here we have all of the actual mp3 files, so there we go, now I got exactly what we were looking for,
3:18 so channel item enclosure, get the attribute from the attrib dictionary called url. Perfect, So now it's really down to the binary data part,
3:26 how are we going to download these files once we have them, right, we are passing them off of this download file function, how does that work?
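The enclosure lookup described above can be sketched roughly like this. The feed text, the example.com URLs, and the function name get_enclosure_urls are made up for illustration; only the channel/item/enclosure path and the attrib dictionary come from the lecture.

```python
import xml.etree.ElementTree as ElementTree

# Hypothetical, trimmed-down feed used only for illustration.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 2</title>
      <link>https://example.com/episodes/2</link>
      <enclosure url="https://example.com/files/episode-2.mp3" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 1</title>
      <link>https://example.com/episodes/1</link>
      <enclosure url="https://example.com/files/episode-1.mp3" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def get_enclosure_urls(xml_text):
    # Parse the feed; the root element is <rss>, so the path starts at channel.
    dom = ElementTree.fromstring(xml_text)
    # The mp3 location is an attribute of <enclosure>, not its text.
    return [
        enclosure_node.attrib['url']
        for enclosure_node in dom.findall('channel/item/enclosure')
    ]


urls = get_enclosure_urls(SAMPLE_RSS)
print(urls)
```

In the real program the xml text would come from response.text of the rss request rather than from a hard-coded string.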
3:34 Well, now it's time to start doing the interesting things, so for each file that comes in here, let's just put a little print here
3:41 and actually let's go say we only want the last three files downloaded, or the last three episodes to be downloaded,
3:50 just because we don't want this to go crazy and just download a GB or whatever,
3:54 okay so over here we are going to say given a file I want to download it and let's just go with a naive way of doing things
4:00 with requests and then we'll do something, but we'll set a few settings to make this work better, so again, we'll say
4:05 response = requests.get and this file maybe we should call it url, whatever,
4:10 we are going to get it, and then on the response, we have a text, we have a json,
4:16 but we also have a raw, and the raw is what we can use to get to that actual data,
4:24 so we are going to get this and we would like to write that to this file, so I'll say dest file = now we need to get a few things here,
4:32 we need to get just the ending off of this, and so we'll say I'll call it base file,
4:39 we'll say os.path.basename(file), not sure if that will work with the http on it,
4:44 we might have to do little string in place but we are going to give it a try and then the destination is going to be os.path.join,
4:52 well, we are going to take the destination folder, we want to make sure it's an absolute path so we'll say os.path.abspath(dest_folder),
4:58 and we want to join that with the base_file, like so,
5:15 okay, so here is the just the ending like filename.mp3 not the full path, and this is the full path to where on my folder
5:25 is going which we've set on my desktop in this mp3s folder. Let's just print really quick, the base_file and the dest_file,
5:34 I'll just make sure that something reasonable is going to come out of this. Oh that is not working so well, let me just give it the full path,
5:40 like notice it's not following those, I think there may be a setting I can give it,
5:45 that will make that work, but just for now we are going to go like so. Alright, but look, if you come back here and look,
5:52 it looks like besides that little squiggly bit that went crazy, we got the base_file, and this should work.
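Here is a small sketch of that name juggling. The lecture calls os.path.basename directly on the url; this version swaps in urlparse first, on the assumption that a url might carry a query string that should not end up in the file name. The function name build_dest_file, the example url, and the mp3s folder name are placeholders.

```python
import os
import posixpath
from urllib.parse import urlparse


def build_dest_file(url, dest_folder):
    # URL paths always use forward slashes, so posixpath is used for the
    # remote name; urlparse drops any ?query=string from the path.
    base_file = posixpath.basename(urlparse(url).path)
    # The local side uses os.path so it follows the current platform.
    dest_file = os.path.join(os.path.abspath(dest_folder), base_file)
    return base_file, dest_file


base_file, dest_file = build_dest_file(
    "https://example.com/files/episode-2.mp3?source=rss", "mp3s"
)
print(base_file)
print(dest_file)
```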
5:58 Okay great, I am taking that file and saving it here, it looks like exactly what we want to do,
6:03 so let's just do a print downloading and saving the base_file, okay, so we can create a context manager, a with block, we'll say open,
6:15 I want to open up the destination file and I want to write to this as a binary file,
6:19 and I am just going to say fout.write, I am going to give it the response.raw.
6:25 Now, I do not believe this is going to work, in fact, I am quite sure it's not,
6:30 but let's go and see what happens, so we'll run it, it downloads the xml, and, yes, a bytes-like object is required. Now, that is not great.
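The failure can be reproduced without any network access. In this sketch an io.BytesIO object stands in for response.raw (an assumption made only so the example runs offline); like the raw stream, it is a file-like object rather than bytes, so a binary file refuses to write it.

```python
import io
import tempfile

# Stand-in for response.raw: a file-like object, not a bytes value.
fake_raw = io.BytesIO(b"pretend these are mp3 bytes")

caught = None
with tempfile.TemporaryFile(mode="wb") as fout:
    try:
        # write() expects bytes, so handing it the stream itself fails.
        fout.write(fake_raw)
    except TypeError as err:
        caught = err

print(type(caught).__name__, caught)
```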
6:38 So, let's take a different approach here, let's come up here and import this thing called shutil,
6:44 now shutil has a cool feature, so we can say here copyfileobj, so I want to go through something like response.raw
6:52 and I want to write it to fout. Okay, so this will basically handle all the read from this stream, write to that stream business for us.
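As a rough offline illustration of that stream-to-stream copy, the sketch below again uses io.BytesIO in place of response.raw and a temporary folder in place of the mp3s folder; both are stand-ins, not what the lecture runs.

```python
import io
import os
import shutil
import tempfile

# Stand-in for the remote stream: 200,000 bytes of dummy data.
fake_raw = io.BytesIO(b"\x00\x01" * 100_000)

with tempfile.TemporaryDirectory() as folder:
    dest_file = os.path.join(folder, "episode.mp3")
    with open(dest_file, "wb") as fout:
        # Reads chunks from fake_raw and writes them to fout until the
        # source is exhausted.
        shutil.copyfileobj(fake_raw, fout)
    size = os.path.getsize(dest_file)

print(size)
```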
7:01 So let's do it again. How exciting it looks like it's working, let's go over here while that is doing that,
7:09 and let's have a look and see what's in that folder. okay, so it's already downloaded, exploring awesome Python,
7:15 it's downloaded running grumpy on Python, now each one of these should be around 50 MBs let's see how we're doing.
7:21 Oh, zero bytes, really- that doesn't seem fantastic, zero bytes, zero bytes,
7:26 and I can tell you if we try to play them, not so amazing, what went wrong? Well, when we are working with binary data,
7:33 we can't just grab the response and write it, what we need to do is we need to do two things, we need to tell the response
7:41 to decode the content and say true, okay so we needed to decode the content, let's see how that works,
7:48 alright, let's see how we did, did it work, no, still zero bytes, there is one more thing we have to do for all this stream stuff to work,
7:56 so I am going to go ahead and stop it. The other thing we need to do is just say look, I want you to stream as binary data as you get this request,
8:03 okay, so stream is True, decode content is True, one more time, let's see what we get- look at that,
8:09 it's got some kind of a little header thing at the top, and right now we are downloading saving awesome Python and grumpy is not here yet,
8:19 oh, just in time, okay, let's go look at this one, oh 50.6 MB and if we hit play,
8:22 you probably can't hear that but I can and you can see all the data needed is right there,
8:26 again, looking great, it says four minutes, now it knows how long it is, okay, just got downloaded.
8:33 But now everything is working, we are downloading our binary data, okay so let's just review what that took.
8:39 There was a little bit of work to actually find the link to the binary file, we did that by getting the xml file, parsing it with xpath,
8:46 doing some list comprehension magic there, pulling off the attribute url, but that is not that important, right, that varies per service,
8:55 but once you have a link to a binary thing you want to download, here is what we do; we go and we do a requests.get on it, and we say stream=True,
9:05 just stream that from the server, and while you are at it, decode the content, okay, and then we did a little bit of juggling to take the remote name
9:13 and a local folder and combine those into a destination file, put some downloading message here so we know what is going on,
9:20 and then we create the file stream, it's important that it's binary and writable, and then to save us the work of juggling,
9:27 reading from one stream and writing to the other, we say shutil.copyfileobj from this remote stream to this local stream-
9:34 boom, we're done. And that is working with binary files from web services using requests.
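Putting the review together, a download function along these lines might look like the sketch below. It is a reconstruction from the steps described, not the exact code on screen: setting decode_content on response.raw is how the "decode the content" step is assumed to be done, and the timeout, the raise_for_status call, the close call, and the return value are additions that the lecture does not mention.

```python
import os
import shutil

import requests


def download_file(url, dest_folder):
    # stream=True defers reading the body, so response.raw is still unread
    # when we copy from it. The timeout value is an arbitrary assumption.
    response = requests.get(url, stream=True, timeout=30)
    response.raise_for_status()  # added here; the lecture skips status checks
    # Ask the underlying stream to undo any gzip/deflate content-encoding
    # while it is being read.
    response.raw.decode_content = True

    # Remote name plus local folder gives the destination file.
    base_file = os.path.basename(url)
    dest_file = os.path.join(os.path.abspath(dest_folder), base_file)

    print("Downloading and saving {}".format(base_file))
    # "wb": the file must be opened for writing in binary mode.
    with open(dest_file, "wb") as fout:
        shutil.copyfileobj(response.raw, fout)
    response.close()
    return dest_file
```

A call would look something like download_file(url, "mp3s") for each enclosure url, assuming the mp3s folder already exists.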