Consuming HTTP Services in Python Transcripts
Chapter: Binary data from services (and elsewhere)
Lecture: A podcast MP3 downloader (binary downloader)
0:01 In this demo, we are going to go and access
0:04 a binary resource off of the internet using requests.
0:07 So, I am going to get you a get request,
0:09 but this time to some kind of binary resource,
0:12 and what we'll get back is a bunch of bytes,
0:14 we want to bring this back to our app and while we could work with them in memory,
0:17 we are actually going to write them down to disk.
0:20 Okay, so let's see what's involved in making request do this whole round trip.
0:24 Now you've seen me create a number of these little starter file projects
0:27 throughout this class so far and I decided we are just going to start
0:30 from something that is kind of already structured,
0:33 so we are going to have a function called get_episode_files
0:36 it's going to take a random rss url, this could work for any podcast,
0:39 I just happen to be picking on my own and we are going to say
0:42 for file in these files here, maybe file url or something like that,
0:47 we are going to call this function download file,
0:50 so there is going to be two parts to this,
0:52 we are going to go and work with xml, here, right,
0:56 this is standard xml so we are going to need to add an import at the top,
0:59 so we'll have our ElementTree of course we are going to use request to download it,
1:07 ElementTree to parse it, and then we are going to use OS and some other modules
1:11 we are going to talk about in a moment to actually put it on disk.
1:14 Okay, great, let's do it, let's write the download part,
1:18 now this is not going to download the files, it's going to download the xml
1:22 which will tell us where the files live, so we'll begin like this,
1:25 so we'll again do a get request against the main rss url
1:32 and we should check the status code but you guys know how that goes,
1:35 we've done it a lot, so we are going to assume that the xml text is response.text
1:41 and then we are going to load it up into a dom, ElementTree.fromstring(xml_text)
1:48 and then we just need to write some kind of list comprehension here
1:51 that is going to say given a dom get me all of the links, so remember,
1:57 it went channel/item and in the item we had a link,
2:01 we can just use that as our xpath query,
2:04 so for link let's say link_node.text
2:09 for link node in dom.findall('channel/item/link') like that and that should do it,
2:14 let's just run this bit real quick and we can go ahead and print the file,
2:19 so run boom, so that is actually not the right xpath query this while decent,
2:27 this is going to show us the actual episode page
2:30 what we want is the actual episode rss.
2:34 So, I need a different one, let's look at the xml real quick.
2:38 So what we actually want is we want this thing called an enclosure, not a link,
2:41 so then in there you can see we have the url which is an attribute,
2:47 so we haven't really talked about that, so we are going to do this,
2:50 enclosure and let's call this enclosure_node here and instead of getting the text,
2:56 we are going to get the attribute here, which is url. So we'll do url,
3:02 let's run this again, see if we got it right this time.
3:05 Oh, oops, this has to be attrib, so here we have all of the actual mp3 files,
3:13 so there we go, now I got exactly what we were looking for,
3:17 so channel item enclosure, get the attribute from the attrib dictionary called url.
3:21 Perfect, So now it's really down to the binary data part,
3:25 how are we going to download these files once we have them, right,
3:29 we are passing them off of this download file function, how does that work?
3:33 Well, now it's time to start doing the interesting things,
3:36 so for each file that comes in here, let's just put a little print here
3:40 and actually let's go say we only want the last three files downloaded,
3:45 or the last three episodes to be downloaded,
3:49 just because we don't want this to go crazy and just download a GB or whatever,
3:53 okay so over here we are going to say given a file I want to download it
3:57 and let's just go with a naive way of doing things
3:59 with requests and then we'll do something,
4:02 but we'll set a few settings to make this work better, so again, we'll say
4:04 response=request.get and this file maybe we should call it url, whatever,
4:09 we are going to get it, and then on the response, we have a text, we have a json,
4:15 but we also have a raw, and the raw is what we can use to get to that actual data,
4:23 so we are going to get this and we would like to write that to this file,
4:26 so I'll say dest file = now we need to get a few things here,
4:31 we need to get just the ending off of this, and so we'll say I'll call it base file,
4:38 we'll say os.path.basename(file), not sure if that will work with the http on it,
4:43 we might have to do little string in place but we are going to give it a try
4:46 and then the destination is going to be path.join,
4:51 well, we are going to take the destination folder, we want to make sure it's an absolute path
4:54 so we'll say os.path.absoulte_path(dest_foder),
4:57 and we want to join that with the base_file, like so,
5:14 okay, so here is the just the ending like filename.mp3 not the full path,
5:21 and this is the full path to where on my folder
5:24 is going which we've set on my desktop in this mp3s folder.
5:27 Let's just print really quick, the base_file and the dest_file,
5:33 I'll just make sure that something reasonable is going to come out of this.
5:36 Oh that is not working so well, let me just give it the full path,
5:39 like notice it's not following those, I think there may be a setting I can give it,
5:44 that will make that work, but just for now we are going to go like so.
5:48 Alright, but look, if you come back here and look,
5:51 it looks like besides that little squiggly bit that went crazy,
5:54 we got the base_file, and this should work.
5:57 Okay great, I am taking that file and saving it here,
5:59 it looks like exactly what we want to do,
6:02 so let's just do a print downloading and saving the base_file,
6:10 okay, so we can create a context manager, a width block, we'll say open,
6:14 I want to open up the destination file and I want to write to this as a binary file,
6:18 and I am just going to say fout.write, I am going to give it the response.raw.
6:24 Now, I do not believe this is going to work, in fact, I am quite sure it's not,
6:29 but let's go and see what happens, so we'll run it, it downloads the xml,
6:32 and, yes a byte like object. Now, that is not great.
6:37 So, let's take a different approach here,
6:40 let's come up here and import thin thing called shell util,
6:43 now shell util has a cool feature, so we can say here copy file object
6:47 so I want to go through something like response.raw
6:51 and I want to write it to fout.
6:54 Okay, so this will basically handle all the read from this stream,
6:57 write to that stream business for us.
7:00 So let's do it again.
7:03 How exciting it looks like it's working, let's go over here while that is dong that,
7:08 and let's have a look and see what's in that folder.
7:11 okay, so it's already downloaded, exploring awesome Python,
7:14 it's downloaded running grumpy on Python,
7:17 now each one of these should be around 50 MBs let's see how we're doing.
7:20 Oh, zero bytes, really- that doesn't seem fantastic, zero bytes, zero bytes,
7:25 and I can tell you if we try to play them, not so amazing, what went wrong?
7:29 Well, when we are working with binary data,
7:32 we can't just grab the response and write it, what we need to do is
7:36 we need to do two things, we need to tell the response
7:40 to decode the content and say true,
7:43 okay so we needed to decode the content, let's see how that works,
7:47 alright, let's see how we did, did it work, no, still zero bytes,
7:52 there is one more thing we have to do for all this stream stuff to work,
7:55 so I am going to go ahead and stop it.
7:57 The other thing we need to do is just say look,
7:59 I want you to stream as binary data as you get this request,
8:02 okay, so stream is true, deco content is true,
8:05 one more time, let's see what we get- look at that,
8:08 it's got some kind of a little header thing at the top,
8:12 and right now we are downloading saving awesome Python
8:16 and grumpy is not here yet,
8:18 oh, just in time, okay, let's go look at this one, oh 50.6 MB and if we hit play,
8:21 you probably can't hear that but I can and you can see all the data needed is right there,
8:25 again, looking great, it says four minutes, now it knows how long it is,
8:29 okay, just got downloaded.
8:32 But now everything is working, we are downloading our binary data,
8:35 okay so let's just review what that took.
8:38 There was a little bit of work to actually find the link to the binary file,
8:41 we did that by getting the xml file, parsing it with xpath,
8:45 doing some list comprehension magic there, pulling off the attribute url,
8:49 but that is not that important, right, that varies per service,
8:54 but once you have a link to a binary thing you want to download,
8:57 here is what we do; we go and we do a request.get on it, and we say stream=true,
9:04 just stream that from the server, and while you are at it, decode the content,
9:08 okay, and then we did a little bit of juggling to take the remote name
9:12 and a local folder and combine those into a destination file,
9:15 put some downloading message here so we know what is going on,
9:19 and then we create file stream it's important, it's binary
9:23 and raidable, and then to save us the work of juggling,
9:26 reading from one stream and writing to the other,
9:29 we say shell copy file object from this remote stream to this local stream-
9:33 boom, we're done. And that is working with binary files and requests as web services.