Python for the .NET Developer Transcripts
Chapter: Package management and external libraries
Lecture: Getting HTML with Python
0:00 Well, we've installed and imported httpx. The next thing we want to do is to go through and get some of the HTML and print out the headers.
0:11 We're going to start by just getting the HTML. These are two separate steps, so let's say get_titles(). I'll just write that as a function.
0:20 Define that down here. Now, let's say something like this: for n in range(220, 230), because of what we were working with.
0:29 Let's just print out n really quick, just to see what happens when we run it. Okay, good, it's printing out. Instead of printing, I want to say
0:37 html = get_html(n), and that's going to be another little function we're going to write. It's going to take an integer, and let's say it returns
0:48 a string. Like that; that's going to be pretty good. Now, you'll see the power of this cool little library. We also want to work with colorama.
0:59 And the reason we want to use colorama is that when we do print statements, we want to see where they're coming from.
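As a rough sketch of where the code stands at this point (not the lecture's exact file), the two functions and a colored trace message could look like the following. The message text is an assumption made for illustration, and get_html is still a stub because the request has not been written yet.

```python
from colorama import Fore, Style


def get_html(n: int) -> str:
    # Colored trace so it is obvious which function a line of output came from.
    # The wording of this message is an assumption, not taken from the lecture.
    print(Fore.YELLOW + f"Getting HTML for episode {n}" + Style.RESET_ALL, flush=True)
    # Stub for now: the actual HTTP request is added in the next step.
    return ""


def get_titles() -> None:
    # range(220, 230) yields 220 through 229; the upper bound is exclusive.
    for n in range(220, 230):
        html = get_html(n)  # used in later steps
```

On older Windows terminals, calling colorama.init() once at startup may be needed before the color codes render properly.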
1:03 Now, let's say we're going to print this out in yellow right here. Alright, it's upset that we're not returning the string yet, but that's fine,
1:13 we're not actually doing it yet. So here's what we do: we say there's going to be a response, resp, from httpx.get, and we need the URL.
1:22 The URL will be an f-string like so, nice and simple. Now, I didn't point this out, but on the C# side it's actually a lot of work to make
1:33 the HTTP client follow redirects, and this URL is a redirect to the real URL. But over in Python, a lot of libraries automatically do that.
1:41 So this should work. And then we can go over here and just print resp.status_code. For episode 220: 200, 200, 200. Hey, that's good!
1:53 I'm not really sure what the HTML is, but it looks like it's working. So let's go over here and return resp.text.
2:01 The other thing we probably want to do is check for errors. We could say if the status code is not 200 or 201,
2:07 or if there's an error type of response, do something, but this has a nice raise_for_status() behavior that we can add. It'll throw an exception
2:15 if it's not some form of success code. And finally, down here, just to make sure everything's hanging together, let's print out the first 10 characters
2:23 or something of this. Remember, we can use slicing, so I can just say html[:10] to get a substring, which is kind of a cool thing. Alright, let's run it.
2:33 DOCTYPE, DOCTYPE, DOCTYPE. Not terribly interesting, but it looks like HTML, and that's pretty awesome. So the next thing we're going to do
2:41 is write a function that gets the title from the HTML. The first half of our program is done.