Python for the .NET developer Transcripts
Chapter: Package management and external libraries
Lecture: Getting HTML with Python
0:00 Well, what we've installed and imported is httpx.
0:04 The next thing we want to do
0:05 is to go through and get some of the html
0:09 and print out the headers.
0:10 We're going to start by just getting the html.
0:12 These are two separate steps, so let's say get_titles.
0:16 I'll just write that as a function.
0:19 Define that down here.
0:21 Now, let's say something like this.
0:22 for n in range(220, 230)
0:26 because those are the episodes we were working with.
0:28 Let's just print out n really quick.
0:30 Just to see what happens when we run it.
0:32 Okay, good, it's printing out.
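A minimal sketch of the loop at this stage might look like the following. The constant name EPISODES is an assumption added for illustration; in the lecture the range is written directly in the for statement.

```python
# EPISODES is a hypothetical name; the lecture writes range(220, 230) inline.
EPISODES = range(220, 230)


def get_titles() -> None:
    # range() excludes its stop value, so this visits 220 through 229.
    for n in EPISODES:
        print(n)


get_titles()
```

Note that the stop value 230 is not included, which is worth remembering when coming from a C# for loop with an inclusive bound.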
0:34 Instead of printing out I want to say
0:36 html equals get_html(n)
0:40 and that's going to be another little function
0:42 we're going to write.
0:45 It's going to take an integer, and let's say it returns
0:47 a string, like that.
0:48 That's going to be pretty good.
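As a rough sketch, the function stub with its type hints could look like this. The parameter name episode_number and the empty-string placeholder are assumptions; the lecture only says the function takes an integer and returns a string.

```python
def get_html(episode_number: int) -> str:
    # Placeholder body so the stub runs; the real request comes later.
    return ""


# Type hints are stored as metadata; Python does not enforce them at runtime.
print(get_html.__annotations__)
```

Unlike C# parameter and return types, these hints are checked by editors and type checkers rather than by the interpreter.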
0:49 Now, you'll see the power of this cool little library.
0:51 We also want to work with colorama.
0:58 And the reason we want to use colorama
0:59 is that when we do print statements
1:00 we want to see where they're coming from.
1:02 Now, let's say we're going to print this out in yellow right here.
1:10 Alright, the editor is upset that we're not
1:11 returning a string yet, but that's fine;
1:12 we're not actually doing it yet.
1:13 So here's what we do:
1:14 we say there's going to be a response from
1:18 httpx.get, and we need the URL.
1:21 The URL will be an f-string
1:25 like so, nice and simple.
1:27 Now, I didn't point this out, but on the C# side
1:30 it's actually a lot of work to make
1:32 the HTTP client follow redirects,
1:35 and this URL is a redirect to the real URL.
1:37 But over in Python, a lot of libraries
1:39 automatically do that.
1:40 So this should work.
1:41 And then we can go over here and like
1:43 let's just print resp.status_code.
1:47 For episode 220: 200, 200, 200.
1:51 Hey, that's good!
1:52 I'm not really sure what the html is
1:54 but it looks like it's working.
1:56 So let's go over here and return, resp.text.
2:00 The other thing we probably want to do is
2:02 we want to check for errors.
2:03 We could say,
2:04 if the response status is not 200 or 201,
2:06 or if it's some kind of error response, do something,
2:09 but this has a nice raise_for_status() behavior
2:13 that we can add. It'll throw an exception
2:14 if it's not some form of success code.
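Putting the pieces described so far together, a hedged sketch of get_html might look like this. The URL pattern (example.com), the build_url helper, and the message text are assumptions, since the transcript does not show the actual URL. Recent httpx releases, as far as I know, do not follow redirects unless follow_redirects=True is passed, so the sketch passes it explicitly.

```python
def build_url(episode_number: int) -> str:
    # Hypothetical URL pattern; the real site address is not in the transcript.
    return f"https://example.com/{episode_number}"


def get_html(episode_number: int) -> str:
    # Third-party imports are kept inside the function so this file can be
    # loaded even where httpx and colorama are not installed.
    import httpx
    from colorama import Fore, Style

    # Yellow output makes it easy to see which function a message came from.
    print(Fore.YELLOW + f"Getting HTML for episode {episode_number}" + Style.RESET_ALL,
          flush=True)

    # follow_redirects=True is an assumption about the installed httpx version.
    resp = httpx.get(build_url(episode_number), follow_redirects=True)

    # Raises httpx.HTTPStatusError if the response is not a success code.
    resp.raise_for_status()
    return resp.text
```

Calling get_html(220) would then either return the page text or raise an exception, so the caller does not need to inspect status codes by hand.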
2:17 And finally down here
2:18 just to make sure everything's hanging together
2:20 let's print out the first 10 characters
2:22 or something of this.
2:24 Remember, we can use slicing
2:25 so I can just say this to get like a substring
2:28 which is a kind of cool thing.
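A small sketch of the slicing idea, using a made-up HTML string rather than a real response:

```python
# Sample text standing in for the downloaded page (an assumption for illustration).
html = "<!DOCTYPE html><html><head><title>Example</title></head></html>"

# html[:10] takes the characters at indexes 0 through 9.
first_ten = html[:10]
print(repr(first_ten))

# Slicing past the end is safe: it returns what is there instead of raising.
short = "abc"
print(repr(short[:10]))
```

This differs from C#'s Substring, which throws when the requested length runs past the end of the string.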
2:29 Alright, let's run it.
2:32 DOCTYPE, DOCTYPE, DOCTYPE.
2:34 Not terribly interesting
2:36 but it looks like html and that's pretty awesome.
2:39 So the next thing we're going to do
2:40 is we're going to have to write a function
2:41 that says get title from html.
2:45 First half of our program is done.