Python for the .NET developer Transcripts
Chapter: async and await in Python
Lecture: Reviewing Python web scraper (sync version)
0:00 Before we start writing the asynchronous version
0:01 let's just recall exactly what's happening
0:04 in the Python Web Scraper here.
0:07 Change the title
0:08 this is what stands in as our title here.
0:10 It's not async yet, but you know what?
0:12 Let's be forward looking here.
0:14 So what we're going to do
0:15 is we're going to go and call this function called
0:17 get_titles, which down at the bottom just says
0:19 let me make this identical.
0:21 Here we go. We're going to go from 220 to 230
0:24 and we're going to get the HTML
0:26 and then we're going to take the HTML and parse it
0:29 screen scrape it to pull the title out
0:31 and we are just going to print that out green.
0:33 Getting the title is super easy.
0:35 We're using Beautiful Soup, we pass it the HTML
0:38 we just get the header, the H1
0:40 and we just get its text kind of cleaned up.
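The title step described here might look roughly like the sketch below. The function name get_title comes from the lecture, but the body, the "MISSING" fallback, and the sample HTML are assumptions for illustration, not the course's actual code.

```python
import bs4


def get_title(html: str) -> str:
    # Parse the page with Python's built-in HTML parser.
    soup = bs4.BeautifulSoup(html, "html.parser")

    # Grab the first <h1> on the page.
    header = soup.select_one("h1")
    if header is None:
        # Hypothetical fallback; the lecture does not say what happens here.
        return "MISSING"

    # Clean up surrounding whitespace and newlines.
    return header.text.strip()


# Made-up sample page, used only to exercise the function.
sample = "<html><body><h1>\n  Episode 220: Example title \n</h1></body></html>"
print(get_title(sample))
```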
0:43 In order to get the HTML, we're using httpx.
0:46 Another common option is requests
0:48 but remember requests does not support the async version
0:51 and I knew we were headed this way, surprise surprise
0:54 in the end. So we wanted to use async
0:56 and we couldn't if we went down the requests route.
0:58 So we just started with httpx
1:00 which has a compatible API, plus the async stuff.
1:04 So down here we're getting the contents of the URL
1:07 we're verifying that it worked
1:08 and then we're returning the body of the HTML text.
1:12 Let's just run this and see what we get.
1:13 Remember it's not asynchronous yet.
1:16 Getting 220. Got the title.
1:20 And then we actually print the title.
1:21 Get 221. Got the title. We're printing it.
1:24 All well and good, it works great.
1:26 We don't have any timing yet, do we?
1:28 Let's put some timing in this.
1:31 I'm sure you've seen this somewhere along the way
1:33 but the datetime features of Python
1:37 are quite similar to what we have in .NET
1:39 with the DateTime class.
1:40 So, here we just go datetime
1:42 this is the module and then it has a datetime
1:45 it has a date and a time and a timedelta, bunch of things
1:48 so it looks a little weird but we will see a datetime.
1:50 This is the class, .now(), t0.
1:53 And then we'll have the timedelta dt
1:55 which will be the new now minus the original
2:00 which will result in a timedelta
2:01 and then we can print.
2:03 Finished and dt.total_seconds.
2:07 Here we go, we'll go with say, total_seconds
2:09 with two decimal places and digit grouping.
2:12 Sure we don't need that.
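The timing pattern just described could be sketched as follows. The names t0 and dt follow the lecture; the timed helper and the time.sleep call are stand-ins assumed here so the snippet runs without the scraper.

```python
import datetime
import time


def timed(work) -> datetime.timedelta:
    # datetime is the module, datetime.datetime is the class.
    t0 = datetime.datetime.now()
    work()
    # Subtracting two datetime values gives a timedelta.
    return datetime.datetime.now() - t0


# time.sleep stands in for the real scraping work (an assumption).
dt = timed(lambda: time.sleep(0.25))

# ",.2f" means digit grouping plus two decimal places.
print(f"Finished in {dt.total_seconds():,.2f} sec.")
```

Coming from .NET, t0 plays the role of DateTime.Now and dt plays the role of a TimeSpan, with total_seconds() standing in for TotalSeconds.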
2:13 All right, let's run it one more time.
2:27 Super, it took 11 seconds! Whooh!
2:30 Well, we saw that we could do better.
2:31 The .NET version was 1.666 seconds for the best outcome
2:36 that we got. So 1.6, 1.7 seconds versus 11.
2:41 Obviously, we want to make this work better right?
2:44 Again, it's because we're just waiting 99.9 percent
2:48 maybe not 99.9, 99 percent of the time
2:51 that this program is running, it's waiting on the internet
2:53 or some server out there on the internet to get back to it.
2:56 It's doing an extremely small amount of work, so
2:58 if we could do all of that work
2:59 all that waiting just in one batch
3:01 that would be way way better.
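To make the "waiting in one batch" idea concrete, here is a small simulation that uses only the standard library. It is not the scraper itself: fake_download, the 0.1 second delay, and the five episode numbers are all made up, and asyncio.sleep stands in for waiting on a server.

```python
import asyncio
import time


async def fake_download(episode: int, delay: float) -> int:
    # Pretend to wait on the network; no real request is made.
    await asyncio.sleep(delay)
    return episode


async def one_at_a_time(episodes, delay):
    # Each wait finishes before the next one starts.
    return [await fake_download(e, delay) for e in episodes]


async def all_in_one_batch(episodes, delay):
    # All of the waits overlap, so the total is close to a single wait.
    return await asyncio.gather(*(fake_download(e, delay) for e in episodes))


def timed(coro):
    t0 = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - t0


episodes = range(220, 225)
seq_result, seq_time = timed(one_at_a_time(episodes, 0.1))
batch_result, batch_time = timed(all_in_one_batch(episodes, 0.1))

print(f"One at a time: {seq_time:.2f} sec.")
print(f"In one batch:  {batch_time:.2f} sec.")
```

The sequential run takes about the sum of the waits, while the batched run takes about as long as the longest single wait, which is the gap the conversion is meant to close.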
3:03 Our job is to take this code and convert it to something
3:06 like C#'s async and await
3:09 with asynchronous methods and so on.