Python for the .NET Developer Transcripts
Chapter: async and await in Python
Lecture: Reviewing Python web scraper (sync version)

0:00 Before we start writing the asynchronous version, let's recall exactly what's happening in the Python web scraper here. Change the title,
0:09 this is what stands in as our title here. It's not async yet, but you know what? Let's be forward-looking here. So what we're going to do
0:16 is call this function called get_titles, which down at the bottom just says... let me make this identical.
0:22 Here we go. We're going to go from 220 to 230 and we're going to get the HTML, and then we're going to take the HTML and parse it,
0:30 screen scrape it to pull the title out, and we are just going to print that out in green. Getting the title is super easy.
0:36 We're using Beautiful Soup: we pass it the HTML, we get the header, the H1, and we get its text, kind of cleaned up.
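As a rough illustration of the title step just described, here is a minimal sketch. The function name get_title_from_html, the "MISSING" fallback, and the sample HTML are assumptions for this example and are not shown in the transcript.

```python
from bs4 import BeautifulSoup


def get_title_from_html(html: str) -> str:
    """Pull the text of the first H1 out of an HTML document."""
    # "html.parser" is the parser bundled with Python, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    header = soup.select_one("h1")
    if header is None:
        # Hypothetical fallback for pages that have no H1.
        return "MISSING"
    # strip() removes the surrounding whitespace and newlines.
    return header.text.strip()


# Made-up sample page, standing in for a real episode page.
sample = "<html><body><h1>\n  Episode 220: Example title \n</h1></body></html>"
print(get_title_from_html(sample))
```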
0:44 In order to get the HTML, we're using httpx. Another common option is requests, but remember requests does not support async,
0:52 and I knew we were headed this way, surprise surprise, in the end. We wanted to use async, and we couldn't if we went down the requests route.
0:59 So we just started with httpx, which has a compatible API plus the async stuff. So down here we're getting the contents of the URL,
1:08 we're verifying that it worked and then we're returning the body of the HTML text. Let's just run this and see what we get.
1:14 Remember it's not asynchronous yet. Getting 220. Got the title. And then we actually print the title. Get 221. Got the title. We're printing it.
1:25 All well and good, it works great. We don't have any timing yet, do we? Let's put some timing in this.
1:32 I'm sure you've seen this somewhere along the way, but the datetime features of Python are quite similar to what we have in .NET
1:40 with the DateTime class. So, here we just go datetime, this is the module, and then it has a datetime,
1:46 it has a date and a time and a timedelta, a bunch of things, so it looks a little weird, but we will say datetime, this is the class, .now(), t0.
1:54 And then we'll have the timedelta, dt, which will be the new now minus the original, which will result in a timedelta, and then we can print
2:04 "Finished" and dt.total_seconds(). Here we go, we'll go with, say, total_seconds with two decimal places and digit grouping. Sure, we don't need that.
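The timing code described above can be sketched as follows. The time.sleep call is only a stand-in for the scraping work, and the exact wording of the printed message is an assumption.

```python
import datetime
import time

# datetime is the module; datetime.datetime is the class inside it.
t0 = datetime.datetime.now()

time.sleep(0.25)  # stand-in for the real work (the call to get_titles)

# Subtracting two datetime objects yields a timedelta.
dt = datetime.datetime.now() - t0

# ",.2f" means digit grouping plus two decimal places.
print(f"Finished in {dt.total_seconds():,.2f} sec.")
```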
2:14 All right, let's run it one more time. Super, it took 11 seconds! Whooh!
2:31 Well, we saw that we could do better. The .NET version was 1.666 seconds for the best outcome that we got. So 1.6, 1.7 seconds versus 11.
2:42 Obviously, we want to make this work better right? Again, it's because we're just waiting 99.9 percent maybe not 99.9, 99 percent of the time
2:52 that this program is running, it's waiting on the internet or some server out there on the internet to get back to it.
2:57 It's doing an extremely small amount of work, so if we could do all of that waiting in just one batch, that would be way, way better.
3:04 Our job is to take this code and convert it to something like C#'s async and await, with asynchronous methods and so on.
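To make the "waiting in one batch" idea concrete, here is a small self-contained sketch. It does not touch the network: fake_download is a hypothetical stand-in that just sleeps, and all function names are assumptions rather than the course's code.

```python
import asyncio
import time


async def fake_download(episode: int) -> str:
    """Hypothetical stand-in for a web request: waits, then returns fake HTML."""
    await asyncio.sleep(0.2)
    return f"<h1>Episode {episode}</h1>"


async def one_at_a_time(episodes):
    # Each wait has to finish before the next one starts.
    return [await fake_download(n) for n in episodes]


async def all_at_once(episodes):
    # All the waits overlap, so the total is roughly one wait.
    return await asyncio.gather(*(fake_download(n) for n in episodes))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - t0


episodes = range(220, 225)
_, serial_secs = timed(one_at_a_time(episodes))
_, batch_secs = timed(all_at_once(episodes))
print(f"One at a time: {serial_secs:,.2f} sec, all at once: {batch_secs:,.2f} sec")
```

With five fake downloads of 0.2 seconds each, the one-at-a-time run should take about a second, while the batched run should take roughly as long as a single wait.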

