#100DaysOfCode in Python Transcripts
Chapter: Days 46-48: Web Scraping with BeautifulSoup4
Lecture: Requests best practice
0:00 Alright, one quick public service announcement regarding this code that we've just written. This section here, the pulling of the site.
0:10 It's not actually kosher to keep that in this script. The reason for that is we don't want to submit a request to a website
0:18 every time we want to scrape some data. We might run a scraper like this at a different interval than the actual pulling of the website.
0:28 The reason for that is, you think about it, not every site is going to update every few minutes. Not every site is going to update every day.
0:36 So if you keep pinging that site with a request, you're going to very quickly spam them. You might even get yourself blocked.
0:44 And you could use up their bandwidth limit. There are certain websites that, you know, can only support a certain number of hits
0:52 per minute, alright? And if you keep doing that, you're going to make the website that you enjoy viewing so much pretty unhappy.
1:00 So the best practice here is to put all of this into a different script, run that on a cron job at a different interval
1:09 or whatever other automated way you want to do that, and then using Beautiful Soup 4, point at the downloaded HTML file
1:19 or at the page that you have pulled down from requests, alright. Nice and easy. It's actually much more pleasant for everyone
1:30 to do it that way, and I would totally recommend doing it. The only reason we've done it this way right now is just for demonstration purposes.
1:37 It's much easier. But in production, definitely put your requests in a different script and use Beautiful Soup 4
1:45 to talk to a static file, not the actual URL, unless of course it's a one-off thing.
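The two-script pattern described above might look something like this sketch. The script names, the URL, and the `h2` selector are illustrative assumptions, not from the course; the point is only that the download and the parsing are separated, so the cron job hits the site while the scraper only reads the saved file.

```python
# --- fetch_page.py: run this on a cron job at whatever interval the site
# --- actually updates (e.g. once a day), so you only hit the site once.
import requests


def fetch_page(url: str, out_path: str) -> None:
    """Download the page once and save the raw HTML to disk."""
    resp = requests.get(url)
    resp.raise_for_status()  # fail loudly if the site refused us
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(resp.text)


# --- scrape.py: run this as often as you like; it never touches the site,
# --- it just points Beautiful Soup 4 at the downloaded HTML file.
from bs4 import BeautifulSoup


def parse_headings(html_path: str) -> list[str]:
    """Parse the saved HTML instead of the live URL (h2 tags as an example)."""
    with open(html_path, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2")]
```

A crontab entry like `0 6 * * * python fetch_page.py` (again, just an example schedule) would refresh the file each morning, and every run of `scrape.py` after that costs the site nothing.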