#100DaysOfCode in Python Transcripts
Chapter: Days 46-48: Web Scraping with BeautifulSoup4
Lecture: Requests best practice
0:00 Alright, one quick public service announcement
0:02 regarding this code that we've just written.
0:04 This section here, the part that pulls the site,
0:09 isn't really kosher to keep in this script.
0:12 The reason for that is we don't want
0:15 to submit a request to a website
0:17 every time we want to scrape some data.
0:20 We might run a scraper like this at a different
0:23 interval than the one we use to actually pull the website.
0:27 The reason for that is, if you think about it,
0:29 not every site is going to update every few minutes.
0:32 Not every site is going to update every day.
0:35 So if you keep pinging that site with a request,
0:39 you're going to very quickly spam them.
0:42 You might even get yourself blocked.
0:43 And you could use up their bandwidth limit.
0:46 There are certain websites that, you know,
0:48 can only support a certain number of hits
0:51 per minute, alright?
0:52 And if you keep doing that, you're going to make the website
0:56 that you enjoy viewing so much pretty unhappy.
0:59 So the best practice here is to put all of this
1:04 into a different script, run that on a cron job
1:06 at a different interval
1:08 or whatever other automated way you want to do that,
1:11 and then, using Beautiful Soup 4,
1:15 point at the downloaded HTML file,
1:18 the page that you have pulled down
1:22 with requests, alright.
1:24 Nice and easy.
1:26 It's actually much more pleasant for everyone
1:29 to do it that way, and I would totally recommend doing it.
1:32 The only reason we've done it this way right now
1:34 is just for demonstration purposes.
1:36 It's much easier.
1:38 But in production, definitely put your requests
1:41 in a different script and use Beautiful Soup 4
1:44 to talk to a static file, not the actual URL,
1:48 unless of course it's a one off thing.
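The two-script setup described above can be sketched roughly like this. The URL, the cache file name, and the idea of extracting the page title are all assumptions for illustration, not part of the lecture's code:

```python
# A minimal sketch of the best practice from this lecture:
# one script pulls the site on a schedule, the other parses the
# cached static file with Beautiful Soup 4. URL, file name, and
# the title extraction are hypothetical examples.

# --- fetch_page.py: run this on a cron job (or other scheduler)
import requests

URL = "https://example.com"       # hypothetical target site
CACHE_FILE = "cached_page.html"   # where the pulled-down HTML lives

def fetch_page(url=URL, cache_file=CACHE_FILE):
    """Hit the site once and save the raw HTML to disk."""
    response = requests.get(url)
    response.raise_for_status()   # don't cache an error page silently
    with open(cache_file, "w", encoding="utf-8") as f:
        f.write(response.text)

# --- scrape.py: run as often as you like, never touches the network
from bs4 import BeautifulSoup

def parse_html(html):
    """Point Beautiful Soup 4 at static HTML instead of a live URL."""
    soup = BeautifulSoup(html, "html.parser")
    # Example extraction only: grab the page title if there is one.
    return soup.title.string if soup.title else None

def scrape(cache_file=CACHE_FILE):
    with open(cache_file, encoding="utf-8") as f:
        return parse_html(f.read())
```

This way the scraping logic can be rerun and debugged any number of times against the same static file, and the site only sees one request per cron interval.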