#100DaysOfCode in Python Transcripts
Chapter: Days 46-48: Web Scraping with BeautifulSoup4
Lecture: Requests best practice
0:00
Alright, one quick public service announcement
0:02
regarding this code that we've just written.
0:04
This section here, the pulling down of the site.
0:09
It's not actually kosher to keep that in this script.
0:12
The reason for that is we don't want
0:15
to submit a request to a website
0:17
every time we want to scrape some data.
0:20
We might run a scraper like this at a different
0:23
interval than the one we use to actually pull down the website.
0:27
The reason for that is, if you think about it,
0:29
not every site is going to update every few minutes.
0:32
Not every site is going to update every day.
0:35
So if you keep pinging that site with a request,
0:39
you're going to very quickly spam them.
0:42
You might even get yourself blocked.
0:43
And you could use up their bandwidth limit.
0:46
There are certain websites that, you know,
0:48
can only support a certain number of hits
0:51
per minute, alright?
0:52
And if you keep doing that, you're going to make the website
0:56
that you enjoy viewing so much pretty unhappy.
0:59
So the best practice here is to put all of this
1:04
into a different script, run that on a cron job
1:06
at a different interval
1:08
or whatever other automated way you want to do it,
1:11
and then use Beautiful Soup 4
1:15
to point at the downloaded HTML file,
1:18
that is, at the page that you pulled down
1:22
with requests, alright.
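As a minimal sketch of that split, assuming a hypothetical target URL and file name, the downloading script might look something like this, run by cron on its own schedule:

```python
# download_page.py -- run on its own schedule, for example via cron:
#   0 6 * * * /usr/bin/python3 /path/to/download_page.py
# The URL and file name here are placeholder assumptions.
import requests

URL = "https://example.com/news"  # the site you want to scrape
OUT_FILE = "page.html"            # local copy for the scraper to read

def main():
    resp = requests.get(URL, timeout=10)
    resp.raise_for_status()  # fail loudly instead of saving an error page
    with open(OUT_FILE, "w", encoding="utf-8") as f:
        f.write(resp.text)

if __name__ == "__main__":
    main()
```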
1:24
Nice and easy.
1:26
It's actually much more pleasant for everyone
1:29
to do it that way, and I would totally recommend doing it.
1:32
The only reason we've done it this way right now
1:34
is just for demonstration purposes.
1:36
It's just much easier this way.
1:38
But in production, definitely put your requests
1:41
in a different script and use Beautiful Soup 4
1:44
to talk to a static file, not the actual URL,
1:48
unless of course it's a one-off thing.
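On the parsing side, a minimal sketch under the same assumptions (the file name and the h2 tag are placeholders): Beautiful Soup 4 reads the saved file, so no request ever leaves your machine.

```python
# scrape_page.py -- parses the local copy, never touches the network
from bs4 import BeautifulSoup

with open("page.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

# For example, grab all the second-level headings from the saved page.
for heading in soup.find_all("h2"):
    print(heading.get_text(strip=True))
```

You can rerun the scraper as often as you like while developing; only the cron job ever talks to the website.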