Consuming HTTP Services in Python Transcripts
Chapter: Screen scraping: Adding APIs where there are none
Lecture: Survey of screen scraping libraries

0:01 So let's talk about some of our options that we can use for web scraping. Certainly you don't want to just load the HTML and do this yourself,
0:10 so one really nice combination is to use this library called Beautiful Soup. Beautiful Soup doesn't download the content, it just parses the text,
0:18 so you want to make sure requests is in there to actually get the content, and we've seen how to do a basic HTTP GET with requests
0:24 the whole way through this class, so that is not a big deal. We just hand off the HTML to Beautiful Soup and it lets us do things like
0:29 search by CSS selector, and things like that. We can also use Scrapy; Scrapy is really nice and there is a whole range of things you can do with Scrapy,
0:40 so I definitely recommend that you check out Scrapy as well. Originally I had chosen Beautiful Soup because for a while Scrapy didn't support Python 3,
0:50 but now it fully supports Python 3, so that is really great news. I had started using Beautiful Soup previously,
0:56 before Scrapy started working with Python 3, but Scrapy has actually got some really interesting ways of working
1:02 and you'll see that it can actually grow a little bit farther than just bringing this package into your code and writing it yourself.
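
For reference, the write-it-yourself approach with requests and Beautiful Soup described earlier in this lecture might look roughly like the sketch below. It is only an illustration, not code from the course: the function names, the sample HTML, and the CSS selector "div.episode h2.title" are all made up for this example, and it assumes the requests and beautifulsoup4 packages are installed.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page with requests and return its HTML text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_titles(html: str) -> list[str]:
    """Parse HTML with Beautiful Soup and pull out text via a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")  # parser that ships with Python
    # Hypothetical selector: <h2 class="title"> inside <div class="episode">.
    tags = soup.select("div.episode h2.title")
    return [tag.get_text(strip=True) for tag in tags]


# Made-up sample markup so the parsing step runs without a network call.
SAMPLE_HTML = """
<html><body>
  <div class="episode"><h2 class="title">Episode one</h2></div>
  <div class="episode"><h2 class="title">Episode two</h2></div>
  <div class="sidebar"><h2 class="title">Not an episode</h2></div>
</body></html>
"""

titles = extract_titles(SAMPLE_HTML)
print(titles)
```

Against a real site you would call extract_titles(fetch_html(url)) with a URL and selector that match that page's markup. Keeping the download and the parsing in separate functions mirrors the split in the lecture: requests gets the content, Beautiful Soup only parses it.
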
1:10 The founders of Scrapy created this place called Scrapinghub, which is like web scraping as a service.
1:20 So there are all sorts of retry, caching, staleness, and infrastructure things that you've really got to do for large-scale web scraping,
1:29 so if that is your goal, check out Scrapinghub, they've got all of that set up for you,
1:33 and you take the same code that you would write in Scrapy locally, drop it in here and it runs in their infrastructure.
1:39 So that is pretty sweet. I also did an entire episode on screen scraping with the founder of Scrapinghub and the creator of Scrapy, Pablo Hoffman,
1:47 so we talked about web scraping, some of the techniques, Scrapinghub, some of the rules around this and so on,
1:53 so if you are interested in going deeper on this topic, go ahead and check out talkpython.fm/50.

