#100DaysOfCode in Python Transcripts
Chapter: Days 46-48: Web Scraping with BeautifulSoup4
Lecture: A quick BS4 overview
0:00
I just thought I'd give a quick overview for anyone who hasn't dealt with Beautiful Soup 4 before. So if you haven't, feel free to keep watching,
0:10
but if you have, skip on over because I'm just going to be repeating myself. Now, Beautiful Soup 4 allows you to parse web pages.
0:19
Okay, we've all dealt with requests by now, and we know that by using requests, we can pull down the code behind a web page, right?
0:28
And we can then use Beautiful Soup 4 to parse that data. All right, I'll tell you what I mean. So, let's view the page source
0:37
for our PyBites Code Challenges page. And here you'll see all of your HTML. Now, if I wanted specifically to get all of our code challenge names
0:51
and just put them in a list to use in some sort of email or whatever application I can think of, right? Well, how am I going to do that?
1:00
If you go into the page source, the first thing you need to do is find that data in the code, and here it is.
1:09
It's an unordered list with an ID of article list, and a class of article list. Okay, and then all of our different challenge headers
1:19
are stored in list elements, okay? Now, with that information in hand, we can then use Beautiful Soup 4 to search this page,
1:30
remembering that requests will pretty much pull down this page looking like this, and Beautiful Soup 4 will parse that,
1:38
and we can then tell it what to look for. And now you can start thinking, imagining the cool things you can do with this.
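To make that concrete, here is a minimal sketch of the idea rather than the code from the course. It assumes the HTML has already been pulled down (in practice it would come from something like `requests.get(url).text`), so a small inline page stands in for it. The `article-list` spelling of the id and class and the challenge titles are made up for illustration; the real page source may spell them differently.

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML that requests would download.
# The id/class spelling and the challenge titles are hypothetical.
html = """
<html>
  <body>
    <nav><a href="/">Home</a></nav>
    <ul id="article-list" class="article-list">
      <li><a href="#">Example challenge one</a></li>
      <li><a href="#">Example challenge two</a></li>
    </ul>
  </body>
</html>
"""

# Parse the raw HTML with Python's built-in parser
soup = BeautifulSoup(html, "html.parser")

# Jump straight to the unordered list, ignoring everything else on the page
challenge_list = soup.find("ul", id="article-list")

# Each challenge header lives in its own <li> element
challenge_names = [li.get_text(strip=True) for li in challenge_list.find_all("li")]
print(challenge_names)
```

The result is a plain Python list of strings, which is easy to drop into an email or any other application.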
1:45
So, we can skip all of this junk up here, all of this code that we don't care about, and drill straight down to the list that we do care about.
1:55
And you can use this on any site that you can think of. You can search by all sorts of different criteria.
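As a rough illustration of what "different criteria" can mean, the sketch below searches one small page by tag name, by CSS class, by CSS selector, and by a test on an attribute value. The HTML, class names, and URLs are invented for this example and do not come from any real site.

```python
from bs4 import BeautifulSoup

# Invented sample page: two posts and a sidebar link
html = """
<div class="post"><h2 class="title">First</h2><a href="/one">read</a></div>
<div class="post"><h2 class="title">Second</h2><a href="/two">read</a></div>
<div class="sidebar"><a href="https://example.com">external</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

# By tag name: every <a> element on the page
all_links = soup.find_all("a")

# By tag name plus CSS class (class_ because class is a Python keyword)
titles = [h2.get_text() for h2 in soup.find_all("h2", class_="title")]

# By CSS selector: links that sit inside a div with class "post"
post_links = [a["href"] for a in soup.select("div.post a")]

# By a function applied to an attribute value: links with an absolute URL
external_links = soup.find_all("a", href=lambda value: value and value.startswith("http"))

print(len(all_links), titles, post_links, len(external_links))
```

Which style to use is mostly a matter of taste: `find` and `find_all` read well for simple tag and attribute lookups, while `select` is handy when the target is easiest to describe as a CSS selector.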
2:01
And we're going to show that in the next video. So, get excited, because this is really, really fun stuff.