#100DaysOfCode in Python Transcripts
Chapter: Days 46-48: Web Scraping with BeautifulSoup4
Lecture: A quick BS4 overview
0:00
I just thought I'd give a quick overview
0:02
for anyone who hasn't dealt with Beautiful Soup 4 before.
0:06
So if you haven't, feel free to keep watching,
0:09
but if you have,
0:10
skip on over because I'm just going to be repeating myself.
0:13
Now, Beautiful Soup 4 allows you to parse web pages.
0:18
Okay, we've all dealt with requests by now
0:21
and we know that, using requests,
0:23
we can pull down the code behind a web page, right?
0:27
And we can then use Beautiful Soup 4
0:30
to parse that data.
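As a rough sketch of that hand-off, the snippet below parses a small HTML string with Beautiful Soup 4. The HTML here is a made-up stand-in for what requests would return (for example `requests.get(url).text`), and the names `html`, `page_title` and `first_paragraph` are just for illustration.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the page source requests would pull down,
# e.g. html = requests.get(url).text
html = "<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>"

# Parse the raw HTML using Python's built-in parser
soup = BeautifulSoup(html, "html.parser")

# Once parsed, tags can be reached by name
page_title = soup.title.string
first_paragraph = soup.p.get_text()

print(page_title)
print(first_paragraph)
```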
0:32
All right, I'll show you what I mean.
0:34
So, let's view the page source
0:36
for our PyBites Code Challenges page.
0:40
And here you'll see all of your HTML.
0:44
Now, if I wanted specifically
0:46
to get all of our code challenge names
0:50
and just put them in a list to use
0:52
in some sort of an email
0:54
or whatever application I can think of, right.
0:57
Well, how am I going to do that?
0:59
If you go into the page source,
1:01
the first thing you need to do
1:03
is find that data in the code
1:07
and here it is.
1:08
It's an unordered list
1:09
with an ID of article list and a class of article list.
1:13
Okay, and then all of our different challenge headers
1:18
are stored in list elements, okay?
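Here is a minimal sketch of what that lookup could look like. The HTML is a simplified, made-up version of the structure just described, and the exact attribute spelling `articleList` and the challenge names are assumptions, not copied from the real page.

```python
from bs4 import BeautifulSoup

# Simplified, made-up markup; the id/class spelling "articleList"
# is an assumption based on how it is read out in the video
html = """
<div id="header">Navigation we do not care about</div>
<ul id="articleList" class="articleList">
  <li><a href="/challenge-1">Challenge One</a></li>
  <li><a href="/challenge-2">Challenge Two</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the unordered list by its id, then collect the text of each list item
article_list = soup.find("ul", id="articleList")
challenge_names = [li.get_text(strip=True) for li in article_list.find_all("li")]

print(challenge_names)
```

The result is a plain Python list of strings, which is easy to drop into an email or any other application.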
1:22
Now, with that information in hand,
1:25
we can then use Beautiful Soup 4 to search this page,
1:29
remembering that requests will pretty much pull down
1:32
this page looking like this,
1:34
and Beautiful Soup 4 will parse that,
1:37
and we can then tell it what to look for.
1:40
And now you can start thinking,
1:42
imagining the cool things you can do with this.
1:44
So, we can skip all of this junk up here,
1:48
all of this code that we don't care about,
1:50
and drill straight down to the list that we do care about.
1:54
And you can use this on any site that you can think of.
1:57
You can search by all sorts of different criteria.
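To give a taste of those criteria, here is a small sketch that searches the same made-up snippet by tag name, by class, by CSS selector and by an attribute test. The markup, the class names and the variable names are all hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only
html = """
<h1 class="title">Challenges</h1>
<ul class="articleList">
  <li><a href="/one">One</a></li>
  <li><a href="https://example.com/two">Two</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# By tag name: every <a> element on the page
by_tag = [a.get_text() for a in soup.find_all("a")]

# By class: note the trailing underscore, since "class" is a Python keyword
by_class = soup.find("h1", class_="title").get_text()

# By CSS selector: links inside list items of the articleList list
by_css = [a["href"] for a in soup.select("ul.articleList li a")]

# By attribute test: only links whose href is an absolute https URL
by_attr = [
    a.get_text()
    for a in soup.find_all("a", href=lambda h: h and h.startswith("https://"))
]

print(by_tag, by_class, by_css, by_attr)
```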
2:00
And we're going to show that in the next video.
2:04
So, get excited, because this is really, really fun stuff.