Python for Decision Makers and Business Leaders Transcripts
Chapter: Data science in Python
Lecture: Finding the hyperlinks
0:00
Given those HTML blobs, our job is to find the links. Let's go and add another markdown section. So we want to put a little description here.
0:11
Let's really quickly just print out one of these. And we can just take a tiny bit of that to see hey, this is what it's going to look like.
0:21
And put that over here into our markdown using this little format here. And now it'll remind us what we're working with.
0:29
So what we need to do is take this, find these links and then from each link we want to find the hyperlink part right there
0:35
and then from each hyperlink, we get this. And then we're going to be off to the races. Just like Feedparser saved the day
0:42
and made it super easy to get this far, the next thing we're going to use is Beautiful Soup. We'll come over here and say all_links. It's going to be a list.
0:50
We're going to build these up and we'll say what we want to do is go through each one of these
0:55
we'll say, for d in descriptions, and then get the links. And to get the links, first we're going to create this thing called a soup.
1:03
And we'll say bs4.BeautifulSoup and we're going to pass it the description. And this is the one line that will take that goo
1:12
and turn it into something we can ask questions about. For example, find all the hyperlinks. And let's just print out the links.
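As a rough sketch of the step just described, the snippet below parses one description and asks for every hyperlink. The sample HTML is a made-up stand-in for one entry of descriptions, and the explicit "html.parser" argument is an assumption added here so Beautiful Soup does not have to guess a parser; the lecture only mentions passing the description.

```python
import bs4

# Hypothetical stand-in for one entry of `descriptions`
description = (
    '<p>Sponsored by <a href="https://example.com/sponsor">our sponsor</a>.</p>'
    '<p>See <a href="https://example.com/notes">the notes</a>.</p>'
)

# Turn the raw HTML text into an object we can query.
# "html.parser" is Python's built-in parser (an assumption, not from the lecture).
soup = bs4.BeautifulSoup(description, "html.parser")

# find_all("a") returns every <a> tag, including its markup and inner text
links = soup.find_all("a")
print(links)
```

Printing at this point shows whole tags, not bare URLs, because each item is the full a element.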
1:21
Run that real quick. Look at that. Hyperlink, hyperlink, hyperlink. Close, but notice we have the whole a tag and we have the text in the middle and so on.
1:29
So we could do one of these little expressions again and say what we really want is the href of a, for a in this group. We run that again.
1:40
There we go, look we just have each hyperlink here. Just the URL that we're looking for. Look how easy this is. This is so incredible.
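One way the "little expression" could look is a list comprehension that pulls the href out of each a tag. This is a sketch under assumptions: the sample HTML is invented, and the has_attr guard is an addition here to skip anchors that carry no href, which the lecture does not discuss.

```python
import bs4

# Hypothetical description; the last anchor deliberately has no href
description = (
    '<a href="https://example.com/sponsor">our sponsor</a> '
    '<a href="https://example.com/notes">the notes</a> '
    '<a name="top">anchor without a link</a>'
)

soup = bs4.BeautifulSoup(description, "html.parser")

# Keep only the URL from each <a> tag, skipping tags without an href
links = [a["href"] for a in soup.find_all("a") if a.has_attr("href")]
print(links)
```

Without the guard, a["href"] would raise a KeyError on the first anchor that lacks the attribute.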
1:48
Now, one more step. Let's go to our all links and let's add these to it. So, in the end we're going to have them not just for this one description
1:58
but for all descriptions. And let's do a little print statement. So instead of just printing them all out
2:07
we can just say we found however many and let's run that. We found 2,824. And notice, see there'll be a little star here.
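Put together, the whole cell might look roughly like the sketch below. The three sample descriptions are invented, and the parser argument and has_attr guard are assumptions rather than something stated in the lecture. The names all_links, descriptions, and d follow the transcript.

```python
import bs4

# Hypothetical stand-ins for the HTML descriptions gathered earlier
descriptions = [
    '<a href="https://example.com/a">A</a> and <a href="https://example.com/b">B</a>',
    "<p>No links in this one.</p>",
    '<a href="https://example.com/c">C</a>',
]

all_links = []
for d in descriptions:
    soup = bs4.BeautifulSoup(d, "html.parser")
    links = [a["href"] for a in soup.find_all("a") if a.has_attr("href")]
    # extend adds each URL individually, so all_links stays a flat list
    all_links.extend(links)

# Report a count instead of printing every URL
print(f"Found {len(all_links):,} links.")  # prints "Found 3 links."
```

Using extend rather than append matters here: append would add each per-description list as a single item, and the count would be the number of descriptions, not the number of links.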
2:15
Watch how long this takes to run, and then look at it afterwards. But later, if we want to just work with, like, the links, the all_links, something like that
2:24
we'd run it and it's instant, because all of this computation here, and here, and here is all just saved up in the notebook.
2:31
And we can just work with the outcome. Assuming that this is all done, we don't have to worry about it.
2:36
So it's a really, really effective way to explore this data.