Python for Decision Makers and Business Leaders Transcripts
Chapter: Data science in Python
Lecture: Finding the hyperlinks
0:00
Given those HTML blobs, our job is to find the links.
0:04
Let's go and add another markdown section.
0:08
So we want to put a little description here.
0:10
Let's really quickly just print out one of these.
0:16
And we can just take a tiny bit of that
0:18
to see hey, this is what it's going to look like.
0:20
And put that over here into our markdown
0:23
using this little format here.
0:26
And now it'll remind us what we're working with.
0:28
So what we need to do is take this, find these links
0:31
and then from each link
0:32
we want to find the hyperlink part right there
0:34
and then from each hyperlink, we get this.
0:36
And then we're going to be off to the races.
0:39
Just like Feedparser saved the day
0:41
and made it super easy to get this far
0:43
the next thing we're going to use is Beautiful Soup.
0:45
We'll come over here and say all_links.
0:48
Going to be a list.
0:49
We're going to build these up and we'll say
0:51
what we want to do is go through each one of these
0:54
we'll say, for d in descriptions, and then get the links.
0:58
And the links are going to be
1:01
first, we're going to create this thing called a soup.
1:02
And we'll say bs4.BeautifulSoup
1:07
and we're going to pass it the description.
1:09
And this is the one line that will take that goo
1:11
and turn it into something we can ask questions about.
1:14
For example, find all the hyperlinks.
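
As a rough sketch of the step just described, the snippet below builds a soup from one description and asks it for every hyperlink. The `description` string is a made-up stand-in for one of the HTML blobs from the feed, and the `"html.parser"` argument is an assumption (the lecture does not say which parser is used).

```python
import bs4

# Hypothetical sample standing in for one description from the feed.
description = (
    '<p>Links from the show:</p>'
    '<ul>'
    '<li><a href="https://example.com/one">First link</a></li>'
    '<li><a href="https://example.com/two">Second link</a></li>'
    '</ul>'
)

# Parse the HTML text into an object we can query.
# "html.parser" is Python's built-in parser; chosen here as an assumption.
soup = bs4.BeautifulSoup(description, "html.parser")

# find_all("a") returns every <a> tag, including its attributes and inner text.
links = soup.find_all("a")
print(links)
```

Each item printed is the whole tag (the `a`, the `href`, and the text in the middle), not yet just the URL.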
1:18
And let's just print out the links.
1:20
Run that real quick. Look at that.
1:22
Hyperlink, hyperlink, hyperlink.
1:25
Close, but notice we have the a tag and we have the text
1:27
in the middle and so on.
1:28
So we could do one of these little expressions again
1:30
and say what we really want is
1:34
a['href'] for a in this group.
1:37
We run that again.
1:39
There we go, look we just have each hyperlink here.
1:43
Just the URL that we're looking for.
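
A minimal sketch of that expression, assuming the same kind of made-up `description` string as a stand-in for the real feed data: a list comprehension pulls the `href` attribute out of each `a` tag, leaving only the URLs.

```python
import bs4

# Hypothetical sample standing in for one description from the feed.
description = (
    '<p>Links from the show:</p>'
    '<a href="https://example.com/one">First link</a> '
    '<a href="https://example.com/two">Second link</a>'
)

soup = bs4.BeautifulSoup(description, "html.parser")

# For every <a> tag, keep only the value of its href attribute.
links = [a["href"] for a in soup.find_all("a")]
print(links)
```

One caveat: `a["href"]` raises a `KeyError` if a tag has no `href`. If the data might contain such tags, `soup.find_all("a", href=True)` limits the search to tags that have one.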
1:45
Look how easy this is. This is so incredible.
1:47
Now, one more step. Let's go to our all_links
1:52
and let's add these to it.
1:54
So, in the end we're going to have them
1:55
not just for this one description
1:57
but for all descriptions.
1:59
And let's do a little print statement.
2:04
So instead of just printing them all out
2:06
we can just say we found however many and let's run that.
2:09
We found 2,824.
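
Put together, the loop might look something like the sketch below. The `descriptions` list here is invented sample data with three links in total, so the count it prints will not match the number from the real feed.

```python
import bs4

# Hypothetical stand-ins for the HTML descriptions pulled from the feed.
descriptions = [
    '<a href="https://example.com/one">One</a> and '
    '<a href="https://example.com/two">Two</a>',
    '<p>Only <a href="https://example.com/three">Three</a> here.</p>',
]

all_links = []  # Collects the URLs across every description.

for d in descriptions:
    soup = bs4.BeautifulSoup(d, "html.parser")
    links = [a["href"] for a in soup.find_all("a")]
    all_links.extend(links)  # Add this description's URLs to the running list.

# Report a count instead of printing every link.
print(f"Found {len(all_links):,} links.")
```

Using `extend` rather than `append` keeps `all_links` as one flat list of URLs instead of a list of lists.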
2:12
And notice, see there'll be a little star here.
2:14
Watch how long this takes to run.
2:16
And then afterwards.
2:17
But, later if we want to just work with
2:20
like the links, the all_links, something like that
2:23
we'd run it, it's instant.
2:25
Because all of this computation here, and here, and here
2:28
this is all just saved up in the notebook.
2:30
And we can just work with the outcome.
2:32
Assuming that this is all done, we don't have to worry about it.
2:35
So really, really effective way to explore this data.