Python for decision makers and business leaders Transcripts
Chapter: Data science in Python
Lecture: Finding the hyperlinks
0:00 Given those HTML blobs, our job is to find the links.
0:04 Let's go and add another markdown section.
0:08 So we want to put a little description here.
0:10 Let's really quickly just print out one of these.
0:16 And we can just take a tiny bit of that
0:18 to see, hey, this is what it's going to look like,
0:20 and put that over here into our Markdown cell
0:23 using this little format here.
0:26 And now it'll remind us what we're working with.
0:28 So what we need to do is take this, find these links
0:31 and then from each link
0:32 we want to find the hyperlink part right there
0:34 and then from each hyperlink, we get this.
0:36 And then we're going to be off to the races.
0:39 Just like Feedparser saved the day
0:41 and made it super easy to get this far,
0:43 the next thing we're going to use is Beautiful Soup.
0:45 We'll come over here and say all_links,
0:48 which is going to be a list.
0:49 We're going to build these up. We'll say
0:51 what we want to do is go through each one of these,
0:54 so for d in descriptions, and then get the links.
0:58 And the links are going to be...
1:01 well, first, we're going to create this thing called a soup.
1:02 And we'll say bs4.BeautifulSoup
1:07 and we're going to pass it the description.
1:09 And this is the one line that will take that goo
1:11 and turn it into something we can ask questions about.
1:14 For example, find all the hyperlinks.
1:18 And let's just print out the links.
1:20 Run that real quick. Look at that.
1:22 Hyperlink, hyperlink, hyperlink.
1:25 Close, but notice we have the a tag, and we have the text
1:27 in the middle, and so on.
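Here's a minimal sketch of the soup step, assuming each description is a string of HTML. The sample `description` is invented for illustration, and passing `"html.parser"` (Python's built-in parser) is our own choice; the lecture only shows `bs4.BeautifulSoup` being called with the description.

```python
import bs4

# Hypothetical sample of one HTML description blob (not real feed data)
description = (
    '<p>Topics covered:</p>'
    '<a href="https://example.com/one">First link</a> and '
    '<a href="https://example.com/two">Second link</a>'
)

# Turn the raw HTML "goo" into a tree we can ask questions about
soup = bs4.BeautifulSoup(description, "html.parser")

# find_all('a') returns every <a> tag, still wrapped around its text
links = soup.find_all('a')
print(links)
```

Printing `links` shows whole tags, text and all, which is why the next step narrows each one down to its URL.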
1:28 So we could do one of these little expressions again
1:30 and say what we really want is a['href'], the href,
1:34 for a in this group.
1:37 We run that again.
1:39 There we go, look we just have each hyperlink here.
1:43 Just the URL that we're looking for.
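That expression can be sketched as a list comprehension over the tags that `find_all` returns. The sample HTML is made up for illustration:

```python
import bs4

# Hypothetical sample description (illustration only)
description = (
    '<a href="https://example.com/one">First link</a> '
    '<a href="https://example.com/two">Second link</a>'
)

soup = bs4.BeautifulSoup(description, "html.parser")

# Keep only the URL from each <a> tag: a['href'] for a in the results
links = [a['href'] for a in soup.find_all('a')]
print(links)
```

One design note: `a['href']` raises a `KeyError` if some `<a>` tag has no `href` attribute. If your data might contain those, `soup.find_all('a', href=True)` skips them, and `a.get('href')` returns `None` instead of raising.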
1:45 Look how easy this is. This is so incredible.
1:47 Now, one more step. Let's go to our all_links list
1:52 and add these to it.
1:54 So, in the end we're going to have them
1:55 not just for this one description
1:57 but for all descriptions.
1:59 And let's do a little print statement.
2:04 So instead of just printing them all out,
2:06 we can just say we found however many. Let's run that.
2:09 We found 2,824.
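Putting it together across every description might look like the sketch below. The `descriptions` list is a made-up stand-in for the feed data from earlier, and we're assuming `extend` for adding each batch to `all_links` (the lecture just says "add these to it"), so the final count is of URLs rather than of per-description lists. The `href=True` filter is also our addition, to skip `<a>` tags without a URL.

```python
import bs4

# Hypothetical stand-in for the descriptions pulled from the feed earlier
descriptions = [
    '<a href="https://example.com/a">A</a> <a href="https://example.com/b">B</a>',
    '<p>No links in this one</p>',
    '<a href="https://example.com/c">C</a>',
]

all_links = []
for d in descriptions:
    soup = bs4.BeautifulSoup(d, "html.parser")
    # Only <a> tags that actually carry an href attribute
    links = [a['href'] for a in soup.find_all('a', href=True)]
    # extend() adds each URL individually; append() would nest whole lists
    all_links.extend(links)

# The :, format spec adds thousands separators, e.g. 2,824
print(f"Found {len(all_links):,} links.")
```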
2:12 And notice, there'll be a little star here while it's running.
2:14 Watch how long this takes to run.
2:16 But afterwards,
2:17 later, if we want to just work with,
2:20 say, all_links or something like that,
2:23 we'd run it, and it's instant.
2:25 Because all of this computation here, and here, and here
2:28 this is all just saved up in the notebook.
2:30 And we can just work with the outcome.
2:32 Once this is all done, we don't have to worry about it.
2:35 So it's a really, really effective way to explore this data.