Python for Decision Makers and Business Leaders Transcripts
Chapter: Data science in Python
Lecture: From links to domain names
0:00 Now that we have our links, what we want is to get the domains. So let's write that here. It turns out this is not too hard.
0:12 We're going to need a library that's not super obvious, but a little Stack Overflow googling will get you there. So we're going to use urllib.parse,
0:22 and what we need is the domains. To start, let's just put l for l in all_links for a second, and if we print out the domains really quick,
0:35 well, that's just the links themselves. But what we need to do is convert each one, so we can say urllib.parse
0:43 .urlparse and we'll pass it that little thing. Now notice we get a ParseResult over and over and over, but if you look, there's a value
0:52 or property, netloc (the network location), and that is what we want. So we can say .netloc, run it again, and there it is. There they all are, and we have duplication.
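Here is a minimal sketch of that step, assuming all_links is a plain list of URL strings. The sample URLs below are hypothetical stand-ins, not the links scraped in the lecture.

```python
from urllib.parse import urlparse

# Hypothetical sample data standing in for the lecture's all_links list
all_links = [
    "https://example.com/a",
    "https://docs.python.org/3/",
    "https://example.com/b",
]

# urlparse() returns a ParseResult; its .netloc attribute is the domain part
domains = [urlparse(l).netloc for l in all_links]
print(domains)  # ['example.com', 'docs.python.org', 'example.com']
```

Note that netloc is taken verbatim from the URL, so a port (example.com:8080) or a www. prefix stays attached to the domain.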
1:02 We want that duplication because these are multiple references back to the original site, and we're going to count how many times each of those appears.
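The lecture doesn't show the counting code at this point. One common way to do it, offered here as an assumption rather than the course's own approach, is collections.Counter. The domains list is hypothetical sample data.

```python
from collections import Counter

# Hypothetical domains list, like the one built with urlparse(...).netloc
domains = ["example.com", "docs.python.org", "example.com", "example.com"]

# Counter tallies how many times each distinct domain appears
counts = Counter(domains)

# most_common() yields (domain, count) pairs, highest count first
for domain, n in counts.most_common():
    print(domain, n)
```

With this sample data it prints example.com 3 followed by docs.python.org 1.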
1:12 I guess we'll have just as many domains as links there. We could at least have a little printout that means something here. So we could ask,
1:23 how many different ones are there? That sounds like it could be challenging, but we can just use what's called a set, and a set will take a whole bunch
1:30 of items with duplication and reduce them to just the unique ones. So we could say set(domains), and then we have to ask how many of those there are,
1:38 and we do that like this, with len(). Run it again, and there are 799 unique domains. Cool, huh?
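A minimal sketch of the set-and-len step, using a small hypothetical domains list rather than the lecture's real data (which yielded 799 unique domains):

```python
# Hypothetical domains list with duplicates
domains = ["example.com", "docs.python.org", "example.com"]

# A set keeps only one copy of each distinct value
unique_domains = set(domains)

# len() reports how many distinct domains remain
print(f"There are {len(unique_domains):,} unique domains.")  # There are 2 unique domains.
```

A set discards the counts, so it answers "how many different domains?" while the original list still holds every reference for tallying.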