Python for the .NET Developer Transcripts
Chapter: Computational notebooks
Lecture: From links to domains
0:00 Now that we have our unique links, let's parse out the domain names, because things like GitHub and Twitter and Reddit,
0:08 all that kind of stuff, is going to appear many times, and we want to know how many times each one appears. So, press B to add another code block here.
0:17 And we're going to use this thing called urllib, which is built into Python, and we're going to use its urlparse function.
0:26 We'll say the domains are urlparse, hit Tab. And that's going to be link for link in unique links, like so.
0:38 Now we don't want this; this is going to give us an object. What we want is netloc, the network location, like so. And let's just do a little exploration,
0:46 let's print out the first ten domains, something like that. You know what those look like? Those just look like broken, broken links.
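A minimal sketch of the cell described here, assuming the earlier steps left a list named unique_links (the name and the sample URLs below are stand-ins, not the notebook's real data):

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the unique links gathered earlier in the notebook.
unique_links = [
    "https://github.com/python/cpython",
    "https://twitter.com/python",
    "https://www.reddit.com/r/python/",
]

# urlparse returns a ParseResult object; its netloc attribute holds the domain.
domains = [urlparse(link).netloc for link in unique_links]

# Peek at the first ten entries (fewer here, since the sample is short).
print(domains[:10])
```

A string that is not a well-formed URL typically parses with an empty netloc, which is one way broken links can show up in a listing like this.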
0:56 My goodness, so we have this; looks like I've got to deal with it later. So one, two, three: the first three, after we sorted, are broken, so.
1:04 Let's do this. Three onward, run that one. There we go, fixed, like a charm. I don't know what's going on
1:14 we must have just typed in some bad markdown along the way. We did type 2.5 megabytes of text so I guess that generates a few errors.
1:22 Anyway, we were able to see that really quickly and just go back and change that here. You might want to change this in the future
1:29 'cause I'm going to go back and fix that on the site, probably. Nonetheless, here we have our domains, and let's just print out the first ten.
1:36 There, that's more like what I was expecting. We're getting pycon.de, Python Weekly, aka.ms. I'm tempted to replace that with microsoft.com
1:46 but it could redirect via Microsoft to somewhere else so I'm just going to leave it like that. Amazon, Amazon, Amazon.
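The "three onward" fix can be sketched roughly as follows. The malformed entries and variable names are made up for illustration; the real notebook data differs, and slicing at index 3 only works because exactly three bad entries sort to the front.

```python
from urllib.parse import urlparse

# Hypothetical data: three malformed entries mixed in with real links.
raw_links = {
    "#broken",
    "(not a link",
    "/relative/path",
    "https://aka.ms/example",
    "https://www.amazon.com/dp/1",
    "https://www.amazon.com/dp/2",
}

# After sorting, the malformed entries come first because '#', '(' and '/'
# sort before the letter 'h'.
sorted_links = sorted(raw_links)

# "Three onward": drop the first three entries with a slice.
unique_links = sorted_links[3:]
domains = [urlparse(link).netloc for link in unique_links]
print(domains)

# An alternative that does not depend on the count: keep non-empty domains only.
filtered = [d for d in (urlparse(link).netloc for link in sorted_links) if d]
```

The slice matches what is done in the lecture; the filter is a swapped-in variation that keeps working if the number of broken entries changes.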
1:52 Now, we want the duplication in this list, at the moment. We don't want duplication in the links, necessarily. I guess maybe, maybe we do.
2:02 But probably we don't, right? We probably just want to say what are all the things that we pointed at
2:06 and then how many of them are from any given domain. So the point is that we want to say, well, Amazon is more popular than the others
2:13 because there are three links to Amazon and only one to Python Weekly. So we want this duplication, this is not a problem,
2:20 this is actually the essential part of what we are trying to work with here. So, there we have it, it wasn't a lot of work
2:25 it was a lot of talking and like looking at the data but yeah, not so much work, right. Just a tiny bit of code.
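To see why the duplication matters, here is one possible way to count links per domain. It uses collections.Counter from the standard library, which is an assumption on our part and not necessarily how the course does the counting; the domain values are illustrative.

```python
from collections import Counter

# Hypothetical domains list, duplicates intentionally kept.
domains = [
    "www.amazon.com",
    "www.pythonweekly.com",
    "www.amazon.com",
    "www.amazon.com",
]

# Counter tallies how many times each domain appears.
counts = Counter(domains)

# most_common returns (domain, count) pairs, highest count first.
print(counts.most_common(2))
# [('www.amazon.com', 3), ('www.pythonweekly.com', 1)]
```

If the list were de-duplicated first, every domain would count as one and the popularity signal would be lost.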
2:32 You'll notice we're using a lot of these list comprehensions and other clever little programming techniques
2:38 so that we can write the minimal amount of code. Like, this could be three loops, or it could just be that. It could even be less if we
2:44 changed it, I'm sure. But nonetheless, we want to have really focused, small bits of code, some explanation, maybe a picture;
2:51 we're not there yet, but we're getting close as we go through it. So this is the notebook style.