Python for .NET Developers Transcripts
Chapter: Computational notebooks
Lecture: Counting domains

0:00 Alright, we have our domains here. Maybe we'll go ahead and change this to say something, really quick. First 10, domains are...

0:09 here we go, first 10 domains are like that. Let's add some of them below, some markdown. Now we're going to write some code

0:16 and I think this will impress you. I'm pretty sure. It definitely impressed me when I first learned it. So here's what we want to do.

0:22 I want to go through that list and find all the unique names. I want to find that one, and I want to find that one, and so on.

0:29 Then I want to count how many there are. Then I want to sort them by the most common first. Give me that name and the count.

0:36 And then the second most common, then the third most common, and so on. So there's a cool module in the standard library called collections

0:41 so we can say, from collections import Counter. And we can say, the counter is going to be a Counter of these domain names.

0:50 And then we can ask it questions like give me the most common. And what that does is basically gives us a list of these things.

0:58 Say top_25 is going to be most_common, up to 25. Why is it going to work? Because this is sorted as I described it, most popular to least popular.

1:08 Then we can just print, Top 25. Are you ready for this? Look how little code this is. Boom, actually let's put it out like this.

1:15 I think we'll see it better. There we go, I like the way that looks better. We could do better pretty printing but you know what, we got this covered.
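
The counting step described above can be sketched roughly as follows. This assumes `domains` is a plain list of domain strings; the sample values are made up for illustration and are not the data shown in the lecture.

```python
from collections import Counter

# Hypothetical sample data; the real list in the notebook is much longer.
domains = [
    "github.com", "twitter.com", "github.com",
    "youtube.com", "github.com", "twitter.com",
]

# Counter tallies how many times each distinct value appears.
counter = Counter(domains)

# most_common(n) returns up to n (value, count) pairs,
# ordered from the highest count to the lowest.
top_25 = counter.most_common(25)

# Print one pair per line instead of the raw list.
for name, count in top_25:
    print(f"{name}: {count}")
```

Asking for 25 entries when fewer distinct values exist is fine: `most_common` simply returns what is available.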

1:22 Look at that. 382. 153. 64, and so on. That's it. We've gone and found the popular domains. GitHub, Twitter, and YouTube, apparently

1:34 as well as Python.org, Medium, Reddit. Maybe we want to exclude referring back to ourselves. So we can come over here and we can do a cool trick

1:42 with that, we call it excluded. It's going to be a set like that and who knows what else. Empty string, possibly, hash.

1:52 Here we could say, if link not in that. Oh, whoops, we need to... Probably the best way to parse it is like this.

2:01 d for d in domains if d is not an excluded one. I'm going to run down here, see if this Python Bytes, 32, should go away. And it does, because it's excluded.
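
A minimal sketch of the exclusion idea, assuming the same kind of `domains` list as before. The sample data and the contents of `excluded` are hypothetical; `example.com` stands in for whatever site is referring to itself.

```python
from collections import Counter

# Hypothetical sample data, including entries we do not want to count.
domains = [
    "github.com", "example.com", "", "twitter.com",
    "github.com", "#", "example.com",
]

# A set gives fast membership tests. The entries here are assumptions:
# our own site (a stand-in name), the empty string, and a bare hash.
excluded = {"example.com", "", "#"}

# Filter with a generator expression while feeding the Counter.
counter = Counter(d for d in domains if d not in excluded)
top_25 = counter.most_common(25)

print(top_25)
```

Filtering before counting keeps the excluded names out of the ranking entirely, so they cannot take up one of the 25 slots.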

2:12 We're like, hey, we don't want to count referring to ourselves. That's kind of weird. No, we're not taking credit for that. So here we have it.

2:17 These are the top 25 domains. And as I look at this, I'm kind of feeling like this unique bit that we were doing,

2:25 I don't think I want to do that anymore. Because we might refer to a project five times and we'll just go back.

2:32 I think this is still probably going to be exactly the same issue. So many thousand, and we just need to change that little bit right there.

2:41 Rerun it again. Ah, guess we've got one more to get rid of. First 10 domains look good again. There we go, and notice we're pointing to GitHub a lot more,

2:54 Twitter a lot more, that's because there are projects that are popular, and I want to count those. All right, that's it.
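
To illustrate why dropping the uniqueness step changes the counts, here is a small sketch. It assumes the earlier step de-duplicated full links with a set before the domains were extracted; the transcript does not show that code, so the links and the helper `domain_counts` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical links; the same project is referenced twice.
links = [
    "https://github.com/project-a",
    "https://github.com/project-a",
    "https://github.com/project-b",
    "https://twitter.com/someone",
]

def domain_counts(urls):
    """Count how often each domain (netloc) appears in the given URLs."""
    return Counter(urlparse(u).netloc for u in urls)

# De-duplicating first counts each distinct link only once.
unique_counts = domain_counts(set(links))

# Keeping every link lets repeated references add to the total.
all_counts = domain_counts(links)

print(unique_counts.most_common())
print(all_counts.most_common())
```

With duplicates kept, a project that is referenced several times pushes its domain higher, which matches the higher GitHub and Twitter numbers mentioned above.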

3:00 So a lot of looking at the data, thinking about it, bumping around back and forth and playing with it. But in order to go from our domains

3:07 to what ones are popular, it's ridiculous, right?
