Python for Decision Makers and Business Leaders Transcripts
Chapter: Data science in Python
Lecture: Counting the domains

0:00 We're so close to having the answer.
0:01 We've gone and downloaded the data
0:03 and parsed it apart as XML.
0:04 We started working through it
0:06 and then we said, well, each one of them has embedded HTML
0:08 which is all sorts of yucky
0:10 but we can use this Beautiful Soup to pull the pieces out.
0:13 And now we've found there's 799 unique domains
0:16 and 2,824 total. What do we do now?
0:20 Well, the last thing to do is figure out
0:21 how many times each one appears.
0:24 That may sound complicated
0:25 and in some languages it is, but watch this.
0:30 We talked about Python being "batteries included."
0:33 Well, one of those batteries,
0:34 one of those things in the standard library
0:36 is something called the collections module.
0:40 Like this, and it has this thing called a Counter
0:44 and to the counter we can give the things
0:46 we want it, well, to count. What are we getting?
0:50 Well it has, oh, looks like it's already got some stuff here
0:52 like GitHub this many and so on
0:54 but what I'd like is to sort it.
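The counting step described here can be sketched roughly as follows. This is a minimal illustration, assuming the domains have already been collected into a plain list; the variable names and the sample values are made up for the example and are not the course's actual code or data.

```python
from collections import Counter

# Hypothetical sample list; the list in the lecture holds 2,824 domains.
domains = [
    "github.com", "twitter.com", "github.com",
    "youtube.com", "github.com", "twitter.com",
]

# Counter tallies how many times each distinct item appears.
counter = Counter(domains)

print(counter["github.com"])  # number of times this domain appears
print(len(counter))           # number of unique domains
```

A Counter behaves like a dictionary that maps each item to its count, so looking up a domain gives how many times it was seen.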
0:55 So we can say the most common is going to be counter
1:01 dot, you guessed it, most_common.
1:04 And if we just print that out, there you go.
1:06 GitHub 447 references, Twitter 202
1:10 Python Bytes, which maybe we should exclude since that's ourselves, maybe not,
1:13 YouTube, and so on.
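As a rough sketch of the sorting step, `Counter.most_common()` returns a list of (item, count) pairs ordered from the most frequent to the least. The sample list below is hypothetical and far smaller than the lecture's data.

```python
from collections import Counter

# Hypothetical sample list, not the course data.
domains = [
    "github.com", "twitter.com", "github.com",
    "youtube.com", "github.com", "twitter.com",
]

counter = Counter(domains)

# most_common() with no argument returns every (item, count) pair,
# sorted from highest count to lowest.
most_common = counter.most_common()

for domain, count in most_common:
    print(domain, count)
```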
1:15 Maybe we only want the top 25
1:18 'cause you don't want to graph all of them
1:19 you just want to see the most important ones.
1:21 So we can come down here and do one
1:22 literally one more line say give me all the items
1:25 from 0 to 25, and we just show the top 25
1:29 and there they are. Those are the top 25.
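The "top 25" step can be sketched as a slice of the sorted list. The data below is generated purely for illustration (the `site…example` names are invented), so that there are more than 25 distinct items to cut down. Passing the number straight to `most_common(25)` is an equivalent alternative to slicing.

```python
from collections import Counter

# Hypothetical data: 40 made-up domains, where site0 appears once,
# site1 twice, and so on up to site39, which appears 40 times.
domains = [f"site{i}.example" for i in range(40) for _ in range(i + 1)]

counter = Counter(domains)

# Slice the sorted list to keep only the first 25 entries.
top_25 = counter.most_common()[0:25]

# Equivalent: ask most_common for the top 25 directly.
top_25_direct = counter.most_common(25)

for domain, count in top_25:
    print(domain, count)
```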
1:32 This is the kind of stuff that makes Python so useful
1:36 it's just like a couple of steps
1:38 a couple of lines. You don't...
1:39 There's no algorithmic thinking here
1:41 I don't have to come up with an algorithm
1:43 where I could make mistakes
1:45 or I have to spend time working on it.
1:46 No, I just grab the right thing
1:48 ask the right question, and boom, out comes the answer.