Python for Decision Makers and Business Leaders Transcripts
Chapter: Data science in Python
Lecture: Counting the domains
0:00
We're so close to having the answer.
0:01
We've gone and downloaded the data
0:03
we parsed it apart
0:04
as XML, and we started working through it
0:06
and then we said, well, each one of them has embedded HTML,
0:08
which is all sorts of yucky
0:10
but we can use this Beautiful Soup to pull the pieces out.
0:13
And now we've found there are 799 unique domains
0:16
and 2,824 total. What do we do now?
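Those two numbers come from the same list: one counts every link, the other counts each domain once. Here is a minimal sketch of that distinction, assuming the links have already been reduced to a plain list of domain names. The sample list is hypothetical and is not the show's real data:

```python
# Hypothetical sample: in the lecture, this list is built from the
# links Beautiful Soup pulls out of each episode's HTML.
domains = [
    "github.com", "twitter.com", "github.com",
    "python.org", "youtube.com", "github.com",
]

total = len(domains)        # every link, duplicates included
unique = len(set(domains))  # a set keeps only one copy of each domain

print(f"{unique} unique domains, {total} total")
```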
0:20
Well, the last thing to do is figure out
0:21
how many times each one appears.
0:24
That may sound complicated
0:25
and in some languages it is, but watch this.
0:30
We talked about Python being "batteries included."
0:33
Well, one of those batteries,
0:34
one of those things in the standard library,
0:36
is something called the collections module.
0:40
Like this. And it has this thing called a Counter,
0:44
and to the Counter we can give the things
0:46
we want it to, well, count. What are we getting?
0:50
Well, it has, oh, looks like it's already got some stuff here,
0:52
like GitHub this many times, and so on,
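The following is a small sketch of handing a list to `collections.Counter` and reading counts back out. It uses a hypothetical list of domains in place of the real episode data:

```python
from collections import Counter

# Hypothetical sample data standing in for the real list of domains.
domains = [
    "github.com", "twitter.com", "github.com",
    "python.org", "youtube.com", "github.com",
]

# Counter walks the list once and tallies how often each item appears.
counter = Counter(domains)

print(counter["github.com"])  # how many times github.com showed up
print(counter["pypi.org"])    # domains that never appeared count as 0
```

Looking up a domain that never appeared returns 0 instead of raising a `KeyError`, which is one way a `Counter` differs from a plain dictionary.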
0:54
but what I'd like is to sort it.
0:55
So we can say the most common is going to be counter
1:01
dot, you guessed it, most_common.
1:04
And if we just print that out, there you go.
1:06
GitHub with 447 references, Twitter with 202,
1:10
Python Bytes, which maybe we exclude since that's us, maybe not,
1:13
YouTube, Python.org, and so on.
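Sorting by frequency takes a single method call. This sketch uses the same hypothetical data. Called with no argument, `most_common()` returns every `(item, count)` pair, highest count first:

```python
from collections import Counter

# Hypothetical sample data standing in for the real list of domains.
domains = [
    "github.com", "twitter.com", "github.com",
    "python.org", "youtube.com", "github.com",
]

counter = Counter(domains)

# A list of (domain, count) tuples, sorted from most to least frequent.
most_common = counter.most_common()

for domain, count in most_common:
    print(domain, count)
```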
1:15
Maybe we only want the top 25,
1:18
because you don't want to graph all of them;
1:19
you just want to see the most important ones.
1:21
So we can come down here and add one,
1:22
literally one more line, that says give me all the items
1:25
from 0 to 25, and we just show the top 25,
1:29
and there they are. Those are the top 25.
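The top-25 cut can be made by slicing the sorted list, as the lecture does. The sketch below builds a synthetic, made-up list in which `siteN.example` appears N + 1 times, so there are more than 25 domains to trim. Passing the number directly, as in `most_common(25)`, gives the same result:

```python
from collections import Counter

# Synthetic data for illustration only: site0 appears once, site1 twice,
# ..., site39 forty times, giving 40 distinct domains.
domains = [f"site{i}.example" for i in range(40) for _ in range(i + 1)]

counter = Counter(domains)

# Slice items 0 through 24 from the fully sorted list.
top_25 = counter.most_common()[0:25]

# Equivalent built-in form: ask most_common for just the first 25.
also_top_25 = counter.most_common(25)

for domain, count in top_25:
    print(domain, count)
```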
1:32
This is the kind of stuff that makes Python so useful:
1:36
it's just a couple of steps,
1:38
a couple of lines. You don't...
1:39
There's no algorithmic thinking here.
1:41
I don't have to come up with an algorithm
1:43
where I could make mistakes,
1:45
or that I have to spend time working on.
1:46
No, I just grab the right thing,
1:48
ask the right question, and boom, out comes the answer.
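To see what the built-in saves you, here is a rough sketch of the same count done by hand with a plain dictionary and a sort. The helper name `count_by_hand` is made up for this illustration. It is the kind of small algorithm the lecture says you no longer have to write and debug yourself:

```python
from collections import Counter

# Hypothetical sample data standing in for the real list of domains.
domains = [
    "github.com", "twitter.com", "github.com",
    "python.org", "youtube.com", "github.com",
]


def count_by_hand(items):
    """Hypothetical helper: count and sort without collections.Counter."""
    counts = {}
    for item in items:
        # Start unseen items at 0, then add one for this occurrence.
        counts[item] = counts.get(item, 0) + 1
    # Sort the (item, count) pairs by count, highest first.
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


manual = count_by_hand(domains)
built_in = Counter(domains).most_common()

print(manual)
print(built_in)
```

Both produce the same counts. The hand-written version is where an off-by-one or a wrong sort key could slip in, while the `Counter` version is one line.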