#100DaysOfCode in Python Transcripts
Chapter: Days 82-84: Data visualization with Plotly
Lecture: Prep 2: useful data structures for plotting
0:00
So that was some prework, but what I have not spoken about
0:03
yet is what we are going to plot.
0:05
And there are three graphs I want to make.
0:08
First, I want to make a bar chart of our
0:10
posting activity, what months did we put more content out,
0:13
and what months less.
0:15
Secondly, I want a pie chart with a breakdown of our categories,
0:19
so each blog post has one category associated,
0:22
and that will show us what we blog about most.
0:26
Similarly, that also is true for tags,
0:29
tags give an indication of what we blog about.
0:31
But we use more tags than categories.
0:34
One blog post can have ten tags,
0:37
so it's a bit more granular.
0:38
So it still will be another angle,
0:41
or another insight into our data.
0:43
Next up, there are three exercises to get the data
0:47
into a format from which I can easily make those three graphs.
0:52
So first of all, I want to have the published entries.
0:56
So I'm going to use Counter,
0:58
to count all the entries by year, month.
1:01
And here's where the helper comes in,
1:03
because we're going to use a list comprehension,
1:07
which we dealt with in, I believe, Day 16,
1:10
and we can say pub dates,
1:14
and make a list comprehension.
1:17
And, I can just say for entry in entries,
1:22
and entries is our complete RSS feed broken down
1:26
into nice entries by feedparser.
1:29
And we saw an entry here, laid out.
1:31
So I'm going to loop over these entries
1:33
and for every entry, here's the helper,
1:36
I'm going to convert to datetime,
1:39
entry, and I'll take the published field
1:42
and what's funny, I'm actually going to
1:43
refer to that using the dictionary way.
1:46
But I should actually be able to do a dot notation
1:50
which is much nicer.
1:52
I put that into convert to datetime,
1:54
and convert to datetime, it's actually not
1:57
100 percent accurate.
1:59
It's more like, I mean that was the initial intent,
2:02
but let's actually call
2:04
it date, year, month.
2:10
Because that's actually what it's returning, right?
2:12
So we should make our functions descriptive.
2:16
And, yeah, let's print the first five to see if
2:20
I'm going in the right direction.
2:22
And I am.
2:23
And the nice thing about Counter as we've seen in day four
2:28
in the collections module lesson,
2:30
is that I can give it a list of items,
2:33
and it just does a count.
2:34
So if I want to have posts by month,
2:38
so Counter can just get this pub dates list,
2:42
and look what happens.
2:44
Wow. Boom.
2:45
I mean I didn't have to keep track of,
2:47
well we saw that in the previous lesson right,
2:49
that you can write it as a manual loop
2:51
over all the items, keeping a count, etc.
2:54
But this is all done in the standard library.
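To make that step concrete, here is a minimal sketch of the list comprehension plus Counter. Note the assumptions: the entries below are made-up dictionaries standing in for the feedparser entries, and get_year_month is a hypothetical stand-in for the renamed helper, not necessarily how the course code implements it.

```python
from collections import Counter
from email.utils import parsedate_to_datetime

# Made-up stand-ins for feedparser entries: only the 'published'
# field is modelled here, with invented dates.
entries = [
    {"published": "Sun, 07 Jan 2018 12:00:00 +0100"},
    {"published": "Tue, 23 Jan 2018 09:30:00 +0100"},
    {"published": "Fri, 02 Feb 2018 18:15:00 +0100"},
]


def get_year_month(date_str):
    """Hypothetical helper: turn an RSS-style date string into 'YYYY-MM'."""
    return parsedate_to_datetime(date_str).strftime("%Y-%m")


# One 'YYYY-MM' string per entry, then let Counter do the counting.
pub_dates = [get_year_month(entry["published"]) for entry in entries]
posts_by_month = Counter(pub_dates)

print(pub_dates[:5])
print(posts_by_month)
```

The Counter ends up mapping each year-month string to the number of posts in that month, which is the shape a bar chart needs.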
2:57
Secondly, we need to break down the categories.
3:00
So, with a similar list comprehension, we're
3:04
going to loop over the entries.
3:05
But instead of getting year, month,
3:08
I'm going to use the other helper we defined
3:09
and just get category.
3:11
And I'm going to do that on the link.
3:13
And those are not pub dates, those are categories.
3:17
Again, Counter is your best friend.
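The category breakdown follows the same pattern and could be sketched roughly like this. The links are invented, and get_category here is only a hypothetical helper that guesses a category from the link's file name; the actual helper defined earlier in the course may work differently.

```python
import re
from collections import Counter

# Invented links standing in for the link field of each feed entry.
entries = [
    {"link": "https://example.com/codechallenge40.html"},
    {"link": "https://example.com/codechallenge41.html"},
    {"link": "https://example.com/guest-post.html"},
]


def get_category(link):
    """Hypothetical helper: derive a category from the link's file name."""
    name = link.rstrip("/").split("/")[-1]  # e.g. 'codechallenge40.html'
    name = name.rsplit(".", 1)[0]           # drop the file extension
    return re.sub(r"\d+$", "", name)        # drop a trailing number


# One category per entry, counted up by Counter.
categories = [get_category(entry["link"]) for entry in entries]
posts_by_category = Counter(categories)

print(posts_by_category)
```

Because each post has exactly one category, the counts add up to the number of posts, which suits a pie chart.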
3:26
Tags is almost the same, so I'm going to just copy it over.
3:30
Tags, that is actually a bit more complex.
3:32
Let me go from the start, so for entry in entries,
3:36
and here I have an exceptional case
3:38
for a nested for in a list comprehension.
3:42
For each entry, loop through the tags.
3:46
And each tag has a term, let's lower case
3:49
that to not have to deal with upper and lower case.
3:52
So for each entry, because one entry has a list of tags,
3:56
I'm looping through this list of tags,
3:58
and I'm taking out the term.
4:00
That's what I'm basically doing.
4:01
I lower case that tag, so we have all the tags,
4:04
for all the entries.
4:05
And again, I can use a Counter to get that all counted up.
4:09
Let's give most common a limit of 20,
4:13
and let's print the first five.
4:17
And obviously Python is at the top.
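As a rough sketch of the nested list comprehension for tags, assuming each entry carries a list of tags and each tag has a term, as described above. The entries and tag values are invented for illustration, and plain dictionaries are used in place of real feedparser objects.

```python
from collections import Counter

# Invented entries: each one has a list of tags, each tag has a 'term'.
entries = [
    {"tags": [{"term": "Python"}, {"term": "Plotly"}]},
    {"tags": [{"term": "python"}, {"term": "Data"}]},
    {"tags": [{"term": "PYTHON"}, {"term": "plotly"}, {"term": "tips"}]},
]

# Outer loop over entries, inner loop over that entry's tags;
# lower-casing merges 'Python', 'python' and 'PYTHON' into one tag.
tags = [tag["term"].lower() for entry in entries for tag in entry["tags"]]

# Keep at most the 20 most frequent tags, then peek at the first five.
top_tags = Counter(tags).most_common(20)
print(top_tags[:5])
```

The order of the for clauses in the comprehension matches the order they would have as nested for loops: entries first, then the tags of each entry.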
4:19
Right, that was a lot of preparation but the good news,
4:23
is that the data is now in a structure from which
4:25
we can easily make plots.