#100DaysOfCode in Python Transcripts
Chapter: Days 82-84: Data visualization with Plotly
Lecture: Prep 2: useful data structures for plotting
0:00 So that was some prework, but what I have not spoken about
0:03 yet is what we are going to plot.
0:05 And there are three graphs I want to make.
0:08 First, I want to make a bar chart of our
0:10 posting activity, what months did we put more content out,
0:13 and what months less.
0:15 Secondly, I want a pie chart with a breakdown of our categories,
0:19 so each blog post has one category associated,
0:22 and that will show us what we blog about most.
0:26 Similarly, that also is true for tags,
0:29 tags give an indication of what we blog about.
0:31 But we use more tags than categories.
0:34 One blog post can have ten tags,
0:37 so it's a bit more granular.
0:38 So it still will be another angle,
0:41 or another insight into our data.
0:43 Next up, there are three exercises to get the data
0:47 into a format from which I can easily make those three graphs.
0:52 So first of all, I want to have the published entries.
0:56 So I'm going to use Counter,
0:58 to count all the entries by year, month.
1:01 And here's where the helper comes in,
1:03 because we're going to use a list comprehension,
1:07 which we dealt with in, I believe, Day 16,
1:10 and we can say pub dates,
1:14 and make a list comprehension.
1:17 And, I can just say for entry in entries,
1:22 and entries is our complete RSS feed broken down
1:26 into nice entries by feedparser.
1:29 And we saw an entry here, laid out.
1:31 So I'm going to loop over these entries
1:33 and for every entry, here's the helper,
1:36 I'm going to convert to datetime,
1:39 entry, and I'll take the published field
1:42 and what's funny, I'm actually going to
1:43 refer to that, too, using the dictionary way.
1:46 But I should actually be able to do a dot notation
1:50 which is much nicer.
1:52 I put that into convert to datetime,
1:54 and convert to datetime, it's actually not
1:57 100 percent accurate.
1:59 It's more like, I mean that was the initial intent,
2:02 but let's actually call
2:04 it date, year, month.
2:10 Because that's actually what it's returning, right?
2:12 So we should make our functions descriptive.
2:16 And, yeah let's give the first five to see if
2:20 I'm going in the right direction.
2:22 And I am.
2:23 And the nice thing about Counter as we've seen in day four
2:28 in the collections module lesson,
2:30 is that I can give it a list of items,
2:33 and it just does a count.
2:34 So if I want to have posts by month,
2:38 so Counter can just get this pub dates list,
2:42 and look what happens.
2:44 Wow. Boom.
2:45 I mean I didn't have to keep track of,
2:47 well we saw that in the previous lesson right,
2:49 you can write it as a manual loop
2:51 over all the items, keeping a count, etc.
2:54 But this is all done in the standard library.
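The counting step described above can be sketched as follows. This is a minimal sketch under assumptions, not the code from the lecture: the sample `entries` are hand-written stand-ins for what feedparser returns, the helper name `get_year_month` is hypothetical, and the 'YYYY-MM' output format is a guess at what the renamed helper returns.

```python
from collections import Counter
from email.utils import parsedate_to_datetime

# Hypothetical stand-in for the feedparser entries (only the field used here).
entries = [
    {"published": "Sun, 07 Jan 2018 12:00:00 +0000"},
    {"published": "Mon, 15 Jan 2018 09:30:00 +0000"},
    {"published": "Fri, 02 Feb 2018 18:45:00 +0000"},
]


def get_year_month(date_str):
    """Return 'YYYY-MM' for an RFC 822 date string (assumed helper)."""
    return parsedate_to_datetime(date_str).strftime("%Y-%m")


# One year-month string per entry, built with a list comprehension.
pub_dates = [get_year_month(entry["published"]) for entry in entries]

# Counter tallies the repeated values, so no manual counting loop is needed.
posts_by_month = Counter(pub_dates)
print(posts_by_month)
```

The resulting Counter maps each year-month to its number of posts, which is the shape a bar chart needs: keys for the x-axis, counts for the y-axis.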
2:57 Secondly, we need to break down the categories.
3:00 So, with a similar list comprehension, we're
3:04 going to loop over the entries.
3:05 But instead of getting year, month,
3:08 I'm going to use the other helper we defined
3:09 and just get category.
3:11 And I'm going to do that on the link.
3:13 And those are not pub dates, those are categories.
3:17 Again, Counter is your best friend.
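A sketch of the category breakdown, under loud assumptions: the transcript only says the category is derived from each entry's link, so the rule used here (first path segment of the URL) and the helper name `get_category` are hypothetical, as are the example links.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical entries; the real ones come from the parsed RSS feed.
entries = [
    {"link": "https://example.com/articles/plotly-intro.html"},
    {"link": "https://example.com/articles/counter-tips.html"},
    {"link": "https://example.com/challenges/challenge-01.html"},
]


def get_category(link):
    """Return the first path segment of the link (assumed rule)."""
    return urlparse(link).path.strip("/").split("/")[0]


# Same pattern as for the publication dates: comprehension, then Counter.
categories = [get_category(entry["link"]) for entry in entries]
category_counts = Counter(categories)
print(category_counts)
```

Because each post has exactly one category, the counts sum to the number of posts, which suits a pie chart.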
3:26 Tags is almost the same, so I'm going to just copy it over.
3:30 Tags, that is actually a bit more complex.
3:32 Let me go from the start, so for entry in entries,
3:36 and here I have an exceptional case
3:38 for a nested for in a list comprehension.
3:42 For each entry, loop through the tags.
3:46 And each tag has a term, let's lower case
3:49 that to not have to deal with upper and lower case.
3:52 So for each entry, because one entry has a list of tags,
3:56 I'm looping through this list of tags,
3:58 and I'm taking out the term.
4:00 That's what I'm basically doing.
4:01 I lower case that tag, so we have all the tags,
4:04 for all the entries.
4:05 And again, I can use a Counter to get that all counted up.
4:09 Let's give most common a limit of 20,
4:13 and let's print the first five.
4:17 And obviously Python then is at the top.
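The nested comprehension for tags can be sketched like this. The sample data is invented for illustration; it assumes each entry carries a `tags` list whose items have a `term` field, which is how the lecture describes the parsed entries.

```python
from collections import Counter

# Hypothetical entries, each with a list of tags that have a "term".
entries = [
    {"tags": [{"term": "Python"}, {"term": "Plotly"}]},
    {"tags": [{"term": "python"}, {"term": "Counter"}]},
    {"tags": [{"term": "PYTHON"}, {"term": "plotly"}]},
]

# Outer loop over the entries, inner loop over each entry's tags.
# Lowercasing the term merges variants such as "Python" and "python".
tags = [tag["term"].lower() for entry in entries for tag in entry["tags"]]

# Keep at most the 20 most frequent tags, as (tag, count) pairs.
top_tags = Counter(tags).most_common(20)

# Peek at the first five pairs.
print(top_tags[:5])
```

Reading the comprehension left to right after the expression gives the loop order: `for entry in entries` runs first, then `for tag in entry["tags"]` inside it.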
4:19 Right, that was a lot of preparation but the good news,
4:23 is that the data is now in a structure from which
4:25 we can easily make plots.