#100DaysOfCode in Python Transcripts
Chapter: Days 82-84: Data visualization with Plotly
Lecture: Prep 1: parse PyBites RSS feed data
0:01
Let's use feedparser to get the RSS feed of our blog. And I'm not going to do the live feed because I want you to see the same results
0:11
if you go through this exercise. So, I'm going to paste in the actual copy I made. That said, if you do want the live data,
0:19
then just go to our blog, pybit.es, or PyBites. Go to 'view page source' and search for RSS, and we have two feeds: all.atom and all.rss.
0:31
I'm using the latter. The nice thing about feedparser is that you can just call ".parse" and it does a lot of work behind the scenes. Let's see what it does.
0:40
Okay, so entries. Blog feed, my variable. And let's look at what "entries" has. And it gives me a lot, so let's look at the first one.
0:52
Actually, if you want to pretty-print this, just do from pprint import pprint, and I usually give it an alias. And now it's spaced out a bit better.
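As a small illustration of the pretty-printing step, here is a sketch using a made-up dictionary that stands in for one feed entry. The keys and values are assumptions for illustration, not actual feed data, and `pp` is just one possible alias.

```python
from pprint import pformat, pprint as pp

# Hypothetical stand-in for a single parsed feed entry (not real feed data).
sample_entry = {
    "title": "An example article title",
    "link": "https://pybit.es/example-article.html",
    "published": "Sun, 07 Jan 2018 12:00:00 +0100",
    "tags": [{"term": "python"}, {"term": "data visualization"}],
}

# pprint wraps structures that do not fit on one line and sorts dict keys.
pp(sample_entry)

# pformat returns the same layout as a string instead of printing it.
formatted = pformat(sample_entry)
```

The alias only saves typing; `pprint(sample_entry)` does the same thing.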
1:06
Look at what feedparser did behind the scenes. It took that RSS feed and put it in a comprehensible data structure.
1:13
Although this is a nice format, there is still some work to do. For example, the publish date is a string. But we also have published_parsed,
1:21
but that's a time.struct_time. Now, the most convenient way to work with dates is to use datetime.
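The lecture goes on to parse the date string, but as a side note, a struct_time can also be turned into a datetime directly. This is an alternative sketch, not the approach taken in the video; the struct_time below is built by hand as a stand-in for a parsed feed value.

```python
import time
from datetime import datetime

# Stand-in for a parsed timestamp; here it is built with time.strptime.
parsed = time.strptime("2018-01-07 12:00:00", "%Y-%m-%d %H:%M:%S")

# The first six fields of a struct_time are
# year, month, day, hour, minute, second.
dt = datetime(*parsed[:6])
print(dt)
```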
1:28
So, let's write a helper to convert to a datetime. And I call it just that. It takes a date string. A date string here has some time zone stuff,
1:38
plus zero one. So, the first thing is to strip that off. Just put it here so we can see it while I'm writing this. Date string.
1:55
I can split it on plus and take the first element. We can see this live. And you cannot see this, but there's still
2:12
a trailing space here, so it's best to always strip whitespace you don't need. So, you can see here that it disappeared.
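The split-and-strip step can be sketched as follows. The sample date string is an assumption modeled on the format described in the lecture; the actual feed value may differ.

```python
# Assumed sample value in the format described above.
date_str = "Sun, 07 Jan 2018 12:00:00 +0100"

# Everything before the '+' is the date and time; note the trailing space.
before_offset = date_str.split("+")[0]
print(repr(before_offset))

# strip() removes the leftover whitespace.
cleaned = before_offset.strip()
print(repr(cleaned))
```

Using repr makes the trailing space visible, which a plain print would hide.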
2:22
And then, we do a datetime conversion. And we can do that by strptime. It takes a string, and the only tricky thing is that
2:32
you have to give it the format of the date string. In this case, it's a weekday, a day, an abbreviated month name, so three characters like Jan, Feb, Mar,
2:43
a four-digit year, which is uppercase Y, then hour, minute, seconds, and let's see what that gives me. Okay, the nice thing about a datetime
2:51
is that it actually prints as a string. So it's a bit tricky. And if I look at what the object actually is, it's a datetime.
2:59
And that's cool because datetime makes it then very easy to work with dates. For example, let's just return this. So, I'm getting a datetime back.
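Putting the pieces together, the helper might look roughly like this. It is a sketch under assumptions: the sample date string and its exact layout are modeled on the description above, and the month and weekday names are assumed to be in English.

```python
from datetime import datetime


def convert_to_datetime(date_str):
    """Convert a feed date string to a datetime, dropping the UTC offset."""
    # Cut off the '+0100' style offset and the space in front of it.
    date_str = date_str.split("+")[0].strip()
    # %a = abbreviated weekday, %d = day, %b = abbreviated month,
    # %Y = four-digit year, then hour, minute, second.
    return datetime.strptime(date_str, "%a, %d %b %Y %H:%M:%S")


dt = convert_to_datetime("Sun, 07 Jan 2018 12:00:00 +0100")
print(dt)        # printing shows a readable string
print(type(dt))  # but the object itself is a datetime
```

Splitting on "+" only works for positive offsets; a feed with offsets like -0500 would need a different approach.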
3:08
What's cool about this is you can now do calculations with datetimes. So, let's make sure timedelta is imported here as well. What if I want to...
3:19
So, this is the seventh of January. But if I want to add like three days, right, I could do datetime + timedelta(days=3). And look at that.
3:31
I just added three days. I mean, you don't even want to imagine doing that on strings, right? It's just not done, and... no. That's no way to go.
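A minimal sketch of the date arithmetic just described. The seventh of January comes from the lecture; the year and time of day are assumptions.

```python
from datetime import datetime, timedelta

# The seventh of January (year and time are assumed for illustration).
dt = datetime(2018, 1, 7, 12, 0, 0)

# Adding a timedelta yields a new datetime three days later.
three_days_later = dt + timedelta(days=3)
print(three_days_later)
```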
3:42
So, when you're working with dates, have them in a datetime format. Keep them in a standardized form so that you can easily do calculations with them.
3:50
And, actually, for this exercise I just want to have the datetime.year. That's another advantage you see here.
3:57
Once I have the datetime, I can just pull out different elements from that, right? So here, I want the year and the month,
4:03
and I can just access those as attributes. So, now I just get a string. We will use this later to plot the data.
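Pulling the year and month out of a datetime could look like the sketch below. The function name `to_year_month` and the "YYYY-MM" layout are assumptions; the lecture only says that the result is a string built from those two attributes.

```python
from datetime import datetime


def to_year_month(dt):
    """Return a 'YYYY-MM' string built from the datetime's attributes."""
    # Zero-padding the month is a choice made here so the strings sort
    # chronologically; the format used in the video may differ.
    return f"{dt.year}-{dt.month:02d}"


label = to_year_month(datetime(2018, 1, 7, 12, 0, 0))
print(label)
```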
4:12
The second helper I need is get_category. It takes a link and extracts the category from it.
4:22
And we have these known categories, code challenge, new, special, and guest. So, that's the dictionary. The default should be article.
4:34
And here I use a bit of regular expressions to pull the category out of the link. A raw string, any characters. A literal .es/
4:49
one or more lower-case letters. + says one or more, and anything after that. Now the parentheses will capture these one or more letters into a group,
5:01
and I can access that in the second argument by the \1. And I'm doing that on link. And then, I can just do a nice get on the dictionary,
5:15
which will look for that category. So it matches code challenge, Twitter, special or guest. If it finds it, cool. If not, get will return None.
5:26
And it then goes to the or, which returns default. So this will always return something relevant, right? Either I find the key in the dictionary,
5:35
or, if not, I return default. And that's it. That's the pre-work: two important helpers. Next up, we will go through the feed data,
5:47
putting it into some useful data structures. And with that second part of the preparation done, the plotting should be easy.
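For reference, the get_category helper described above might be sketched like this. The dictionary keys and values, the constant names, and the example links are assumptions for illustration; only the regex-plus-dictionary-lookup pattern is taken from the lecture.

```python
import re

DEFAULT_CATEGORY = "article"

# Hypothetical mapping from the URL prefix to a category label.
CATEGORIES = {
    "codechallenge": "challenge",
    "twitter": "news",
    "special": "special",
    "guest": "guest",
}


def get_category(link):
    """Extract the category from a blog link, defaulting to 'article'."""
    # .*\.es/  -> anything up to and including a literal '.es/'
    # ([a-z]+) -> one or more lower-case letters, captured as group 1
    # .*       -> whatever follows
    # The whole link is replaced by the captured group (\1).
    prefix = re.sub(r".*\.es/([a-z]+).*", r"\1", link)
    # get() returns None for unknown keys, so 'or' falls back to the default.
    return CATEGORIES.get(prefix) or DEFAULT_CATEGORY


# Made-up example links.
print(get_category("https://pybit.es/codechallenge40.html"))
print(get_category("https://pybit.es/python-data-viz.html"))
```

If the pattern does not match at all, re.sub returns the link unchanged, which is not a key in the dictionary, so the default is still returned.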