#100DaysOfCode in Python Transcripts
Chapter: Days 82-84: Data visualization with Plotly
Lecture: Prep 1: parse PyBites RSS feed data

0:01 Let's use feedparser to get the RSS feed of our blog. And I'm not going to do the live feed because I want you to see the same results
0:11 if you go through this exercise. So, I'm going to paste in the actual copy I made. That said, if you do want the live data,
0:19 then just go to our blog, pybit.es, or PyBites. Go to 'view page source' and search for RSS, and we have two feeds: all.atom and all.rss.
0:31 I'm using the latter. The nice thing about feedparser is that you can just call .parse() and it does a lot of work behind the scenes. Let's see what it does.
0:40 Okay, so entries. Blog feed, my variable. And let's look at what "entries" has. And it gives me a lot, so let's look at the first one.
0:52 Actually, if you want to pretty-print this, just do from pprint import pprint, and I usually give it an alias. And now it's spaced out a bit better.
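A minimal sketch of the pretty-printing step. The alias pp and the entry dict are assumptions for illustration: the dict is a hand-written stand-in for one item of the parsed feed's entries, with made-up values, so the snippet runs without downloading anything.

```python
from pprint import pprint as pp  # "pp" is an assumed alias, not confirmed by the lecture

# Hypothetical stand-in for a single feed entry; the values are invented,
# only the key names (title, link, published) follow what feedparser exposes.
entry = {
    "title": "Example post",
    "link": "https://pybit.es/example-post.html",
    "published": "Sun, 07 Jan 2018 12:00:00 +0100",
}

print(entry)  # everything on one long line
pp(entry)     # wrapped, because the dict does not fit the default 80-column width
```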
1:06 Look at what feedparser did behind the scenes. It took that RSS feed and put it in a comprehensible data structure.
1:13 Although this is a nice format, there is still some work to do. For example, the published date is a string. We also have published_parsed,
1:21 but that's a time, a struct_time. Now, the most convenient way to work with dates is to use datetime.
1:28 So, let's write a helper to convert to a datetime. And I call it just that. It takes a date string. A date string here has some time zone stuff,
1:38 plus zero one. So, the first thing is to strip that off. Just put it here so we can see it while I'm writing this. Date string.
1:55 I can split it on plus and take the first element. We can see this live. And you cannot see this, but there's still
2:12 a trailing space here, so it's best to always strip spaces that are not really needed. So, you can see here that it disappeared.
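The split-and-strip step can be sketched as follows. The sample date string is an assumption shaped like the one described in the lecture; the real value comes from the feed entry.

```python
# Assumed sample value with a +0100 offset at the end.
date_str = "Sun, 07 Jan 2018 12:00:00 +0100"

before_offset = date_str.split("+")[0]
print(repr(before_offset))  # 'Sun, 07 Jan 2018 12:00:00 ' (note the trailing space)

cleaned = before_offset.strip()
print(repr(cleaned))        # 'Sun, 07 Jan 2018 12:00:00'
```

Splitting on "+" only handles positive offsets. A feed that publishes a negative offset would need different handling.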
2:22 And then, we do a datetime conversion. We can do that with strptime. It takes a string, and the only tricky thing is that
2:32 you have to give it the format of the date string. In this case, it's a weekday, a day, a three-character month abbreviation like Jan, Feb, Mar,
2:43 a four-digit year, so uppercase Y, then hour, minute, seconds, and let's see what that gives me. Okay, the thing about a datetime
2:51 is that it actually prints as a string, so it's a bit tricky. But if I look at what the object actually is, it's a datetime.
2:59 And that's cool because datetime makes it then very easy to work with dates. For example, let's just return this. So, I'm getting a datetime back.
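A sketch of the helper as described up to this point, assuming the function name convert_to_datetime (taken from the spoken description) and the same sample date string. The format string is one reading of the fields listed in the lecture.

```python
from datetime import datetime


def convert_to_datetime(date_str):
    """Drop the UTC offset and parse the rest into a naive datetime."""
    date_str = date_str.split("+")[0].strip()
    # %a weekday abbreviation, %d day, %b month abbreviation,
    # %Y four-digit year, %H:%M:%S hour, minute, second
    return datetime.strptime(date_str, "%a, %d %b %Y %H:%M:%S")


dt = convert_to_datetime("Sun, 07 Jan 2018 12:00:00 +0100")  # assumed sample
print(dt)        # 2018-01-07 12:00:00
print(repr(dt))  # datetime.datetime(2018, 1, 7, 12, 0)
```

Printing the object shows its string form, while repr shows that it really is a datetime.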
3:08 What's cool about this is you can now do calculations with datetimes. So, let's make sure timedelta is imported here as well. What if I want to...
3:19 So, this is the seventh of January. But if I want to add, say, three days, I can do datetime + timedelta(days=3). And look at that.
3:31 I just added three days. I mean, you don't even want to imagine doing that on strings, right? It's just not done. That's no way to go.
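The date arithmetic can be sketched like this. The year and time of day are assumptions; the lecture only mentions the seventh of January.

```python
from datetime import datetime, timedelta

dt = datetime(2018, 1, 7, 12, 0)  # assumed example value: 7 January
later = dt + timedelta(days=3)

print(later)  # 2018-01-10 12:00:00
```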
3:42 So, when you're working with dates, keep them in a datetime format, a standardized form that lets you easily do calculations with them.
3:50 And, actually, for this exercise I just want to have the datetime.year. That's another advantage you see here.
3:57 Once I have the datetime, I can just pull out different elements from it, right? So here, I want the year and the month,
4:03 and I can just access those as attributes. So, now I just get a string. We will use this later to plot the data.
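A short sketch of pulling the year and month off a datetime. The exact string format used in the lecture is not visible in the transcript, so the zero-padded "YYYY-MM" form below is an assumption.

```python
from datetime import datetime

dt = datetime(2018, 1, 7, 12, 0)  # assumed example value

print(dt.year, dt.month)  # 2018 1

# Assumed format: zero-padded month so the strings sort chronologically.
year_month = f"{dt.year}-{dt.month:02d}"
print(year_month)  # 2018-01
```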
4:12 The second helper I need is get_category. It takes a link and extracts the category from it.
4:22 And we have these known categories: code challenge, news, special, and guest. So, that's the dictionary. The default should be article.
4:34 And here I use a bit of regular expressions to pull the category out of the link. It's a raw string: any characters, a literal .es/,
4:49 then one or more lowercase letters (the + says one or more), and anything after that. Now the parentheses capture those letters into a group,
5:01 and I can reference that group in the second argument with \1. And I'm doing that on link. And then, I can just do a nice get on the dictionary,
5:15 which will look for that category. So it matches code challenge, Twitter, special, or guest. If it finds it, cool. If not, get will return None,
5:26 and it then goes to the or, which returns the default. So this will always return something relevant, right? Either I find the key in the dictionary
5:35 or, if not, I return the default. And that's it. That's the prep work we are going to do: two important helpers. Next up, we will go through the feed data,
5:47 putting it into some useful data structures. And with that second part of the preparation done, the plotting should be easy.
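The category helper described above can be sketched as follows. The dictionary keys and values, the name known_cats, and the example links are assumptions reconstructed from the spoken description, not the exact code shown on screen.

```python
import re

# Assumed mapping from the URL prefix to a category label.
known_cats = {
    "codechallenge": "challenge",
    "twitter": "news",
    "special": "special",
    "guest": "guest",
}
default_cat = "article"


def get_category(link):
    """Extract the lowercase prefix after '.es/' and map it to a category."""
    # The group captures one or more lowercase letters; \1 keeps only that group.
    prefix = re.sub(r".*\.es/([a-z]+).*", r"\1", link)
    # dict.get returns None for unknown prefixes, so "or" falls back to the default.
    return known_cats.get(prefix) or default_cat


# Hypothetical links, for illustration only.
print(get_category("https://pybit.es/codechallenge40.html"))    # challenge
print(get_category("https://pybit.es/twitter_digest_2018.html"))  # news
print(get_category("https://pybit.es/python-decorators.html"))  # article
```

If the pattern does not match at all, re.sub returns the link unchanged, which is not a known key, so the default is still returned.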

