#100DaysOfCode in Python Transcripts
Chapter: Days 82-84: Data visualization with Plotly
Lecture: Prep 1: parse PyBites RSS feed data
0:01 Let's use feedparser to get the
0:03 RSS feed of our blog.
0:04 And I'm not going to do the live feed
0:07 because I want you to see the same results
0:10 if you go through this exercise.
0:12 So, I'm going to paste in the actual copy I made.
0:15 That said, if you do want the live data,
0:18 then just go to our blog, pybit.es, or PyBites.
0:22 Go to 'view page source' and search for RSS,
0:27 and we have two feeds: all.atom and all.rss.
0:30 I'm using the latter.
0:31 The nice thing about feedparser is that
0:34 you can just call ".parse" and it does a lot
0:37 of work behind the scenes.
0:38 Let's see what it does.
0:39 Okay, so entries.
0:41 Blog feed, my variable.
0:43 And let's look at what "entries" has.
0:45 And it gives me a lot,
0:47 so let's look at the first one.
0:51 Actually, if you want to pretty-print this, just do from
0:57 pprint import pprint,
0:59 and I usually give it an alias.
1:03 And now it's spaced out a bit better.
1:05 Look at that, what feedparser did
1:06 behind the scenes.
1:07 It took that RSS feed and put it in
1:09 a comprehensible data structure.
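A minimal sketch of the steps so far, assuming feedparser is installed (it is a third-party package). The one-item feed, its title, link, and date are made up for illustration and stand in for the saved copy of all.rss; the variable names are likewise only illustrative.

```python
from pprint import pprint as pp  # short alias for pretty-printing

try:
    import feedparser  # third-party package: pip install feedparser
except ImportError:
    feedparser = None  # lets the sketch load even without the package

# Made-up one-item feed standing in for the saved copy of all.rss.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>https://pybit.es</link>
    <description>Sample feed</description>
    <item>
      <title>Example article</title>
      <link>https://pybit.es/example-article.html</link>
      <pubDate>Sun, 07 Jan 2018 12:00:00 +0100</pubDate>
    </item>
  </channel>
</rss>"""

if feedparser is not None:
    blog_feed = feedparser.parse(SAMPLE_RSS)  # also accepts a URL or a file path
    entries = blog_feed.entries
    first = entries[0]
    pp(dict(first))                      # pretty-print the first entry
    print(type(first.published))         # the raw date is a plain string
    print(type(first.published_parsed))  # the parsed date is a time.struct_time
```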
1:12 Although this is a nice format,
1:14 there is still some work to do.
1:15 For example, the published date is a string.
1:18 We also have published_parsed,
1:20 but that's a time.struct_time.
1:22 Now, the most convenient way to work with dates
1:25 is to use datetime.
1:27 So, let's write a helper to convert to a datetime.
1:31 And I call it just that.
1:32 It takes a date string.
1:34 A date string here has some time zone stuff,
1:37 plus zero one.
1:40 So, the first thing is to strip that off.
1:45 Just put it here so we can see it while I'm writing this.
1:52 Date string.
1:54 I can split it on plus and take the first element.
2:00 We can see this live.
2:09 And you cannot see this, but there's still
2:11 a trailing space here, so it's best to always
2:15 strip spaces that are not really needed.
2:19 So, you can see here that it disappeared.
2:21 And then, we do a datetime conversion.
2:24 And we can do that by strptime.
2:28 It takes a string, and the only tricky thing is that
2:31 you have to give it the format of the date string.
2:35 In this case, it's a weekday, a day,
2:37 a string month, so a three-character month
2:40 like Jan, Feb, Mar,
2:42 a four-digit year, uppercase Y,
2:45 hour, minute, seconds,
2:47 and let's see what that gives me.
2:48 Okay, the nice thing about a datetime
2:50 is that it actually prints as a string.
2:52 So it's a bit tricky.
2:53 And if I look at what the datetime actually is,
2:57 it's a datetime object.
2:58 And that's cool because datetime makes it then
3:00 very easy to work with dates.
3:02 For example, let's just return this.
3:06 So, I'm getting a datetime back.
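The conversion helper described above could look roughly like this. The function name and the sample date string are assumptions; the split, strip, and strptime steps follow the narration.

```python
from datetime import datetime


def convert_to_datetime(date_str):
    """Turn a feed date string into a (naive) datetime."""
    # Drop the time zone offset: keep what comes before the '+',
    # then remove the space that is left at the end.
    date_str = date_str.split('+')[0].strip()
    # weekday, day, three-letter month, four-digit year, hour:minute:second
    return datetime.strptime(date_str, '%a, %d %b %Y %H:%M:%S')


# Assumed example value in the feed's date format.
dt = convert_to_datetime('Sun, 07 Jan 2018 12:00:00 +0100')
print(dt)        # 2018-01-07 12:00:00
print(repr(dt))  # datetime.datetime(2018, 1, 7, 12, 0)
```

Note that splitting on '+' discards the offset, so the result is a naive datetime, and a string with a negative offset would not parse with this approach.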
3:07 What's cool about this is you can now
3:09 do calculations with datetimes.
3:13 So, let's make sure that
3:14 timedelta is imported here as well.
3:16 What if I want to...
3:18 So, this is the seventh of January.
3:20 But if I want to add like three days, right,
3:23 I could do datetime + timedelta(days=3)
3:29 And look at that.
3:30 I just added three days.
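As a small illustration of that calculation, assuming a datetime for the seventh of January (the year and time of day are made up):

```python
from datetime import datetime, timedelta

published = datetime(2018, 1, 7, 12, 0, 0)  # assumed example date
later = published + timedelta(days=3)       # date arithmetic just works

print(later)  # 2018-01-10 12:00:00
```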
3:32 I mean, you don't even want to imagine
3:34 doing that on strings, right?
3:36 It's just not done, and... no.
3:39 That's really not the way to go.
3:41 So, when you're working with dates,
3:43 have it in a datetime format.
3:44 Have it in a standardized form so that you can
3:47 easily do calculations with it.
3:49 And, actually, for this exercise
3:51 I just want to have the datetime.year.
3:53 That's another advantage you see here.
3:56 Once I have the datetime, I can just pull out
3:58 different elements from that, right?
4:00 So here, I want the year and the month,
4:02 and I can just access those as attributes.
4:05 So, now I just get a string.
4:08 We will use this later to plot the data.
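One way to build such a year-and-month string from a datetime. The helper name and the exact "YYYY-MM" format are assumptions; the narration only says that the year and month are read as attributes and returned as a string.

```python
from datetime import datetime


def year_month(dt):
    """Return the year and month of a datetime as a 'YYYY-MM' string."""
    # Zero-pad the month so the strings sort chronologically.
    return f'{dt.year}-{dt.month:02d}'


print(year_month(datetime(2018, 1, 7, 12, 0, 0)))  # 2018-01
```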
4:11 The second helper I need is get_category.
4:14 Takes a link.
4:18 So, it takes a link, and it extracts the category
4:20 out of that.
4:21 And we have these known categories,
4:23 code challenge, news, special, and guest.
4:26 So, that's the dictionary.
4:28 The default should be an article.
4:33 And here I use a bit of regular expressions
4:36 to pull the category out of the link.
4:39 A raw string, any characters.
4:42 A literal .es/
4:48 one or more lower-case letters.
4:51 + says one or more, and anything after that.
4:55 Now the parentheses will capture these
4:58 one or more letters into a group,
5:00 and I can reference that group in the second argument
5:03 with \1.
5:06 And I'm doing that on link.
5:09 And then, I can just do a nice get on the dictionary,
5:14 which will look for that category.
5:18 So it matches code challenge, Twitter,
5:20 special or guest;
5:21 if it finds one, great.
5:23 If not, get will return None.
5:25 And it then goes to the or,
5:27 which returns default.
5:29 So this will always return something relevant, right?
5:31 Either I find the key in the dictionary,
5:34 or, if not, I return the default.
5:37 And that's it.
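A sketch of the category helper as described. The regular expression, the dictionary lookup, and the "or default" fallback follow the narration; the exact dictionary keys and values and the example links are assumptions made for illustration.

```python
import re

# Assumed mapping from the first lower-case word in the link to a category.
KNOWN_CATEGORIES = {
    'codechallenge': 'challenge',
    'twitter': 'news',
    'special': 'special',
    'guest': 'guest',
}
DEFAULT_CATEGORY = 'article'


def get_category(link):
    """Extract the category from a blog post link."""
    # Any characters, a literal ".es/", one or more lower-case letters
    # (captured as group 1), then anything after that. The whole link is
    # replaced by the captured group.
    category = re.sub(r'.*\.es/([a-z]+).*', r'\1', link)
    # .get() returns None for unknown keys, so "or" falls back to the default.
    return KNOWN_CATEGORIES.get(category) or DEFAULT_CATEGORY


# Hypothetical links, only to show the behaviour.
print(get_category('https://pybit.es/codechallenge40.html'))  # challenge
print(get_category('https://pybit.es/some-article.html'))     # article
```

If the pattern does not match at all, re.sub returns the link unchanged, the lookup fails, and the default is still returned.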
5:38 That's the pre-work we needed to do:
5:41 two important helpers.
5:42 Next up, we will go through the feed data,
5:46 putting it into some useful data structures.
5:49 And with that second part of the preparation done,
5:51 the plotting should be easy.