#100DaysOfCode in Python Transcripts
Chapter: Days 82-84: Data visualization with Plotly
Lecture: Prep 1: parse PyBites RSS feed data
0:01
Let's use feedparser to get the
0:03
RSS feed of our blog.
0:04
And I'm not going to do the live feed
0:07
because I want you to see the same results
0:10
if you go through this exercise.
0:12
So, I'm going to paste in the actual copy I made.
0:15
That said, if you do want the live data,
0:18
then just go to our blog, pybit.es, or PyBites.
0:22
Go to 'view page source' and search for RSS,
0:27
and we have two feeds: all.atom and all.rss.
0:30
I'm using the latter.
0:31
The nice thing about feedparser is that
0:34
you can just call ".parse" and it does a lot
0:37
of work behind the scenes.
0:38
Let's see what it does.
0:39
Okay, so entries.
0:41
Blog feed, my variable.
0:43
And let's look at what "entries" has.
0:45
And it gives me a lot,
0:47
so let's look at the first one.
0:51
Actually, if you want to pretty-print this, just do from
0:57
pprint import pprint,
0:59
and I usually give it an alias.
1:03
And now it's spaced out a bit better.
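A minimal sketch of the pretty-print import with an alias. The alias name pp and the sample entry dict are assumptions for illustration; a real feedparser entry has many more fields.

```python
from pprint import pprint as pp

# Hypothetical entry, loosely shaped like one parsed feed item.
entry = {
    "title": "Example post",
    "link": "https://pybit.es/example.html",
    "published": "Sun, 07 Jan 2018 12:00:00 +0100",
    "tags": ["python", "rss", "feedparser", "plotly", "datetime"],
}

# A narrow width forces wrapping, so each key lands on its own line.
pp(entry, width=60)
```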
1:05
Look at that, what feedparser did
1:06
behind the scenes.
1:07
It took that RSS feed and put it in
1:09
a comprehensible data structure.
1:12
Although this is a nice format,
1:14
there is still some work to do.
1:15
For example, the publish date is a string.
1:18
But we also have published_parsed,
1:20
but that's a time.struct_time.
1:22
Now, the most convenient way to work with dates
1:25
is to use datetime.
1:27
So, let's write a helper to convert to a datetime.
1:31
And I call it just that.
1:32
It takes a date string.
1:34
A date string here has some time zone stuff,
1:37
plus zero one.
1:40
So, the first thing is to strip that off.
1:45
Just put it here so we can see it while I'm writing this.
1:52
Date string.
1:54
I can split it on plus and take the first element.
2:00
We can see this live.
2:09
And you cannot see this, but there's still
2:11
a trailing space here, so it's best to always
2:15
strip spaces that are not really needed.
2:19
So, you can see here that it disappeared.
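The two cleanup steps can be sketched like this. The sample date string is an assumption, chosen to resemble a typical RSS publish date with a +01 offset.

```python
# Assumed sample value; the real string comes from the feed entry.
date_str = "Sun, 07 Jan 2018 12:00:00 +0100"

# Splitting on "+" drops the offset but leaves a trailing space.
without_offset = date_str.split("+")[0]

# strip() removes that leftover whitespace.
cleaned = without_offset.strip()

# repr() makes the trailing space visible.
print(repr(without_offset))
print(repr(cleaned))
```

Note that splitting on "+" only handles positive offsets; a feed with a negative offset such as -0500 would need a different approach.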
2:21
And then, we do a datetime conversion.
2:24
And we can do that by strptime.
2:28
It takes a string, and the only tricky thing is that
2:31
you have to give it the format of the date string.
2:35
In this case, it's a weekday, a day,
2:37
a month name as a three-letter abbreviation,
2:40
Jan, Feb, Mar,
2:42
a four-digit year, so uppercase Y,
2:45
hour, minute, seconds,
2:47
and let's see what that gives me.
2:48
Okay, the nice thing about a datetime
2:50
is that it actually prints as a string.
2:52
So it's a bit tricky.
2:53
And if I look at what the datetime actually is,
2:57
it's a datetime object.
2:58
And that's cool because datetime makes it then
3:00
very easy to work with dates.
3:02
For example, let's just return this.
3:06
So, I'm getting a datetime back.
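Put together, the helper described so far might look like the sketch below. The function name convert_to_datetime and the sample input are assumptions; the format string follows the pieces listed above (weekday, day, abbreviated month, four-digit year, hour, minute, seconds).

```python
from datetime import datetime


def convert_to_datetime(date_str):
    """Drop the timezone offset and parse the rest into a datetime."""
    # Remove everything from "+" onward, then the trailing space.
    date_str = date_str.split("+")[0].strip()
    # %a = abbreviated weekday, %d = day, %b = abbreviated month,
    # %Y = four-digit year, %H:%M:%S = hour, minute, seconds.
    return datetime.strptime(date_str, "%a, %d %b %Y %H:%M:%S")


# Assumed sample value for illustration.
dt = convert_to_datetime("Sun, 07 Jan 2018 12:00:00 +0100")
print(dt)
print(type(dt))
```

Because the offset is discarded, the result is a naive datetime, which is enough for grouping posts by year and month.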
3:07
What's cool about this is you can now
3:09
do calculations with datetimes.
3:13
So, let's make sure that
3:14
timedelta is imported here as well.
3:16
What if I want to...
3:18
So, this is the seventh of January.
3:20
But if I want to add like three days, right,
3:23
I could do datetime + timedelta(days=3)
3:29
And look at that.
3:30
I just added three days.
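As a small sketch of that calculation: the year and time of day below are assumptions, since only the seventh of January is mentioned.

```python
from datetime import datetime, timedelta

# Assumed date: 7 January, with a made-up year and time.
published = datetime(2018, 1, 7, 12, 0, 0)

# Adding a timedelta gives a new datetime three days later.
three_days_later = published + timedelta(days=3)

# Subtracting two datetimes gives a timedelta back.
elapsed = three_days_later - published

print(three_days_later)
print(elapsed.days)
```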
3:32
I mean, you don't even want to imagine
3:34
doing that on strings, right?
3:36
It's just not done, and... no.
3:39
That's really not the way to go.
3:41
So, when you're working with dates,
3:43
have it in a datetime format.
3:44
Have it in a standardized way that you can
3:47
easily do calculations with it.
3:49
And, actually, for this exercise
3:51
I just want to have the datetime.year.
3:53
That's another advantage you see here.
3:56
Once I have the datetime, I can just pull out
3:58
different elements from that, right?
4:00
So here, I want the year and the month,
4:02
and I can just access those as attributes.
4:05
So, now I just get a string.
4:08
We will use this later to plot the data.
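One way to turn the year and month attributes into a string is sketched here. The helper name to_year_month and the zero-padded "YYYY-MM" layout are assumptions; the exact string format used in the video is not spelled out.

```python
from datetime import datetime


def to_year_month(dt):
    """Build a 'YYYY-MM' label from a datetime's year and month attributes."""
    # :02d pads single-digit months so the labels sort correctly as text.
    return f"{dt.year}-{dt.month:02d}"


# Assumed sample datetime for illustration.
published = datetime(2018, 1, 7, 12, 0, 0)
label = to_year_month(published)
print(label)
```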
4:11
The second helper I need is one to get the category.
4:14
Takes a link.
4:18
So, it takes a link, and it extracts the category
4:20
out of that.
4:21
And we have these known categories,
4:23
code challenge, new, special, and guest.
4:26
So, that's the dictionary.
4:28
The default should be an article.
4:33
And here I use a bit of regular expressions
4:36
to pull the category out of the link.
4:39
A raw string, any characters.
4:42
A literal .es/
4:48
one or more lower-case letters.
4:51
+ says one or more, and anything after that.
4:55
Now the parentheses will capture this
4:58
one or more letters into a match,
5:00
and I can access that in the second argument
5:03
by the \1.
5:06
And I'm doing that on link.
5:09
And then, I can just do a nice get on the dictionary,
5:14
which will look for that category.
5:18
So it matches code challenge, Twitter,
5:20
special, or guest,
5:21
and if it finds one, cool.
5:23
If not, get will return None.
5:25
And it then goes to the or,
5:27
which returns default.
5:29
So this will always return something relevant, right?
5:31
Either I find the key in the dictionary,
5:34
or, if not, I return the default.
5:37
And that's it.
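A sketch of the category helper along the lines described. The function name get_category, the dictionary keys and values, and the sample links are all assumptions for illustration; only the regex shape and the "get or default" pattern come from the walkthrough.

```python
import re

# Hypothetical mapping from URL path prefix to category label.
KNOWN_CATEGORIES = {
    "codechallenge": "challenge",
    "twitter": "news",
    "special": "special",
    "guest": "guest",
}
DEFAULT_CATEGORY = "article"


def get_category(link):
    """Extract the leading lowercase word after '.es/' and map it to a category."""
    # Group 1 captures one or more lowercase letters right after ".es/";
    # \1 in the replacement keeps only that captured part.
    prefix = re.sub(r".*\.es/([a-z]+).*", r"\1", link)
    # dict.get returns None for unknown keys, so "or" falls back to the default.
    return KNOWN_CATEGORIES.get(prefix) or DEFAULT_CATEGORY


# Made-up links for illustration.
print(get_category("https://pybit.es/codechallenge40.html"))
print(get_category("https://pybit.es/python-tips.html"))
```

If the pattern does not match at all, re.sub returns the link unchanged, which is not a key in the dictionary, so the default still applies.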
5:38
That's the pre-work we are going to do:
5:41
two important helpers.
5:42
Next up, we will go through the feed data,
5:46
putting it into some useful data structures.
5:49
And with that second part of the preparation done,
5:51
the plotting should be easy.