Python for .NET Developers Transcripts
Chapter: Computational notebooks
Lecture: Getting all the RSS entries
0:00
Alright, let's add a few more dependencies. We're going to need these later, so we can go ahead and install those. Let's also go and add.
0:09
We'll put one of these up here and then change this from code to markdown, and I'll put some markdown in here:
0:16
Python Bytes, domain reference, like this. I'm going to say Python Bytes domain reference authority. If we run it, it turns into proper markdown.
0:26
We can put markdown, text, code, images, all sorts of stuff that's going to help us tell the story.
0:33
Over here, we can add, I guess we can add one more. This is going to be the same markdown, but a subheading
0:40
and let's just say get the RSS feed data as a dictionary. 'Kay, so that's what we are going to do right here. It'll help us sort of talk about it.
0:50
We'll put some pictures and we'll go back and forth. Now, there's really a couple interesting things to do.
0:55
First of all, notice that we can run these over and over and once you run them, now see it's got a star and then, it's got an output.
1:02
If we have another one over here I could do something like I want to print out the size of downloaded data. Here we go, we'll do digit grouping
1:16
and, like this, maybe we divide by 1,024 and by 1,024 again and put megabytes. We'll see how much data it turns out that this is.
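As a rough sketch of the cell being described here: take the length of the downloaded text, divide by 1,024 twice to get megabytes, and format it with digit grouping. The variable name `xml_text` and the stand-in data are assumptions for illustration, not what is on screen in the video.

```python
# Stand-in for the downloaded feed text. In the notebook this would be the
# data fetched in the earlier (slow) cell. `xml_text` is a hypothetical name.
xml_text = "x" * (3 * 1024 * 1024)

# Characters -> kilobytes -> megabytes (this treats one character as one byte,
# which is only approximately true for non-ASCII text).
size_mb = len(xml_text) / 1024 / 1024

# The "," in the format spec turns on digit grouping; ".2f" keeps two decimals.
message = f"Downloaded {size_mb:,.2f} MB of data"
print(message)
```

Because this cell only reads a variable that already exists in the notebook's memory, rerunning it does not trigger the download again.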
1:27
We downloaded, actually let's put. So something really interesting happened here. Did you see that this took a while to run
1:33
and that it has a five and this has a seven? And I can put exclamation mark here 'cause we are so excited and watch how quick it runs.
1:40
Instantly. These cells run independently. It's not like a separate program where I've got to rerun the program from top to bottom to get here.
1:49
No, I'm just rerunning this little tiny bit. Imagine we're doing some huge calculation here and it takes 30 seconds to generate the data
1:56
but then, we want to graph it and slice it and work with it. We don't rerun that cell. We just do work with it and slice it afterwards.
2:05
So there's this really cool partial execution thing. Now, it does make it a little confusing and I'm like, well, what is the current state of the data?
2:12
What if I run this and then this and then this? Ah, it's crazy. So you can always go and just rerun all the cells, notice it's thinking.
2:20
Boom, now we got the same data, okay? But this is super different than working with a Python script or application code in general
2:29
and, for the right use case, it's one of the huge advantages of notebooks. Alright, so that's how much data we have. Let's go and add another cell here.
2:40
Well, that's interesting. It was markdown, here we go. We'll define another little variable here called entries and we want to go over to our
2:51
I'm going to rename this to feed. We got to rerun this to redefine feed. Here we go, so this is the dictionary
2:59
and we could get stuff with brackets like this, by key, but I like this get style 'cause it doesn't crash.
3:06
It just gives you None if it's not there. But let's see what happens when we run this. There are 152 entries. Let's see what the first entry is.
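To make the difference between the two lookup styles concrete, here is a minimal sketch. The tiny `feed` dictionary is made-up stand-in data, not the real Python Bytes feed.

```python
# Made-up stand-in for the parsed feed dictionary.
feed = {"entries": [{"title": "Episode 1"}, {"title": "Episode 2"}]}

# .get() returns the value when the key exists...
entries = feed.get("entries")
print(len(entries))

# ...and returns None (instead of raising) when it does not.
missing = feed.get("no_such_key")
print(missing)

# Bracket access on a missing key raises KeyError.
raised_key_error = False
try:
    feed["no_such_key"]
except KeyError:
    raised_key_error = True
print(raised_key_error)
```

One trade-off worth keeping in mind: with `.get()`, a misspelled key fails silently with `None`, so the error may only show up later, for example when calling `len()` on the result.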
3:17
It looks like that. Apparently, it has a title and it has title detail and where it gets really interesting is down here
3:24
bum, bum, bum, bum, bum, bum, and summary. Here we go, so we have our description for each one of these and this is going to be, basically
3:37
the text that is on the page. And notice that it has links out to other things, like here's an article I talked about on the podcast on TechRepublic.
3:46
So the goal of this exercise is to go through the entire 2.5 megabytes of text, extract all of these links, convert the links to domains
3:56
and then, rank the domains by how many times we link back to them. So that's our goal in this little exploration here.
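The goal just described (links, then domains, then a ranking) could be sketched roughly as follows. This is not necessarily how the course implements it: the regular expression, the `extract_domains` helper, and the sample summaries are all assumptions for illustration, and a real HTML parser would be more robust than a regex.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Made-up summaries standing in for the HTML in each entry's summary field.
summaries = [
    '<p>See <a href="https://www.techrepublic.com/article/example">this</a>.</p>',
    '<p><a href="https://github.com/example/project">repo</a> and '
    '<a href="https://www.techrepublic.com/article/other">another</a></p>',
]

# Rough pattern for absolute http(s) links inside double-quoted href attributes.
HREF_PATTERN = re.compile(r'href="(https?://[^"]+)"', re.IGNORECASE)


def extract_domains(html):
    """Return the lowercased domain of every link found in the HTML text."""
    return [urlparse(link).netloc.lower() for link in HREF_PATTERN.findall(html)]


# Count how often each domain is linked across all summaries.
domain_counts = Counter()
for summary in summaries:
    domain_counts.update(extract_domains(summary))

# most_common() returns (domain, count) pairs, highest count first.
ranked = domain_counts.most_common()
for domain, count in ranked:
    print(f"{count:,} {domain}")
```

Note that this sketch keeps prefixes such as `www.` as part of the domain, so `www.example.com` and `example.com` would be counted separately unless they are normalized first.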
4:03
And we could make this look a little bit less crazy by just getting the first 100 characters. Maybe 100 isn't enough. Let's say 300. There we go.
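Getting just the first few hundred characters is plain string slicing. A small sketch, with a made-up `summary` string standing in for one entry's summary text:

```python
# Made-up long text standing in for one entry's summary.
summary = "Python Bytes show notes. " * 40

# Slicing never raises if the string is shorter than the limit;
# it simply returns whatever is there.
preview = summary[:300]
print(preview)

short_preview = "tiny"[:300]
print(short_preview)
```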
4:11
That looks like it's something we're looking for, right? So once we have that working, we can go over here
4:17
and just comment this out or we can entirely remove it. I mean, we would want to put something like downloaded or extracted some number of entries,
4:25
the length of entries. Someday, who knows, maybe that's going to be in the thousands, so we'll put digit grouping. So let's run this one more time.
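A summary line like the one described might look like this. The exact wording of the message and the `describe` helper are assumptions; the placeholder lists only exist so the sketch runs on its own.

```python
def describe(entries):
    """Build a one-line summary with digit grouping on the entry count."""
    return f"Extracted {len(entries):,} entries"


# Placeholder data: 152 entries today, and a hypothetical larger feed later.
today = [{}] * 152
someday = [{}] * 1520

print(describe(today))
print(describe(someday))
```

With 152 entries the grouping has no visible effect; it only shows up once the count reaches the thousands.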
4:33
You just hit Control + Enter. Come up here and you see the hotkeys and stuff, like run and so on. There we go, so we have a little summary like
4:44
hey, we got 152 entries. We could go look at the website and that would look correct. So we're well on our way to talking to the server,
4:52
getting the data, exploring it and I wasn't totally sure that it was description or summary I wanted. I can just kind of poke around at it here
4:59
and I can poke around without rerunning the expensive, quote, expensive, part. It takes a couple of seconds, but imagine you're doing data science
5:06
and it's a huge computation. I could rerun this and explore it without having to rerun the whole program.
5:13
It's really different than straight apps, right?