Python for the .NET Developer Transcripts
Chapter: Computational notebooks
Lecture: Getting all the RSS entries

0:00 Alright, let's add a few more dependencies. We're going to need these later, so we can go ahead and install them. Let's also add a new cell.
0:09 We'll put one of these up here, then change it from code to markdown, and I'll put some markdown in here:
0:16 "Python Bytes, domain reference," like this. I'm going to say "Python Bytes domain reference authority." If we run it, it turns into proper markdown.
0:26 We can put markdown, text, code, images, all sorts of stuff that's going to help us tell the story.
0:33 Over here, I guess we can add one more. This is going to be markdown as well, but a subheading,
0:40 and let's just say "Get the RSS feed data as a dictionary." Okay, so that's what we're going to do right here. It'll help us talk about it.
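As a rough, standard-library-only sketch of what "get the RSS feed data as a dictionary" can look like (not necessarily what the notebook itself uses), the snippet below parses a small made-up RSS document with `xml.etree.ElementTree` and returns a plain dict with an `entries` list. The `rss_to_dict` helper, the `SAMPLE_RSS` text, and storing each `<description>` under a `summary` key are assumptions for illustration; real-world feeds with namespaces and extensions are usually better handled by a dedicated feed-parsing package.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document standing in for the real downloaded feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example podcast</title>
    <item>
      <title>Episode 1</title>
      <description>See &lt;a href="https://www.techrepublic.com/article/x"&gt;this article&lt;/a&gt;.</description>
    </item>
    <item>
      <title>Episode 2</title>
      <description>No links this week.</description>
    </item>
  </channel>
</rss>"""


def rss_to_dict(xml_text: str) -> dict:
    """Hypothetical helper: convert an RSS 2.0 document into a dict with an 'entries' list."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    entries = []
    for item in channel.findall("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            # Assumption: keep the (HTML) description under a 'summary' key.
            "summary": item.findtext("description", default=""),
        })
    return {"title": channel.findtext("title", default=""), "entries": entries}


feed = rss_to_dict(SAMPLE_RSS)
print(len(feed["entries"]))  # 2
```

Note that the escaped `&lt;a ...&gt;` markup comes back as real HTML text in `summary`, which is what later makes link extraction possible.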
0:50 We'll put some pictures in, and we'll go back and forth. Now, there are really a couple of interesting things to do.
0:55 First of all, notice that we can run these over and over, and once you run one, see, it shows a star while it's running and then it's got an output.
1:02 If we have another cell over here, I could do something like print out the size of the downloaded data. Here we go, we'll do digit grouping
1:16 and maybe divide by 1,024 twice and call it megabytes. We'll see how much data this turns out to be.
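A minimal sketch of that size printout, assuming the downloaded feed is held as a string (the `text` value below is made up, and `describe_size` is a hypothetical helper). Dividing by 1,024 twice converts bytes to kilobytes to megabytes, and the `,` in the format spec is what gives the digit grouping.

```python
def describe_size(num_bytes: int) -> str:
    """Format a byte count with digit grouping plus its size in megabytes."""
    megabytes = num_bytes / 1024 / 1024  # divide by 1,024 twice: bytes -> KB -> MB
    # The ',' in each format spec turns on digit grouping (thousands separators).
    return f"Downloaded {num_bytes:,} bytes ({megabytes:,.2f} MB)"


text = "x" * 2_621_440  # made-up stand-in for the downloaded feed text
print(describe_size(len(text.encode("utf-8"))))
# Downloaded 2,621,440 bytes (2.50 MB)
```

Encoding to bytes first matters for non-ASCII text, where the character count and the byte count differ.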
1:27 We downloaded... actually, let's put... So something really interesting happened here. Did you see that this cell took a while to run
1:33 and that it has a five and this one has a seven? I can put an exclamation mark here because we're so excited, and watch how quickly it runs.
1:40 Instantly. These cells run independently. It's not like a separate program where I've got to rerun the whole thing from top to bottom to get here.
1:49 No, I'm just rerunning this little tiny bit. Imagine we're doing some huge calculation here that takes 30 seconds to generate the data,
1:56 but then we want to graph it, slice it, and work with it. We don't rerun that cell; we just work with the data and slice it afterwards.
2:05 So there's this really cool partial-execution thing. Now, it does make things a little confusing, and I'm like, well, what is the current state of the data?
2:12 What if I run this, then this, then this? Ah, it's crazy. So you can always just rerun all the cells; notice it's thinking.
2:20 Boom, now we've got the same data, okay? But this is very different from working with a Python script, or with application code in general,
2:29 and for the right use case, it's one of the huge advantages of notebooks. Alright, so that's how much data we have. Let's add another cell here.
2:40 Well, that's interesting, it was markdown. Here we go. We'll define another little variable here called entries, and we want to go over to our...
2:51 I'm going to rename this to feed. We've got to rerun this cell to redefine feed. Here we go, so this is the dictionary,
2:59 and we could get values by key with brackets like this, but I like the .get() style because it doesn't crash.
3:06 It just gives you None if the key isn't there. Let's see what happens when we run this. There are 152 entries. Let's see what the first entry is.
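To make the bracket-versus-`.get()` difference concrete, here is a small sketch on a made-up dict standing in for the parsed feed (the data and the `authors` key are illustrative only). Brackets raise `KeyError` for a missing key, while `.get()` returns `None`, or a default you supply.

```python
# Made-up dict standing in for the parsed feed.
feed = {"title": "Example podcast", "entries": [{"title": "Episode 1"}]}

entries = feed["entries"]           # bracket lookup: raises KeyError if the key is missing
missing = feed.get("authors")       # .get(): returns None instead of raising
fallback = feed.get("authors", [])  # .get() with an explicit default

print(len(entries))  # 1
print(missing)       # None

try:
    feed["authors"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'authors'
```

Passing a default such as `[]` is handy when the next step is `len(...)` or a loop, since `len(None)` would fail.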
3:17 It looks like that. Apparently, it has a title and a title detail, and where it gets really interesting is down here:
3:24 bum, bum, bum, bum, bum, bum, and summary. Here we go, so we have our description for each one of these, and this is basically
3:37 the text as it appears on the page. Notice that it has links out to other things, like here's an article on TechRepublic that I talked about on the podcast.
3:46 So the goal of this exercise is to go through the entire 2.5 megabytes of text, extract all of these links, convert the links to domains,
3:56 and then rank the domains by how many times we link back to them. So that's our goal in this little exploration here.
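One possible shape for that pipeline (links, then domains, then a ranking), sketched with only the standard library: `html.parser.HTMLParser` collects `<a href>` values, `urllib.parse.urlparse` pulls out the domain, and `collections.Counter` does the ranking. This is an illustrative approach, not necessarily the one the lecture ends up using; the `LinkCollector` and `rank_domains` names, the sample summaries, and the choices to lower-case domains and merge `www.` with the bare domain are all assumptions.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag in an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def rank_domains(summaries):
    """Count how often each domain is linked across all summaries, most common first."""
    counts = Counter()
    for fragment in summaries:
        collector = LinkCollector()
        collector.feed(fragment)  # HTMLParser.feed, unrelated to the RSS "feed"
        collector.close()
        for link in collector.links:
            domain = urlparse(link).netloc.lower()
            if domain.startswith("www."):
                domain = domain[4:]  # assumption: treat www.example.com as example.com
            if domain:  # skip relative links such as "/episodes/1"
                counts[domain] += 1
    return counts.most_common()


# Made-up summaries standing in for the entries' HTML.
summaries = [
    '<p>See <a href="https://www.techrepublic.com/a">this</a> and '
    '<a href="https://github.com/x/y">that</a>.</p>',
    '<p>More: <a href="https://github.com/z">repo</a></p>',
]
print(rank_domains(summaries))
# [('github.com', 2), ('techrepublic.com', 1)]
```

Using an HTML parser rather than a regular expression keeps quoting and attribute-order quirks from silently dropping links.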
4:03 And we could make this look a little less crazy by just taking the first 100 characters. Maybe 100 isn't enough; let's say 300. There we go.
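Trimming the summary is just string slicing. A small sketch, assuming entries are dicts with an optional `summary` key (`preview` is a hypothetical helper and the entry data is made up): slicing past the end of a string never raises, and `.get("summary", "")` covers entries with no summary at all.

```python
def preview(entry: dict, limit: int = 300) -> str:
    """Hypothetical helper: first `limit` characters of an entry's summary."""
    # Slicing never raises, even when the summary is shorter than `limit`.
    return entry.get("summary", "")[:limit]


entry = {"title": "Episode 1", "summary": "A" * 1_000}  # made-up entry
print(len(preview(entry)))              # 300
print(preview({"summary": "short"}))    # short
print(repr(preview({})))                # ''
```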
4:11 That looks like what we're looking for, right? So once we have that working, we can go over here
4:17 and just comment this out, or remove it entirely. I mean, we'd want to put something like "extracted some number of entries,"
4:25 the length of entries. Someday, who knows, maybe that's going to be in the thousands, so we'll add digit grouping. Let's run this one more time.
4:33 You just hit Control + Enter. If you come up here, you can see the hotkeys and such, like Run and so on. There we go, so we have a little summary like
4:44 "Hey, we got 152 entries." We could go look at the website, and that would look correct. So we're well on our way to talking to the server,
4:52 getting the data, and exploring it. I wasn't totally sure whether it was description or summary I wanted, so I can just kind of poke around at it here,
4:59 and I can poke around without rerunning the expensive, quote-unquote "expensive," step. It only takes a couple of seconds, but imagine you're doing data science
5:06 and it's a huge computation. I could rerun just this cell and explore the data without having to rerun the whole program.
5:13 It's really different from straight apps, right?

