Python for the .NET developer Transcripts
Chapter: Computational notebooks
Lecture: Getting all the RSS entries
0:00 Alright, let's add a few more dependencies.
0:02 We're going to need these later
0:03 and we can go ahead and install those.
0:05 Let's also go and add.
0:08 We'll put one of these up here
0:10 and then, change this from code to markdown
0:12 and I'll put some markdown code in here
0:15 Python Bytes, domain reference, like this.
0:21 I'm going to say Python Bytes domain reference authority.
0:23 If we run it, it turns into proper markdown.
0:25 We can put markdown, bold text, code, images
0:29 all sorts of stuff that's going to help us tell the story.
0:32 Over here, we can add, I guess we can add one more.
0:35 This is going to be the same markdown, but a subheading
0:39 and let's just say get the RSS feed data as a dictionary.
0:46 'Kay, so that's what we are going to do right here.
0:48 It'll help us sort of talk about it.
0:49 We'll put some pictures and we'll go back and forth.
0:52 Now, there's really a couple interesting things to do.
0:54 First of all, notice that we can run these over and over
0:58 and once you run them, now see it's got a star
1:00 and then, it's got an output.
1:01 If we have another one over here
1:02 I could do something like I want to print out
1:05 the size of downloaded data.
1:13 Here we go, we'll do digit grouping
1:15 and like this maybe if we divided by 1,024
1:20 1,024 and put megabytes.
1:23 We'll see how much data it turns out that this is.
1:26 We downloaded, actually let's put.
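
A minimal sketch of the size printout described above, assuming the downloaded payload is held as bytes. The name `data` and its contents are placeholders, not the variables used in the lecture's notebook.

```python
# Placeholder payload standing in for the downloaded feed (assumption).
data = b"x" * 2_621_440

# Bytes -> kilobytes -> megabytes by dividing by 1,024 twice.
size_mb = len(data) / 1024 / 1024

# The "," in the format spec adds digit grouping; ".2f" keeps two decimals.
message = f"Downloaded {size_mb:,.2f} MB ({len(data):,} bytes)"
print(message)
```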
1:28 So something really interesting happened here.
1:30 Did you see that this took a while to run
1:32 and that it has a five and this has a seven?
1:34 And I can put exclamation mark here
1:37 'cause we are so excited and watch how quick it runs.
1:39 Instantly. These cells run independently.
1:43 It's not like a separate program where
1:45 I've got to rerun the program from top to bottom to get here.
1:48 No, I'm just rerunning this little tiny bit.
1:51 Imagine we're doing some huge calculation here
1:53 and it takes 30 seconds to generate the data
1:55 but then, we want to graph it and slice it and work with it.
1:58 We don't rerun that cell.
2:01 We just do work with it and slice it afterwards.
2:04 So there's this really cool partial execution thing.
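
The partial execution idea can be sketched in a plain script, with `# %%` comments marking where the notebook cells would be split. The function `generate_data` and the short sleep are stand-ins for a genuinely slow computation; they are assumptions for illustration only.

```python
import time

# %% Cell 1: the slow step. Run it once; the result stays in the kernel's memory.
def generate_data(n):
    time.sleep(0.1)  # stand-in for a computation that takes 30 seconds
    return [i * i for i in range(n)]

data = generate_data(1000)

# %% Cell 2: cheap exploration. In a notebook this cell can be rerun
# as often as needed without executing Cell 1 again.
top_five = sorted(data, reverse=True)[:5]
print(top_five)
```

In a notebook, editing and rerunning only the second cell reuses `data` as it already exists, which is also why the execution counters on the cells can end up out of order.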
2:07 Now, it does make it a little confusing
2:09 and I'm like, well, what is the current state of the data?
2:11 What if I run this and then, this and then, this?
2:14 Ah, it's crazy. So you can always go
2:17 and just rerun all the cells, notice it's thinking.
2:19 Boom, now we got the same data, okay?
2:22 But this is super different than working with
2:24 a Python script or an application code in general
2:28 and it's, for the right use case,
2:30 one of the huge advantages of Notebooks.
2:33 Alright, so that's how much data we have.
2:35 Let's go and add another cell here.
2:39 Well, that's interesting.
2:40 If it was markdown, here we go.
2:44 We'll define another little variable here called entries
2:47 and we want to go over to our
2:50 I'm going to rename this to feed.
2:53 We got to rerun this to redefine feed.
2:56 Here we go, so this is the dictionary
2:58 and we could get stuff like bracket like this by key
3:02 but I like the get style 'cause it doesn't crash.
3:05 It just gives you None if it's not there.
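
A small sketch of the difference between bracket access and `get` on a dictionary. The `feed` dictionary here is a hand-written stand-in, not the real parsed feed, and the key `"items"` is a deliberately missing key chosen for illustration.

```python
# Hypothetical stand-in for the parsed feed; the real one has many more keys.
feed = {"entries": [{"title": "Episode 1"}, {"title": "Episode 2"}]}

# Bracket access raises KeyError when the key is missing.
try:
    feed["items"]
    bracket_result = "found"
except KeyError:
    bracket_result = "KeyError"

# .get() returns None (or a default you supply) instead of raising.
entries = feed.get("entries")
missing = feed.get("items")
fallback = feed.get("items", [])

print(bracket_result, missing, fallback, len(entries))
```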
3:08 But let's see what happens when we run this.
3:09 There are 152 entries.
3:11 Let's see what the first entry is.
3:16 It looks like that.
3:17 Apparently, it has a title and it has title detail
3:20 and where it gets really interesting is down here
3:23 bum, bum, bum, bum, bum, bum, and summary.
3:29 Here we go, so we have our description for each one of these
3:33 and this is going to be, basically
3:36 the text that is on the page.
3:38 And notice that it has links out to other things
3:41 like here's an article I talked about
3:42 on the podcast on TechRepublic.
3:45 So the goal of this exercise
3:47 is to go through the entire 2.5 megabytes of text,
3:51 extract all of these links, convert the links to domains
3:55 and then, rank the domains by how many times
3:57 we link back to them.
3:59 So that's our goal in this little exploration here.
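
One possible sketch of that goal, using only the standard library. This is not necessarily how the course implements it: the regular expression, the helper name `rank_domains`, and the sample summaries are all assumptions made for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample standing in for the summary HTML of two entries.
summaries = [
    '<a href="https://www.techrepublic.com/article/a">A</a> '
    '<a href="https://github.com/x/y">B</a>',
    '<a href="https://github.com/z">C</a>',
]

# Simple pattern for absolute http(s) links in double-quoted href attributes.
HREF = re.compile(r'href="(https?://[^"]+)"')

def rank_domains(html_fragments):
    """Count how often each domain is linked, most frequent first."""
    counts = Counter()
    for html in html_fragments:
        for url in HREF.findall(html):
            counts[urlparse(url).netloc.lower()] += 1
    return counts.most_common()

ranking = rank_domains(summaries)
print(ranking)
```

A regular expression is enough for a quick exploration like this; a real HTML parser would be the more robust choice if the markup is irregular.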
4:02 And we could make this look a little bit less crazy
4:04 by just getting the first 100 characters.
4:06 Maybe 100 isn't enough. Let's say 300. There we go.
4:10 That looks like it's something we're looking for, right?
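
Trimming the output to the first 300 characters is ordinary string slicing. In this sketch the `entry` dictionary and its summary text are made up; only the `summary` key and the 300-character cutoff come from the lecture.

```python
# Hypothetical entry; a real summary would be a long block of HTML.
entry = {"title": "Episode 1", "summary": "<p>Show notes text with links.</p> " * 50}

# Slicing never raises if the string is shorter than 300 characters.
preview = entry.get("summary", "")[:300]
short_preview = "abc"[:300]

print(len(preview))
print(preview)
```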
4:14 So once we have that working, we can go over here
4:16 and just comment this out or we can entirely remove it.
4:19 I mean we would want to put something like
4:21 downloaded or extracted some number of entries,
4:24 the length of entries.
4:26 Someday, who knows, maybe that's going to be in the thousands
4:29 so we'll put digit grouping
4:31 so let's run this one more time.
4:32 You just hit Control + Enter.
4:35 Come up here and you see the hotkeys and stuff, like run and so on.
4:41 There we go, so we have a little summary like
4:43 hey, we got 152 entries.
4:44 We could go look at the website and that would look correct.
4:47 So we're well on our way to talking to the server
4:51 getting the data, exploring it
4:53 and I wasn't totally sure that it was description
4:55 or summary I wanted.
4:56 I can just kind of poke around at it here
4:58 and I can poke around without rerunning
5:00 the expensive, quote, expensive, step.
5:02 It takes a couple of seconds
5:03 but imagine you're doing data science
5:05 and it's a huge computation.
5:07 I could rerun this
5:08 and explore it without having to rerun the whole program.
5:12 It's really different than straight apps, right?