#100DaysOfWeb in Python Transcripts
Chapter: Days 73-76: Web Scraping
Lecture: Your next 4 days
0:00
Let's have a quick look at what we'll be learning over the next four days on web scraping. The first two days will be spent on Beautiful Soup 4
0:09
and we'll be looking at a simplistic way of scraping the Talk Python course listing page. This is to get you into web scraping.
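To give a flavour of what day one involves, here is a minimal Beautiful Soup 4 sketch that pulls course titles out of a listing page. The HTML snippet, its `course` class and the `course_titles` helper are hypothetical stand-ins, not the real Talk Python markup, so a live page will need its own selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a course listing page; the real
# Talk Python page has its own structure, so its selectors will differ.
SAMPLE_HTML = """
<html>
  <head><title>Courses</title></head>
  <body>
    <div class="course"><h3>Example Course 1</h3><p>Short blurb.</p></div>
    <div class="course"><h3>Example Course 2</h3><p>Another blurb.</p></div>
  </body>
</html>
"""


def course_titles(html: str) -> list[str]:
    """Return the text of every <h3> nested inside a div with class "course"."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
    # CSS selector: any <h3> that is a descendant of <div class="course">
    return [h3.get_text(strip=True) for h3 in soup.select("div.course h3")]


if __name__ == "__main__":
    for title in course_titles(SAMPLE_HTML):
        print(title)
```

Running it prints the two example titles, one per line. Swapping the selector string is usually all it takes to pull a different field from the same page.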
0:17
This is a very important aspect of Python coding; we all end up having to do it. So just enjoy the first day, doing videos three
0:25
to five and then following along with the content. At the end of day one you can try your hand at scraping additional data from the Talk Python page.
0:37
Don't go too far into it; just have a play, try to keep it relaxed and have a bit of fun. Now, day two is where I want you to dive deep,
0:47
except what I want you to do is go and find your own websites to scrape, so nothing that we cover here.
0:52
I want you to go and look at something like Steam's store at steampowered.com for Steam games, or any websites you use that you can think of.
1:01
Look at blogs: you can look at PyBites, GitHub, Talk Python, anything else
1:07
that isn't the exact page we've been looking at. Go ahead and scrape them and try to apply what you've
1:13
learnt in day one to those websites, pulling down things like the title, content based on tags and so on.
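Applying the day-one technique to a site of your own choosing could look roughly like this sketch. It fetches a page with the standard library's `urllib.request` (swapped in here to keep dependencies minimal; the videos may use a different HTTP library), then returns the page title and the text of every element with a given tag. The helper names and the User-Agent value are illustrative assumptions.

```python
import urllib.request

from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it to text (hypothetical helper)."""
    # Some sites reject requests without a browser-like User-Agent;
    # this header value is an arbitrary example.
    request = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (scraping practice)"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_page(html: str, tag: str) -> dict:
    """Return the page <title> and the stripped text of every <tag> element."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    matches = [element.get_text(strip=True) for element in soup.find_all(tag)]
    return {"title": title, "matches": matches}
```

For example, `scrape_page(fetch_html("https://pybit.es"), "h2")` would collect the page title and every `<h2>` heading from the PyBites site. Keeping fetching and parsing separate means you can save a page once and experiment with different tags offline.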
1:22
That should keep you covered for the first two days. Now on days three and four we are going to look at Newspaper3K. I won't go into too much detail on
1:32
what that is, but it's essentially a web scraping tool that lets you point it at an online newspaper or news article and scrape from that.
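As a rough sketch of the Newspaper3K workflow: create an `Article` from a URL, call `download()` and `parse()`, then read attributes such as `title`, `authors`, `publish_date`, `text` and `top_image`. The `article_fields` helper and the dictionary keys it returns are hypothetical conveniences, not part of the library. The import sits inside `scrape_article` only so the helper can be reused without newspaper3k installed.

```python
from typing import Any


def article_fields(article: Any) -> dict:
    """Collect the commonly used fields from a parsed article (hypothetical helper)."""
    published = article.publish_date
    return {
        "title": article.title,
        "authors": list(article.authors),
        # publish_date can be empty when Newspaper3K finds no date on the page
        "published": published.isoformat() if published else None,
        "text": article.text,
        "image": article.top_image,
    }


def scrape_article(url: str) -> dict:
    """Download and parse one news article with Newspaper3K (pip install newspaper3k)."""
    from newspaper import Article  # imported here so article_fields works without the package

    article = Article(url)
    article.download()  # fetch the raw HTML
    article.parse()     # extract title, authors, date, body text and top image
    return article_fields(article)
```

Returning a plain dictionary keeps the scraping step separate from whatever displays the result later.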
1:43
Go through the videos on your third day (the first day of Newspaper3K, day three of this chapter). Go through
1:50
videos six to eight to finish off the chapter, and once you've finished those videos just have a play
1:58
again: practice on other news articles or on the same articles we looked at, and just find your rhythm with this tool.
2:09
Day four is something really complex, and I'm really looking forward to seeing what people come up with.
2:15
At this point in the 100 Days of Web course you'll have had exposure to a lot of Flask, so I would like you
2:23
to go to the newspaper demo Heroku app that you see on your screen now. We demonstrate it on day three; go and have a look at
2:31
it, and I want you to try to reproduce that page using the Flask skills you've already learned in this course.
2:39
This is something that's really important with web scraping: it's easy to scrape data. It's something
2:45
you can learn very quickly. It's what you do with that data that makes web scraping special and really lets it come into its own.
2:54
So I want you to take Newspaper3K and scrape articles, that's fine, but I want you to put that
3:03
functionality into a Flask app, just like the Heroku demo app you see there. That's going to need a few extra bits of effort.
3:13
Looking at the tips there, you're just using the parsing from Newspaper3K (the authors, the published date, the text
3:19
and the image) and getting those presented on a webpage. How you do that is up to you.
3:26
You can have it all work on one page or you can have it redirect to a new page every time it scrapes.
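One possible shape for the single-page version, sketched under assumptions: a form posts a URL, the app scrapes it, and the same page renders the title, authors, published date, image and text. The `create_app` factory, its `scrape` parameter, the inline template and the expected dictionary keys (`title`, `authors`, `published`, `text`, `image`) are all hypothetical; you would pass in your own Newspaper3K-based scraping function.

```python
from typing import Callable, Optional

from flask import Flask, render_template_string, request

# Deliberately plain markup: the exercise is about functionality, not styling.
PAGE = """
<form method="post">
  <input name="url" placeholder="Article URL" size="60">
  <button type="submit">Scrape</button>
</form>
{% if error %}<p>{{ error }}</p>{% endif %}
{% if article %}
  <h1>{{ article.title }}</h1>
  <p>Authors: {{ article.authors | join(", ") }}</p>
  <p>Published: {{ article.published or "unknown" }}</p>
  {% if article.image %}<img src="{{ article.image }}" width="400">{% endif %}
  <p>{{ article.text }}</p>
{% endif %}
"""


def create_app(scrape: Callable[[str], dict]) -> Flask:
    """Build the app around any function that maps a URL to an article dict."""
    app = Flask(__name__)

    @app.route("/", methods=["GET", "POST"])
    def index():
        article: Optional[dict] = None
        error: Optional[str] = None
        if request.method == "POST":
            url = request.form.get("url", "").strip()
            if url:
                try:
                    article = scrape(url)
                except Exception as exc:  # show network/parse failures instead of a 500 page
                    error = f"Could not scrape {url}: {exc}"
            else:
                error = "Please enter a URL."
        # Jinja resolves article.title on a dict via item lookup
        return render_template_string(PAGE, article=article, error=error)

    return app
```

Injecting the scraping function keeps the Flask side testable with a fake scraper; the redirect-to-a-new-page variant would split `index` into a form route and a results route instead.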
3:33
Either way, don't get distracted by the HTML and CSS. It doesn't have to look pretty; it can look like a dog's
3:40
breakfast, that's fine. All we're looking for here is the functionality of Newspaper3K mixed with some Flask to make it a usable app.
3:50
So that's it, that's days one to four of Newspaper3K and Beautiful Soup 4 for web scraping. Enjoy.