Consuming HTTP Services in Python Transcripts
Chapter: Screen scraping: Adding APIs where there are none
Lecture: Controlling your user agent in requests
Here is a website we can use to actually check who it, and pretty much any other web server in the world,
is going to think we are when we talk to it. So if we pull that up in Firefox, let me do a request to it,
you can see it will say "What is your user agent?" Your user agent is something like this: Mozilla 5.0,
running on Macintosh, Intel, running OS X Sierra, and this version, 51, of Firefox.
And here, isn't this cool, I even have a public IPv6 address these days. I'm feeling so much like I'm in the future. Okay, so if we do a request
against that URL, it's going to redirect here, and it turns out that there is a special tag or id on that little section where it says
what our user agent is, so they can style it. Hm, I wonder if we could get that back.
So over here, what we're going to do is a request, we're going to do a get, and we're going to use Beautiful Soup.
This time, instead of using lxml, just to show you that you could do the other one, we'll use the built-in parser, and then we want to go
grab that user agent element, get its text, and print it out. So we could do that, and our reported user agent is, no surprise,
python-requests 2.13. In fact, I think we could do this as a select_one and drop this [0], and get the same effect.
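The parsing step just described can be sketched roughly like this. It is a minimal sketch, not the code from the video: the inline HTML and the id `ua-string` are hypothetical stand-ins for the real page, whose markup may differ, and no network request is made.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's HTML. The real site's markup and
# the id on the user agent element are assumptions for illustration.
SAMPLE_HTML = """
<html><body>
  <div id="ua-string">python-requests/2.13.0</div>
</body></html>
"""


def extract_user_agent(html, css_selector="#ua-string"):
    """Return the text of the first element matching css_selector, or None."""
    # "html.parser" is Python's built-in parser, so lxml is not required.
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one(css_selector)
    return element.get_text(strip=True) if element is not None else None


# select() returns a list of matches, so it needs the [0] index.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
via_select = soup.select("#ua-string")[0].get_text(strip=True)

# select_one() returns the first match directly, so the [0] can be dropped.
via_select_one = extract_user_agent(SAMPLE_HTML)

print(via_select_one)
```

One design note: `select_one` returns `None` when nothing matches, whereas indexing an empty `select` result raises an `IndexError`, which is why the helper checks for `None` before reading the text.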
Perfect, that's more like it. Okay, so how do we control this? Well, it's this step where we control
what gets sent to the server. We can do this with headers, because the user agent is just a header. So this is going to be a dictionary,
and the key is going to be User-Agent, and then what we put in here as the value is
whatever we want. You want it to think we're exactly what we were with Firefox? Fine. Oh, there might be a small problem if you don't pass it along.
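The headers dictionary described above might look roughly like the following sketch. The URL is a placeholder and the Firefox string is an approximation of what Firefox 51 on macOS Sierra sends, so both are assumptions. The request is only prepared, never sent, so you can inspect what would go to the server without touching the network.

```python
import requests

# Placeholder URL; substitute the page you actually want to fetch.
URL = "https://example.com/"

# Approximation of a Firefox 51 on macOS Sierra user agent string.
FIREFOX_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:51.0) "
    "Gecko/20100101 Firefox/51.0"
)


def build_request(url, user_agent=None):
    """Prepare (but do not send) a GET request, optionally overriding User-Agent."""
    # The user agent is just a header: the key is "User-Agent",
    # the value is whatever string we want the server to see.
    headers = {"User-Agent": user_agent} if user_agent else {}
    session = requests.Session()
    # The headers must actually be passed along, or the default is used.
    return session.prepare_request(requests.Request("GET", url, headers=headers))


default_req = build_request(URL)
custom_req = build_request(URL, FIREFOX_UA)

print(default_req.headers["User-Agent"])  # the library's default user agent
print(custom_req.headers["User-Agent"])   # the Firefox-like string above
```

In everyday use the same dictionary can be passed straight to the call, as in `requests.get(URL, headers={"User-Agent": FIREFOX_UA})`.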
Want to be like Firefox? Boom, we are. We could even have fun with them, so you know, we're Mozilla 7, we're OS X 10.32, and we're even Firefox 54,
just to show you we can put whatever we want here. Alright, maybe they think some super secret version of macOS is being prototyped
on their site, who knows. But see, we're running Firefox 54; never mind that the newest one is 51.
We can tell it this and it gets sent right along. So there are a couple of uses for this:
like I said, it could be that you specifically want to get the desktop site or the mobile site, and you can control whether you look mobile
or you look desktop-y by setting your user agent. You might also be getting blocked if you look like some kind of robot,
so you can look non-robot-like by doing this. You also might want to set it to be your own custom thing.
So we don't want to do this, I'll save this for you, but maybe we want something else for the user agent.
Maybe we want to say, I am super user agent 007 version 0.1, right? Maybe you want to pass information that says this is actually this application
and it's this version that we're working with, so there might be a reason you want to pass that as well.
So we could be super user agent 007 version 0.1, whatever you want, right?
There are a couple of reasons, and the value you choose depends on what you're trying to do. But it can be important to control your user agent,
because it can determine the HTML that you get.
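If you want to identify your own application and version on every request, one possible approach is to set the header once on a session. This is a sketch under assumptions: the helper `make_session`, the name `SuperUserAgent007`, the `name/version` layout, and the placeholder URL are all illustrative and not taken from the video. The request is prepared but not sent.

```python
import requests


def make_session(app_name, app_version):
    """Return a Session whose requests all identify as app_name/app_version."""
    session = requests.Session()
    # Headers set on the session apply to every request made through it.
    session.headers.update({"User-Agent": f"{app_name}/{app_version}"})
    return session


# Hypothetical application name and version, for illustration only.
session = make_session("SuperUserAgent007", "0.1")

# Prepare a request to a placeholder URL to see the header that would be sent.
prepared = session.prepare_request(requests.Request("GET", "https://example.com/"))
print(prepared.headers["User-Agent"])
```

Setting it on the session means you do not have to remember to pass `headers=` on each individual call.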