Consuming HTTP Services in Python Transcripts
Chapter: Screen scraping: Adding APIs where there are none
Lecture: Controlling your user agent in requests
0:00
Here is a website we can use, called whatsmyuseragent.com, to check what it, and pretty much any other web server in the world,
0:09
is going to think we are when we talk to it. So if we pull that up in Firefox, let me do a request to it,
0:16
you can see it will say what your user agent is. Your user agent is something like this: Mozilla/5.0,
0:22
running on Macintosh, Intel, running OS X Sierra, and this version, 51, of Firefox,
0:27
and here, isn't this cool, I even have a public IPv6 address these days. I'm feeling so much like I'm in the future. Okay, so if we do a request
0:35
against that URL, it's going to redirect it here, and it turns out that there is a special tag or id on that little section where it says
0:45
what our user agent is, so they can style it. Hm, I wonder if we could get that back.
0:49
So over here, we're going to do a request, a get, and we're going to use Beautiful Soup,
0:55
and this time, instead of using lxml, just to show you that you could use the other one, we'll use the built-in parser, and then we want to go
1:03
grab that user agent, get its text, and print it out. So we could do that, and our reported user agent is, no surprise,
1:11
python-requests 2.13. In fact, I think we could do this as a select_one and drop this zero, and get the same effect.
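What was just described could look roughly like the sketch below. The lecture only says the element has a special tag or id, so the `.user-agent` selector here is a hypothetical stand-in, as are the function names, the `http://` scheme, and the sample HTML used for the offline check.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical selector: the transcript does not name the real tag or id.
USER_AGENT_SELECTOR = ".user-agent"


def extract_user_agent(html, selector=USER_AGENT_SELECTOR):
    """Return the text of the first element matching selector, or None."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
    # select_one gives the same first match as select(selector)[0],
    # but returns None instead of raising IndexError when nothing matches.
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else None


def fetch_reported_user_agent(url="http://whatsmyuseragent.com"):
    """Fetch the page and return what the site reports as our user agent."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return extract_user_agent(resp.text)


# Offline check against stand-in HTML (not the site's real markup).
SAMPLE_HTML = '<div><p class="user-agent"> python-requests/2.13.0 </p></div>'
print(extract_user_agent(SAMPLE_HTML))
```

Calling `fetch_reported_user_agent()` is the part that actually talks to the site; it only works if the selector is adjusted to match the page's real markup.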
1:21
Perfect, that's more like it. Okay, so how do we control this? Well, it's this step where we control
1:27
what gets sent to the server, so we can do this with headers, because the user agent is just a header, so this is going to be a dictionary,
1:34
and the key is going to be User-Agent, and then what we put in here is
1:39
whatever we want. You want it to think we're exactly what we were with Firefox? Fine. Oh, there might be a small problem if you don't pass it along.
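As a minimal sketch of the headers dictionary, the snippet below prepares two requests without sending them, so you can see the header that would go out. The Firefox string is reconstructed from the description earlier in the lecture and is an assumption, not the exact value from the video; the `http://` scheme and the variable names are assumptions too.

```python
import requests

# Illustrative Firefox 51 on Sierra string, reconstructed from the lecture's
# description; the exact value shown in the video is an assumption.
FIREFOX_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:51.0) "
    "Gecko/20100101 Firefox/51.0"
)

URL = "http://whatsmyuseragent.com"  # scheme assumed

session = requests.Session()

# Without a headers dict, requests fills in its own default user agent.
default_req = session.prepare_request(requests.Request("GET", URL))

# With a headers dict, our User-Agent replaces the default.
custom_req = session.prepare_request(
    requests.Request("GET", URL, headers={"User-Agent": FIREFOX_UA})
)

print(default_req.headers["User-Agent"])  # python-requests/<installed version>
print(custom_req.headers["User-Agent"])
```

To actually send it, the same dictionary goes straight into the call: `requests.get(URL, headers={"User-Agent": FIREFOX_UA})`.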
1:47
Want to be like Firefox? Boom, we are. We could even have fun with them, so you know, we're Mozilla 7, we're OS X 10.32, and we're even Firefox 54,
1:58
just to show you we can put whatever we want here. Alright, maybe they think some super secret version of macOS is being prototyped
2:07
on their site, who knows, but see, we're running Firefox 54, never mind the newest one is 51,
2:12
we can tell it this and it gets sent right along, so there are a couple of uses for this,
2:16
like I said, it could be that you might want to specifically get the desktop site or the mobile site and you can control whether you look mobile
2:24
or you look desktopy by setting your user agent. You might also be getting blocked if you look like some kind of robot,
2:31
so you can look non-robot-like by doing this. Yeah, there are a couple of reasons. You also might want to set it to be your own custom thing,
2:38
so we don't want to do this, I'll save this for you, but maybe we want something else with user agent,
2:43
maybe we want to say I am super user agent 007 version 0.1, right? Maybe you want to pass information that says this is actually this application
2:54
and it's this version that we're working with so there might be a reason you want to pass that as well,
3:00
so we could be super user agent 007 version 0.1, whatever you want, right,
3:03
there are a couple of reasons, and the value you choose might depend on your thinking. But it can be important to control your user agent
3:11
because it determines the HTML that you get.
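For the custom application case mentioned above, one option is to set the header once on a session so every request carries it. This is only a sketch: the application name, the version, the name/version format, and the `make_session` helper are all hypothetical, echoing the "super user agent 007 version 0.1" example.

```python
import requests

# Hypothetical application name and version, echoing the lecture's example.
APP_NAME = "SuperUserAgent007"
APP_VERSION = "0.1"


def make_session(user_agent):
    """Return a Session that sends user_agent on every request it makes."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


app_session = make_session(f"{APP_NAME}/{APP_VERSION}")
print(app_session.headers["User-Agent"])
```

Any `app_session.get(...)` call after this sends the custom value, so the server's logs can tell which application and version made the request.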