Consuming HTTP Services in Python Transcripts
Chapter: Screen scraping: Adding APIs where there are none
Lecture: Controlling your user agent in requests
0:00
Here is a website we can use, called whatsmyuseragent.com
0:04
to actually check what it, and pretty much any other web server in the world,
0:08
is going to think we are when we talk to it.
0:11
So if we pull that up in Firefox, let me do a request to it,
0:15
you can see it will say what is your user agent,
0:17
your user agent is something like this, Mozilla/5.0,
0:21
running on Macintosh, Intel, running OS X 10.12 Sierra, and this version, 51, of Firefox,
0:26
and here, isn't this cool, I even have a public IPv6 address these days,
0:31
I'm feeling so much like I'm in the future, okay so if we do a request,
0:34
against that URL, it's going to redirect it here, and it turns out
0:38
that there is a special tag or id on that little section of where it says
0:44
what our user agent is, so they can style it, hm, I wonder if we could get that back.
0:48
So over here, what we're going to do is we're going to do a request,
0:51
and we're going to do a get, and we're going to use Beautiful Soup,
0:54
and this time instead of using lxml just to show you you could do the other one,
0:58
we'll use the built-in parser, and then we want to go
1:02
grab that user agent and get its text, and print it out,
1:06
so we could do that, and our reported user agent is no surprise,
1:10
python-requests/2.13. In fact, I think we could do this
1:14
as a select_one and drop this zero, get the same effect.
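Here is a minimal sketch of that parsing step, run against a hard-coded snippet rather than the live site so it works offline. The markup and the `user-agent` class name are assumptions for illustration; the transcript does not show the real page's tag or id, so check the actual page source before relying on the selector.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page body; the real site's markup may differ.
SAMPLE_HTML = """
<html><body>
  <div class="intro-text">What's my user agent?</div>
  <div class="user-agent">python-requests/2.13.0</div>
</body></html>
"""


def extract_user_agent(html):
    # "html.parser" is the parser built into Python, used here instead of lxml.
    soup = BeautifulSoup(html, "html.parser")

    # select() returns a list of matches, so we index the first one.
    via_select = soup.select(".user-agent")[0].get_text().strip()

    # select_one() returns the first match directly, or None if nothing matches.
    match = soup.select_one(".user-agent")
    via_select_one = match.get_text().strip() if match else None

    return via_select, via_select_one


print(extract_user_agent(SAMPLE_HTML))
```

One practical difference: when nothing matches, `select(...)[0]` raises an IndexError, while `select_one(...)` returns None, so the second form is easier to guard.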
1:20
Perfect, that's more like it. Okay, so how do we control this,
1:23
well, it's this step where we control
1:26
what gets sent to the server, so we can do this with headers,
1:30
because the user agent is just a header, so this is going to be a dictionary,
1:33
and the key is going to be User-Agent, and then the value we put in here is
1:38
whatever we want, you want it to think we're exactly what we were with Firefox,
1:41
fine, oh, there might be a small problem, if you don't pass it along.
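As a rough sketch of the header step, the code below builds the request without sending it, so you can see which User-Agent would go out with and without the headers dictionary. The Firefox string is an illustrative value in the style of the one read out above, not copied from the video, and `build_request` is a hypothetical helper name.

```python
import requests

URL = "http://whatsmyuseragent.com"  # site named in the lecture

# Illustrative Firefox 51 on macOS Sierra string (an assumption, not from the video).
FIREFOX_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:51.0) "
    "Gecko/20100101 Firefox/51.0"
)


def build_request(url, user_agent=None):
    """Prepare a GET request (no network traffic) and return it for inspection."""
    headers = {}
    if user_agent is not None:
        # The user agent is just a header: key "User-Agent", value is our choice.
        headers["User-Agent"] = user_agent
    session = requests.Session()
    # prepare_request merges the session's default headers with ours.
    return session.prepare_request(requests.Request("GET", url, headers=headers))


# If we forget to pass the header along, the default python-requests value is used.
print(build_request(URL).headers["User-Agent"])
# With the header passed, our value replaces the default.
print(build_request(URL, FIREFOX_UA).headers["User-Agent"])
```

To actually send it, the one-line form is `requests.get(URL, headers={"User-Agent": FIREFOX_UA})`.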
1:46
Want to be like Firefox, boom we are, we could even have fun with them,
1:50
so you know, we're Mozilla/7.0, we're OS X 10.32 and we're even Firefox 54,
1:57
just to show you like we can put whatever we want here, alright,
2:00
maybe they think some super secret version of Mac OS is, like, being prototyped
2:06
on their site, who knows, but see,
2:08
we're running Firefox 54, never mind the newest one is 51,
2:11
we can tell it this and it gets sent right along, so there are a couple of uses for this,
2:15
like I said, it could be that you might want to specifically get the desktop site
2:19
or the mobile site and you can control whether you look mobile
2:23
or you look desktopy by setting your user agent,
2:26
you might also be getting blocked if you look like some kind of robot
2:30
so you can look non-robot-like by doing this, yeah, there are a couple of reasons,
2:33
you also might want to set it to be your own custom thing,
2:37
so we don't want to do this, I'll save this for you,
2:39
but maybe we want something else with user agent,
2:42
maybe we want to say I am Super User Agent 007 version 0.1, right,
2:50
maybe you want to pass information say this is actually this application
2:53
and it's this version that we're working with
2:56
so there might be a reason you want to pass that as well,
2:59
so we could be Super User Agent 007 version 0.1, whatever you want, right,
3:02
there are a couple of reasons and the value you choose might depend on your thinking.
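For the custom application case, one possible approach is to set the header once on a requests Session so every call made through it identifies your app and version. The `name/version` layout is the usual product-token convention for user agents; the app name, version, and `make_session` helper below are made up for illustration.

```python
import requests

APP_NAME = "SuperUserAgent007"  # hypothetical application name
APP_VERSION = "0.1"             # hypothetical version


def make_session(app_name, app_version):
    """Return a Session whose requests all carry a custom User-Agent."""
    session = requests.Session()
    # Overrides the default python-requests value for this session only.
    session.headers.update({"User-Agent": f"{app_name}/{app_version}"})
    return session


session = make_session(APP_NAME, APP_VERSION)

# Prepare (but do not send) a request to see what would go over the wire.
prepared = session.prepare_request(
    requests.Request("GET", "http://whatsmyuseragent.com")
)
print(prepared.headers["User-Agent"])  # SuperUserAgent007/0.1
```

Setting it on the session means you cannot forget to pass the headers on an individual call, which is the small problem mentioned earlier.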
3:07
But, it can be important to control your user agent
3:10
because it determines the HTML that you get.