A Detailed Guide to requests, the Third-Party Python Library

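Requests is an HTTP library written in Python, built on top of urllib3 and released under the Apache2 License.
It is more convenient than urllib, saves us a great deal of work, and fully meets the needs of HTTP testing.
Its design philosophy follows the idioms of PEP 20, which makes it more Pythonic than urllib.
Just as importantly, it supports Python 3!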

I hope this blog post is useful to you.

I. Installing Requests


Install with pip:

    pip install requests

Or clone the source code and install from it:

    $ git clone git://github.com/kennethreitz/requests.git
    $ cd requests
    $ python setup.py install


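II. Sending Requests and Passing Parameters

Let's start with a simple example
that shows what it can do:

    import requests

    r = requests.get(url='http://www.itwhy.org')    # the most basic GET request
    print(r.status_code)                             # status code of the response
    r = requests.get(url='http://dict.baidu.com/s', params={'wd': 'python'})    # GET request with parameters
    print(r.url)
    print(r.text)                                    # print the decoded response data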

Simple, isn't it?
And it is not just GET: every other method goes through the same uniform interface.

    requests.get('https://github.com/timeline.json')    # GET request
    requests.post('http://httpbin.org/post')            # POST request
    requests.put('http://httpbin.org/put')              # PUT request
    requests.delete('http://httpbin.org/delete')        # DELETE request
    requests.head('http://httpbin.org/get')             # HEAD request
    requests.options('http://httpbin.org/get')          # OPTIONS request

PS: Of the HTTP methods above, web systems generally support only GET and POST; some also support HEAD.
Examples of requests with parameters:

    import requests

    requests.get('http://www.dict.baidu.com/s', params={'wd': 'python'})    # GET parameters
    requests.post('http://www.itwhy.org/wp-comments-post.php', data={'comment': '测试POST'})    # POST parameters

Sending JSON data with POST:

    import requests
    import json

    r = requests.post('https://api.github.com/some/endpoint', data=json.dumps({'some': 'data'}))
    print(r.json())
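As a hedged aside, Requests 2.4.2 and later also accept a json= keyword that serialises the payload and sets the Content-Type header for you. The sketch below only prepares the requests and never sends them, so the endpoint is just the placeholder URL from the example above; describe is an illustrative helper name, not part of the library.

```python
import json

import requests

URL = 'https://api.github.com/some/endpoint'  # placeholder endpoint; nothing is sent


def describe(**kwargs):
    """Prepare (but do not send) a POST and return its Content-Type and body."""
    prepared = requests.Request('POST', URL, **kwargs).prepare()
    return prepared.headers.get('Content-Type'), prepared.body


payload = {'some': 'data'}

form_type, form_body = describe(data=payload)             # dict -> form-encoded body
json_type, json_body = describe(json=payload)             # json= -> JSON body plus header
raw_type, raw_body = describe(data=json.dumps(payload))   # string you serialised yourself

print(form_type, form_body)
print(json_type, json_body)
print(raw_type, raw_body)
```

Note that with data=json.dumps(...) Requests does not add a JSON Content-Type for you; if the server insists on one, pass it through headers.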

Custom headers:

    import requests
    import json

    data = {'some': 'data'}
    headers = {'content-type': 'application/json',
               'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}

    # The body is serialised to JSON so that it matches the declared content-type.
    r = requests.post('https://api.github.com/some/endpoint', data=json.dumps(data), headers=headers)
    print(r.text)
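To see what a custom header actually changes, one option is to prepare a request through a Session without sending it and read back the headers. This is only a sketch: the URL is the placeholder endpoint from the example above, and the User-Agent value 'my-client/0.1' is made up for illustration.

```python
import requests

session = requests.Session()
url = 'https://api.github.com/some/endpoint'   # placeholder endpoint; nothing is sent

# Without custom headers the session defaults apply.
default_ua = session.prepare_request(
    requests.Request('GET', url)
).headers['User-Agent']

# Per-request headers are merged over the defaults.
custom = session.prepare_request(
    requests.Request('GET', url, headers={'User-Agent': 'my-client/0.1'})
).headers

print(default_ua)              # python-requests/<installed version>
print(custom['User-Agent'])    # the value we supplied
print('Accept' in custom)      # defaults we did not override are still present
session.close()
```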

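III. The Response Object

Every requests method returns a Response object that stores what the server sent back, such as the r.text and r.status_code already used in the examples above.
Getting the response body as text: when you access r.text, the body is decoded with the response's text encoding, and you can change r.encoding so that r.text is decoded with an encoding of your own choice.

    r = requests.get('http://www.itwhy.org')
    print(r.text, '\n{}\n'.format('*' * 79), r.encoding)
    r.encoding = 'GBK'
    print(r.text, '\n{}\n'.format('*' * 79), r.encoding)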

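Other response attributes:

    r.status_code    # response status code
    r.raw            # the raw response, i.e. the underlying urllib3 response object; read it with r.raw.read()
    r.content        # response body as bytes; gzip and deflate compression is decoded for you automatically
    r.text           # response body as a string, decoded according to the charset in the response headers
    r.headers        # response headers stored in a special dict whose keys are case-insensitive; r.headers.get(key) returns None if the key is missing
    # special methods
    r.json()                # Requests' built-in JSON decoder
    r.raise_for_status()    # raises an exception for a failed request (4xx or 5xx response)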
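The case-insensitive behaviour of r.headers can be tried without any network access, because the dictionary type behind it is importable from requests.structures. A minimal sketch (the header name X-Missing is just a made-up example):

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({'Content-Type': 'text/html; charset=utf-8'})

print(headers['content-type'])     # same entry, whatever the case of the key
print(headers.get('X-Missing'))    # .get() returns None for a missing key

try:
    headers['X-Missing']           # plain indexing raises instead of returning None
except KeyError as error:
    missing_key = error.args[0]
    print('KeyError for', missing_key)
```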

Case study:

    import requests

    URL = 'http://ip.taobao.com/service/getIpInfo.php'    # Taobao IP address lookup API
    try:
        r = requests.get(URL, params={'ip': '8.8.8.8'}, timeout=1)
        r.raise_for_status()    # raise an exception if the status code is 4xx or 5xx
    except requests.RequestException as e:
        print(e)
    else:
        result = r.json()
        print(type(result), result, sep='\n')
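Which status codes make raise_for_status() raise can be checked offline. The sketch below builds Response objects by hand purely for illustration (real code gets them from requests.get and friends); outcome is a hypothetical helper name.

```python
import requests


def outcome(status_code):
    """Return 'ok' or 'HTTPError' for a hand-built response with this status."""
    response = requests.Response()        # constructed by hand only for this demo
    response.status_code = status_code
    try:
        response.raise_for_status()
    except requests.HTTPError:
        return 'HTTPError'
    return 'ok'


results = {code: outcome(code) for code in (200, 204, 302, 404, 500)}
print(results)
```

Only the 4xx and 5xx codes raise; a 204 or an unfollowed 302 passes silently, so check r.status_code yourself if you need exactly 200.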

IV. Uploading Files

With the Requests module, uploading a file is just as simple, and the file type is handled automatically:

    import requests

    url = 'http://127.0.0.1:5000/upload'
    files = {'file': open('/home/lyb/sjzl.mpg', 'rb')}
    # files = {'file': ('report.jpg', open('/home/lyb/sjzl.mpg', 'rb'))}    # set the filename explicitly
    r = requests.post(url, files=files)
    print(r.text)

Even more conveniently, you can upload a string as if it were a file:

    import requests

    url = 'http://127.0.0.1:5000/upload'
    files = {'file': ('test.txt', b'Hello Requests.')}    # the filename must be set explicitly
    r = requests.post(url, files=files)
    print(r.text)
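If no upload server is running on 127.0.0.1:5000, you can still inspect what Requests would send by preparing the request without sending it. A small sketch using the same files value as above:

```python
import requests

files = {'file': ('test.txt', b'Hello Requests.')}
prepared = requests.Request(
    'POST', 'http://127.0.0.1:5000/upload', files=files
).prepare()   # prepared only; no connection is made

content_type = prepared.headers['Content-Type']
print(content_type)                     # multipart/form-data; boundary=...
print(prepared.body.decode('utf-8'))    # the encoded form, including the filename
```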

V. Authentication

HTTP Basic Auth:

    import requests
    from requests.auth import HTTPBasicAuth

    r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=HTTPBasicAuth('user', 'passwd'))
    # r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=('user', 'passwd'))    # shorthand
    print(r.json())

Another very popular form of HTTP authentication is Digest Auth, and Requests supports it out of the box as well:

    from requests.auth import HTTPDigestAuth

    requests.get(URL, auth=HTTPDigestAuth('user', 'pass'))    # URL is the digest-protected address
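For Basic Auth it can help to see what the auth argument turns into on the wire. The following sketch prepares the httpbin request from above without sending it and decodes the resulting Authorization header:

```python
import base64

import requests

prepared = requests.Request(
    'GET',
    'https://httpbin.org/hidden-basic-auth/user/passwd',
    auth=('user', 'passwd'),
).prepare()   # prepared only; not sent

scheme, token = prepared.headers['Authorization'].split(' ', 1)
decoded = base64.b64decode(token).decode('latin1')
print(scheme, decoded)
```

The credentials are merely base64-encoded, not encrypted, which is why Basic Auth should only be used over HTTPS.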

VI. Cookies and Session Objects

If a response contains cookies, you can access them quickly:

    import requests

    r = requests.get('http://www.google.com.hk/')
    print(r.cookies['NID'])
    print(tuple(r.cookies))

To send your own cookies to the server, use the cookies parameter:

    import requests

    url = 'http://httpbin.org/cookies'
    cookies = {'testCookies_1': 'Hello_Python3', 'testCookies_2': 'Hello_Requests'}
    # Cookie Version 0 forbids special characters such as spaces, square brackets, parentheses,
    # equals signs, commas, double quotes, slashes, question marks, @, colons and semicolons in cookie values.
    r = requests.get(url, cookies=cookies)
    print(r.json())

A Session object lets you persist certain parameters across requests. Most conveniently, cookies are kept across all requests made from the same Session instance, and all of this is handled automatically.
Here is a real-world example, a sign-in script for Kuaipan:

    import requests

    headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
               'Accept-Encoding': 'gzip, deflate, compress',
               'Accept-Language': 'en-us;q=0.5,en;q=0.3',
               'Cache-Control': 'max-age=0',
               'Connection': 'keep-alive',
               'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}

    s = requests.Session()
    s.headers.update(headers)
    # s.auth = ('superuser', '123')
    s.get('https://www.kuaipan.cn/account_login.htm')

    _URL = 'http://www.kuaipan.cn/index.php'
    s.post(_URL, params={'ac': 'account', 'op': 'login'},
           data={'username': '@foxmail.com', 'userpwd': '', 'isajax': 'yes'})
    r = s.get(_URL, params={'ac': 'zone', 'op': 'taskdetail'})
    print(r.json())
    s.get(_URL, params={'ac': 'common', 'op': 'usersign'})
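The way a Session carries state into every request can be observed offline with Session.prepare_request, which merges the session's headers and cookies into a request without sending it. In this sketch the User-Agent string and the cookie name and value are invented for illustration:

```python
import requests

session = requests.Session()
session.headers.update({'User-Agent': 'my-sign-in-script/0.1'})   # hypothetical value
session.cookies.set('sessionid', 'abc123')                        # hypothetical cookie

request = requests.Request('GET', 'http://httpbin.org/cookies', params={'op': 'usersign'})
prepared = session.prepare_request(request)   # merged with session state, not sent

print(prepared.url)
print(prepared.headers['User-Agent'])
print(prepared.headers.get('Cookie'))
session.close()
```

In the real script the cookies are not set by hand: the login response sets them, and the session replays them on the later s.get calls.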

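VII. Timeouts and Exceptions

timeout limits how long Requests waits to establish the connection and for the server to send data; it is not a limit on the total time taken to download the response body.

    >>> requests.get('http://github.com', timeout=0.001)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)

All exceptions that Requests raises explicitly inherit from requests.exceptions.RequestException: ConnectionError, HTTPError, Timeout, TooManyRedirects.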
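Because of that shared base class, a single except clause is enough to catch every failure Requests reports. The sketch below checks the hierarchy and wraps a request in a small helper; fetch_status and its default timeout values are illustrative choices, and the (connect, read) tuple form of timeout assumes Requests 2.4.0 or later.

```python
import requests
from requests import exceptions as exc

listed = [exc.ConnectionError, exc.HTTPError, exc.Timeout, exc.TooManyRedirects]
all_share_base = all(issubclass(cls, exc.RequestException) for cls in listed)
print(all_share_base)

# Timeout has two more specific subclasses, one per phase.
print(issubclass(exc.ConnectTimeout, exc.Timeout), issubclass(exc.ReadTimeout, exc.Timeout))


def fetch_status(url, connect_timeout=3.05, read_timeout=10):
    """Return the status code, or None if any Requests exception occurred."""
    try:
        # A (connect, read) tuple sets the two timeouts separately.
        return requests.get(url, timeout=(connect_timeout, read_timeout)).status_code
    except exc.RequestException as error:
        print('request failed:', error)
        return None


# A URL without a scheme is rejected before any network traffic happens.
print(fetch_status('example-without-scheme'))
```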

Reposted from: http://www.itwhy.org/%E8%BD%AF%E4%BB%B6%E5%B7%A5%E7%A8%8B/python/python-%E7%AC%AC%E4%B8%89%E6%96%B9-http-%E5%BA%93-requests-%E5%AD%A6%E4%B9%A0.html

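requests is an HTTP client library for Python, similar to urllib and urllib2. So why use requests rather than urllib2? The official documentation explains it this way:

Python's standard urllib2 library provides most of the HTTP functionality you need, but its API is thoroughly broken: even a simple task takes a pile of code.

I also read through the requests documentation; it really is simple, and it suits someone like me.

A quick bit of good news!
I just noticed that requests now has a Chinese translation of its documentation. I recommend it to anyone whose English is not strong, and its content is far better than my blog post. The link is: http://cn.python-requests.org/en/latest/ (it covers v1.1.0, though; also, apologies for posting the wrong link earlier).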

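1. Installation

Installation is simple. I'm on Windows, so I downloaded the package here (the "download the zipball" link on the page) and then ran $ python setup.py install to install it.
Of course, if you have easy_install or pip, you can use them directly: easy_install requests or pip install requests.
For Linux users, this page lists other installation methods.
To test: type import requests in IDLE; if no error is reported, the installation succeeded!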

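2. A First Taste

    >>> import requests
    >>> r = requests.get('http://www.zhidaow.com')    # send the request
    >>> r.status_code                                 # status code
    200
    >>> r.headers['content-type']                     # response header information
    'text/html; charset=utf8'
    >>> r.encoding                                    # encoding information
    'utf-8'
    >>> r.text    # body (PS: because of encoding problems, r.content is recommended here)
    '<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml"...'
    ...

Simple, isn't it? Much simpler and more intuitive than urllib2 and urllib, right?!
Then read on to the quick start guide.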

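3. Quick Start

3.1 Sending Requests

Sending a request is simple. First import the requests module:

    >>> import requests

Next, let's fetch a web page, for example the home page of my personal blog:

    >>> r = requests.get('http://www.zhidaow.com')

Now we can use the various methods and attributes of r.
HTTP has many other request types as well, such as POST, PUT, DELETE, HEAD and OPTIONS.
They can all be made in the same way:

    >>> r = requests.post("http://httpbin.org/post")
    >>> r = requests.put("http://httpbin.org/put")
    >>> r = requests.delete("http://httpbin.org/delete")
    >>> r = requests.head("http://httpbin.org/get")
    >>> r = requests.options("http://httpbin.org/get")

I haven't needed these yet, so I haven't looked into them in depth.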

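3.2 Passing Parameters in URLs

Sometimes we need to pass parameters in the URL. When scraping Baidu search results, for instance, we need the wd parameter (the search term) and the rn parameter (the number of results). You could assemble the URL by hand, but requests also offers a much slicker way:

    >>> payload = {'wd': '张亚楠', 'rn': '100'}
    >>> r = requests.get("http://www.baidu.com/s", params=payload)
    >>> print(r.url)
    http://www.baidu.com/s?rn=100&wd=%E5%BC%A0%E4%BA%9A%E6%A5%A0

The garbled-looking text after wd= is the percent-encoded form of "张亚楠".
(The parameters appear to have been sorted alphabetically.)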
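The parameter order can be examined without contacting Baidu by preparing the request only. As far as I can tell, Requests does not sort the parameters itself: on Python 3.7 and later, where dictionaries keep insertion order, they come out in the order they were written, and a list of pairs makes the order explicit. A sketch under that assumption:

```python
import requests

payload = {'wd': '张亚楠', 'rn': '100'}
prepared = requests.Request('GET', 'http://www.baidu.com/s', params=payload).prepare()
print(prepared.url)

# A list of (key, value) pairs states the intended order explicitly.
ordered = requests.Request(
    'GET', 'http://www.baidu.com/s',
    params=[('rn', '100'), ('wd', '张亚楠')],
).prepare()
print(ordered.url)
```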

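3.3 Getting the Response Content

You can get the content of the page through r.text.

    >>> r = requests.get('https://www.zhidaow.com')
    >>> r.text
    '<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml"...'

The documentation says that requests decodes the content automatically.
Most Unicode charsets are decoded seamlessly.
Under cygwin, however, I kept running into UnicodeEncodeError, which was frustrating.
In Python's IDLE, on the other hand, everything worked fine.
You can also get the page content through r.content.

    >>> r = requests.get('https://www.zhidaow.com')
    >>> r.content
    b'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml"...'

The documentation says r.content presents the body as bytes, which is why it starts with b in IDLE.
In cygwin I did not see that prefix, but it was just right for downloading pages.
So it replaces urllib2's urllib2.urlopen(url).read().
(This is pretty much the feature I use most.)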

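3.4 Getting the Page Encoding

You can use r.encoding to get the page encoding.

    >>> r = requests.get('http://www.zhidaow.com')
    >>> r.encoding
    'utf-8'

When you send a request, requests guesses the page encoding from the HTTP headers, and it uses that encoding when you access r.text.
You can of course change the encoding that requests uses.

    >>> r = requests.get('http://www.zhidaow.com')
    >>> r.encoding
    'utf-8'
    >>> r.encoding = 'ISO-8859-1'

As in the example above, once encoding has been changed, the new encoding is used whenever the page content is read through r.text.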
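What changing r.encoding does can be imitated with plain bytes, no network needed. The sketch below assumes that r.text decodes r.content with the chosen encoding and replaces undecodable bytes; decode_body is a hypothetical stand-in for that step, and the GBK sample text is made up.

```python
def decode_body(raw, encoding):
    """Decode bytes roughly the way r.text is assumed to, replacing bad bytes."""
    return raw.decode(encoding, errors='replace')


raw = '中文网页'.encode('gbk')            # stand-in for r.content from a GBK page

right = decode_body(raw, 'gbk')           # like setting r.encoding = 'gbk'
wrong = decode_body(raw, 'ISO-8859-1')    # same bytes, wrong encoding: mojibake

print(right)
print(wrong)
```

The bytes never change; only the interpretation does, which is why fixing r.encoding after the request is enough to repair garbled text.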

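3.5 JSON

With urllib and urllib2, working with JSON means importing another module such as json or simplejson, but requests already has a built-in method, r.json().
Take an IP lookup API as an example:

    >>> r = requests.get('http://ip.taobao.com/service/getIpInfo.php?ip=122.88.60.28')
    >>> r.json()['data']['country']
    '中国'

3.6 Page Status Codes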

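We can use r.status_code to check a page's status code.

    >>> r = requests.get('http://www.mengtiankong.com')
    >>> r.status_code
    200
    >>> r = requests.get('http://www.mengtiankong.com/123123/')
    >>> r.status_code
    404
    >>> r = requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN')
    >>> r.url
    'http://www.zhidaow.com/'
    >>> r.status_code
    200

The first two examples behave as expected: a page that opens normally returns 200, and one that does not returns 404.
The third is a little odd. It is a 302 redirect URL from Baidu search results, yet the status code shown is 200. One more trick reveals what really happened:

    >>> r.history
    (<Response [302]>,)

This shows that a 302 redirect was used.
You might think you could now extract the redirect status code with some checks and a regular expression, but there is an even simpler way:

    >>> r = requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN', allow_redirects=False)
    >>> r.status_code
    302

Just pass allow_redirects=False to disable redirect following, and the redirect's own status code is returned directly. Handy, right? I also used this in the last chapter to build a small application that gets a page's status code; it works on exactly this principle.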
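The same behaviour can be reproduced without Baidu by pointing requests at a throwaway local server. Everything about the server below (the /old and /new paths, the RedirectHandler and probe_redirect names) is invented for this sketch; only r.history and allow_redirects come from the text above.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Illustrative handler: /old answers 302 -> /new, everything else 200."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'final page'
            self.send_response(200)
            self.send_header('Content-Type', 'text/plain')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def probe_redirect():
    """Return (final status, statuses in history, status with redirects disabled)."""
    server = HTTPServer(('127.0.0.1', 0), RedirectHandler)   # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = 'http://127.0.0.1:{}/old'.format(server.server_port)
    session = requests.Session()
    session.trust_env = False   # ignore proxy environment variables for the local demo
    try:
        followed = session.get(url, timeout=5)
        direct = session.get(url, allow_redirects=False, timeout=5)
        return (followed.status_code,
                [hop.status_code for hop in followed.history],
                direct.status_code)
    finally:
        session.close()
        server.shutdown()
        server.server_close()


result = probe_redirect()
print(result)
```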

3.7 Response Headers

You can get the response headers through r.headers.

    >>> r = requests.get('http://www.zhidaow.com')
    >>> r.headers
    {
        'content-encoding': 'gzip',
        'transfer-encoding': 'chunked',
        'content-type': 'text/html; charset=utf-8',
        ...
    }

As you can see, everything is returned in the form of a dictionary, and we can also access individual entries.

    >>> r.headers['Content-Type']
    'text/html; charset=utf-8'
    >>> r.headers.get('content-type')
    'text/html; charset=utf-8'

3.8 Setting a Timeout

We can set a timeout with the timeout parameter; if no response arrives within that time, an error is raised.
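To watch a timeout fire without depending on a slow website, the sketch below starts a deliberately slow local server and requests it with a short and a long timeout. The SlowHandler class, the times_out helper and the one-second delay are all assumptions made for the demonstration.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Illustrative handler that waits one second before answering."""

    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.send_header('Content-Length', '0')
            self.end_headers()
        except OSError:
            pass  # the client has already given up

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def times_out(timeout):
    """Return True if a GET against the slow server raises Timeout."""
    server = HTTPServer(('127.0.0.1', 0), SlowHandler)   # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    session = requests.Session()
    session.trust_env = False   # ignore proxy environment variables for the local demo
    try:
        session.get('http://127.0.0.1:{}/'.format(server.server_port), timeout=timeout)
        return False
    except requests.exceptions.Timeout:
        return True
    finally:
        session.close()
        server.shutdown()
        server.server_close()


short_wait = times_out(0.2)   # shorter than the server's delay
long_wait = times_out(3)      # long enough for the reply
print(short_wait, long_wait)
```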

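3.9 Proxies

To avoid having your IP banned while scraping, you will often go through a proxy.
requests has a proxies parameter for this.

    import requests

    proxies = {
        "http": "http://10.10.1.10:3128",
        "https": "http://10.10.1.10:1080",
    }

    requests.get("http://www.zhidaow.com", proxies=proxies)

If the proxy requires a username and password, write it like this:

    proxies = {
        "http": "http://user:pass@10.10.1.10:3128/",
    }

3.10 Request Headers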

The request headers can be read from r.request.headers.

    >>> r.request.headers
    {'Accept-Encoding': 'identity, deflate, compress, gzip',
     'Accept': '*/*', 'User-Agent': 'python-requests/1.2.3 CPython/2.7.3 Windows/XP'}

3.11 Custom Request Headers

Spoofing the request headers is common when scraping, and we can disguise ourselves like this:

    r = requests.get('http://www.zhidaow.com')
    print(r.request.headers['User-Agent'])
    # python-requests/1.2.3 CPython/2.7.3 Windows/XP

    headers = {'User-Agent': 'alexkh'}
    r = requests.get('http://www.zhidaow.com', headers=headers)
    print(r.request.headers['User-Agent'])
    # alexkh

3.12 Persistent Connections (keep-alive)

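Keep-alive in requests is built on urllib3, and persistent connections within a session are fully automatic.
All requests made within the same session automatically reuse the appropriate connection.

In other words, you don't need to configure anything; requests implements keep-alive automatically.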
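One caveat worth spelling out: the reuse happens inside a Session, whereas module-level helpers such as requests.get() create a temporary session for each call. The connection pools live in transport adapters mounted on the session, and the sketch below shows how one might tune a pool for a single site; the pool sizes are arbitrary example values.

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()

# Optional tuning: a larger connection pool for one URL prefix.
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=20)
session.mount('http://www.zhidaow.com', adapter)

# The session picks the adapter whose prefix matches the URL.
chosen = session.get_adapter('http://www.zhidaow.com/some/page')
fallback = session.get_adapter('http://httpbin.org/get')
print(chosen is adapter, fallback is adapter)
session.close()
```

Requests sent with session.get(...) to the mounted prefix then share that adapter's pooled connections.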

4. Simple Applications

4.1 Getting a Page's Status Code

    import requests

    def get_status(url):
        # Do not follow redirects, so 301/302 are reported as they are.
        r = requests.get(url, allow_redirects=False)
        return r.status_code

    print(get_status('http://www.zhidaow.com'))          # 200
    print(get_status('http://www.zhidaow.com/hi404/'))   # 404
    print(get_status('http://mengtiankong.com'))         # 301
    print(get_status('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN'))    # 302
    print(get_status('http://www.huiya56.com/com8.intre.asp?46981.html'))    # 500

Afterword

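1. Official documentation
   Detailed installation instructions for requests: http://docs.python-requests.org/en/latest/user/install.html#install
   The official requests quickstart guide: http://docs.python-requests.org/en/latest/user/quickstart.html
   The requests advanced usage guide: http://docs.python-requests.org/en/latest/user/advanced.html#advanced
2. Part of this article is translated from the official documentation, and part is my own summary.
3. Most of the examples use IDLE format, which was exhausting to write; next time I'll use plain editor format, which suits my habits better.
4. As always, leave a comment or send an email if you have questions.
5. Image caption: a turtle from the official requests documentation.

This is the first post on my migrated blog. I chose it as the first because it was the most-read post on my old blog; embarrassingly, it is still a repost.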