Python requests returns a different web page than the browser or urllib

Problem description:

I use requests to scrape a web page for some content.
When I use

import requests
# requests needs a full URL including the scheme
requests.get('http://example.org')

I get a different web page than the one I get when I use my browser or use

import urllib.request
# urlopen also needs a full URL including the scheme
urllib.request.urlopen('http://example.org')

I tried using urllib, but it is really slow.
In a comparison test I ran, it was 50% slower than requests!

How do you solve this?

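Answer

After a lot of investigation, I found that the site passes a cookie in the response headers only on a visitor's first visit.

So the solution is to get the cookies with a HEAD request, then resend them with your GET request: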

import requests
# Get the cookies with head(); this doesn't fetch the body, so it's fast.
cookies = requests.head('http://example.com').cookies
# Send the GET request with those cookies.
result = requests.get('http://example.com', cookies=cookies)

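Now it is faster than urllib and gives the same result :)

Or you can use a Session instance, which is faster. It automatically manages cookies with a CookieJar. – Dashadower

I tried that, but in my case the cookie is only sent with the first request, and I don't want to reuse the same cookie in subsequent requests, so I just pass the cookies to the get request. Your suggestion still works for most other cases –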
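
For reference, the Session approach suggested in the comment could look roughly like the sketch below. Since the real site isn't named in the thread, it runs against a hypothetical local stand-in server (CookieGateHandler, the visited=1 cookie and the page texts are all made up for illustration) that behaves the way the answer describes: it sets a cookie on the first request and serves a different page once that cookie comes back.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class CookieGateHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site: the page changes once a cookie is sent."""

    def do_GET(self):
        has_cookie = "visited=1" in (self.headers.get("Cookie") or "")
        body = b"full page" if has_cookie else b"first-visit page"
        self.send_response(200)
        if not has_cookie:
            # The cookie is only handed out when the client has none yet.
            self.send_header("Set-Cookie", "visited=1; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the demo output quiet.
        pass


def fetch_twice_with_session(url):
    """Request the same URL twice through one Session and return both bodies."""
    with requests.Session() as session:
        # Ignore proxy settings from the environment for this local demo.
        session.trust_env = False
        first = session.get(url)   # the server sets the cookie here
        second = session.get(url)  # the Session resends it automatically
        return first.text, second.text


# Start the stand-in server on a free local port.
server = HTTPServer(("127.0.0.1", 0), CookieGateHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/"
first_text, second_text = fetch_twice_with_session(url)
print(first_text)
print(second_text)

server.shutdown()
server.server_close()
```

The Session keeps the cookie from the first response in its cookie jar and sends it with the second request, so no separate HEAD call is needed. As the reply above notes, this reuses the same cookie on later requests, which may not be what you want in every case.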
