Simulating a login with cookies in Python requests to scrape a page
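Copy the cookies from the browser and add them to the request headers. The cookie string can be taken directly from the request details shown in the browser.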
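As an alternative to sending the raw string in a `Cookie` header, the copied string can be split into name/value pairs and handed to the `cookies` argument of `requests.get`. The following is only a minimal sketch: the helper names `parse_cookie_header` and `fetch_page` are hypothetical and not part of requests, and the parser assumes the string has the usual `name=value; name=value` shape shown in the browser.

```python
def parse_cookie_header(raw):
    """Turn a browser 'Cookie' header string into a dict of name -> value."""
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        # Skip empty fragments and fragments that carry no '=' separator.
        if not pair or "=" not in pair:
            continue
        # Split on the first '=' only, because values may contain '=' themselves.
        name, value = pair.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


def fetch_page(url, raw_cookie, headers=None):
    """Fetch a page, sending the browser cookies via the cookies= argument."""
    # Imported here so the parser above can be used without requests installed.
    import requests

    response = requests.get(
        url,
        headers=headers or {},
        cookies=parse_cookie_header(raw_cookie),
        timeout=10,  # assumed value; adjust as needed
    )
    return response.text
```

A call such as `fetch_page('http://t.dianping.com/deal/22752400', cookie_string)` would then send the same cookies as the header-based approach, where `cookie_string` stands for the text copied from the browser.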
- Without cookies (not logged in)
# coding: utf-8
import requests

url = 'http://t.dianping.com/deal/22752400'
header = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Host': 't.dianping.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36'
}
print(requests.get(url, headers=header).text)
Without cookies (not logged in), the request returns the following:
- With cookies (simulated login)
# coding: utf-8
import requests

url = 'http://t.dianping.com/deal/22752400'
header = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Cookie': 'cy=258; cye=guiyang; _lx_utm=utm_source%3DBaidu%26utm_medium%3Dorganic; _lxsdk_cuid=1627487eabec8-0082feec299df4-3e3d5f01-100200-1627487eabfc8; _lxsdk=1627487eabec8-0082feec299df4-3e3d5f01-100200-1627487eabfc8; _hc.v=7fb40515-c2b8-59d3-2b47-427bcabb3554.1522373487; _dp.ac.v=f5832f3d-885a-440c-9a2d-4f0a221ea73e; dper=ce3cbad9cf126491bef9842a52d26dfd28d3e6b65494f66952c02618cef002b7; ll=7fd06e815b796be3df069dec7836c3df; ua=15329319971; JSESSIONID=791B2B83CA269DB065936DE85C79DEDA; _lxsdk_s=16275323b8a-80c-56c-04a%7C%7C2',
    'Host': 't.dianping.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36'
}
print(requests.get(url, headers=header).text)
With the cookies attached, the request is treated as logged in and the page content is returned. The result is as follows: