Scraping website data with Python: how to get past anti-scraping measures
1. Use a session object
import requests
session = requests.Session()
strhtml = session.get(url)  # first request to the site in this session; url is the page address
2. Set headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/69.0.3497.100 Safari/537.36",
    "Accept": "application/json",
}
session.headers.update(headers)  # merge into the session's default headers instead of replacing them
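3. Reuse cookies
A Session keeps the cookies set during the first request in session.cookies and sends them again on later requests made through the same session, so they do not need to be copied into the headers by hand. (The HTTP header is named "Cookie" and its value must be a string, so storing the cookie jar object under a 'cookies' header does not work.)
cookies = session.cookies  # cookie jar filled by the first response, useful for inspection
From here on, requests made with the same session reuse those cookies: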
strhtml2 = session.get(url2)
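The three steps can be checked without touching the network. The sketch below is only an illustration under stated assumptions: the URL `https://example.com/page2` and the cookie `sessionid=abc123` are hypothetical placeholders standing in for a real site and for a cookie the first response would normally set. It prepares the second request without sending it, so the outgoing headers can be inspected.

```python
import requests

# Hypothetical placeholder URL; replace with the page being scraped.
URL2 = "https://example.com/page2"


def build_session():
    """Create a session with browser-like headers merged into the defaults."""
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
        "Accept": "application/json",
    })
    return session


session = build_session()

# Stand-in (assumption) for a cookie that the first response would normally set.
session.cookies.set("sessionid", "abc123", domain="example.com", path="/")

# Prepare, but do not send, the second request to see what would go out.
prepared = session.prepare_request(requests.Request("GET", URL2))
print(prepared.headers.get("Cookie"))
print(prepared.headers["User-Agent"])
```

The prepared request carries both the custom headers and a Cookie header built from the session's cookie jar, which is why no manual cookie handling is needed between session.get(url) and session.get(url2).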