Python web scraping

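Scraping with Python is very simple. First learn Python's file operations, because the scraped results get written to files. Python wraps all of this up for you, and the API is concise and easy to use.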
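Since the scraped pages end up in files, here is a minimal sketch of the file operations meant here: writing and reading text with an explicit encoding, plus the binary modes ("wb"/"rb") that store bytes unchanged. The filenames demo.txt and demo.bin are only examples.

```python
# Text mode: Python encodes the str to bytes using the given encoding
with open("demo.txt", mode="w", encoding="utf-8") as f:
    f.write("café, requests\n")

with open("demo.txt", mode="r", encoding="utf-8") as f:
    text = f.read()

# Binary mode: raw bytes are written and read back unchanged, with no encoding step
with open("demo.bin", mode="wb") as f:
    f.write(text.encode("utf-8"))

with open("demo.bin", mode="rb") as f:
    data = f.read()

print(text, end="")
print(data.decode("utf-8") == text)  # True
```

Text mode suits decoded HTML strings; binary mode suits anything you want to save exactly as the server sent it.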

Basics

1. Import the requests module (a third-party library; install it with pip install requests)

import requests

2. Set the URL

url = 'http://www.baidu.com'

3. Send the request

result = requests.get(url)

4. Get the content

# Get the raw binary content (a bytes object)
content = result.content

5. Get the decoded content

# bytes.decode() with no argument decodes as UTF-8
content = result.content.decode()
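Step 5 relies on decode() defaulting to UTF-8, which only works when the page really is UTF-8. The sketch below uses plain bytes as stand-ins for result.content (no network needed) to show what happens with a page in another charset, GBK here as an example. For real responses, requests also offers result.text, which decodes using result.encoding (taken from the HTTP headers), and result.apparent_encoding, a guess based on the body.

```python
# Stand-ins for result.content: the same text encoded two different ways
utf8_bytes = "百度".encode("utf-8")
gbk_bytes = "百度".encode("gbk")

# bytes.decode() with no argument uses UTF-8
utf8_text = utf8_bytes.decode()

# GBK bytes are not valid UTF-8, so the default decode fails...
try:
    gbk_bytes.decode()
    gbk_failed = False
except UnicodeDecodeError:
    gbk_failed = True

# ...and the charset has to be passed explicitly
gbk_text = gbk_bytes.decode("gbk")

print(utf8_text, gbk_text, gbk_failed)  # 百度 百度 True
```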

6. Inspect the request headers that were sent

header = result.request.headers
print(header)
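Because step 3 passed no headers, print(header) shows the defaults that requests fills in itself. This sketch builds those defaults without sending anything, using requests.utils.default_headers(); note that the User-Agent identifies the client as python-requests rather than a browser.

```python
import requests

# The headers requests uses when you supply none of your own.
# Building them needs no network access.
defaults = requests.utils.default_headers()
for name, value in defaults.items():
    print(f"{name}: {value}")
# The User-Agent line reads python-requests/<installed version>
```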

User-Agent

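User-Agent, abbreviated UA, is a special request header string that lets the server identify the client's operating system and version, CPU type, browser and version, rendering engine, browser language, browser plugins, and so on.
Every browser sends a User-Agent that says which browser it is, its version, and similar details. When a server receives a request it may check this User-Agent, and if the value looks abnormal (for example, not like a real browser), it may return only part of the data.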
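To look less like a script, you pass your own User-Agent through the headers argument, and it replaces the default one. The sketch below checks this without any network traffic by preparing (not sending) requests through a Session, which is also what requests.get uses internally. The helper name prepare_get and the string "Mozilla/5.0 (example UA)" are hypothetical placeholders.

```python
import requests

def prepare_get(url, user_agent=None):
    """Hypothetical helper: build, but do not send, a GET request as a Session would."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    with requests.Session() as session:
        # Request-level headers override the session's default headers
        return session.prepare_request(requests.Request("GET", url, headers=headers))

plain = prepare_get("http://www.baidu.com")
browser_like = prepare_get("http://www.baidu.com", user_agent="Mozilla/5.0 (example UA)")

print(plain.headers["User-Agent"])         # python-requests/<installed version>
print(browser_like.headers["User-Agent"])  # Mozilla/5.0 (example UA)
```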

Check your browser's UA

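Open any browser; I'm using Chrome here. Its User-Agent is listed among the request headers in the developer tools (Network tab).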

A small example: downloading the Baidu homepage

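"""
Chrome User-Agent:
    Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 S
"""

import requests

url = 'http://www.baidu.com'
# Send a browser User-Agent so the server treats this like a normal browser visit
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 S'}
# timeout stops the script from hanging forever if the server never answers
response = requests.get(url, headers=header, timeout=10)
# Decode the body as UTF-8 (the default for bytes.decode())
ret = response.content.decode()

# Save the decoded HTML to 1.html, writing it back out as UTF-8
with open('1.html', mode='w', encoding='utf-8') as f:
    f.write(ret)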

Then open 1.html in a browser.
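A slightly more defensive variant of the example is sketched below, under a few assumptions of my own: the function name download_page, the 10-second timeout, and saving the raw bytes with mode "wb" instead of decoding first, so the file keeps whatever charset the page declares and non-UTF-8 pages cannot break the decode step. raise_for_status() makes HTTP errors (4xx/5xx) raise instead of silently saving an error page.

```python
import requests

# Browser UA string from the example above, kept exactly as given there
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/66.0.3359.139 S"
}

def download_page(url, path, headers=None, timeout=10):
    """Hypothetical helper: fetch url, save the raw body to path, return its size in bytes."""
    response = requests.get(url, headers=headers or HEADERS, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    # Write the bytes unchanged, so the page keeps its own declared charset
    with open(path, mode="wb") as f:
        f.write(response.content)
    return len(response.content)

# Usage (needs network access):
# download_page("http://www.baidu.com", "1.html")
```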