Web scraping practice, day 1: requests + bs4 to scrape anime images from the web
import requests
url = ""
response = requests.get(url)
print(response)  # <Response [200]> means the request succeeded
If the response shows status code 418 instead,
the site's anti-scraping check has been triggered.
Add a browser-like User-Agent header:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
response = requests.get(url, headers=headers)
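The header fix above can be wrapped in a small helper so every request sends the same User-Agent. This is only a minimal sketch: the name `fetch_page`, the 10-second timeout, and the call to `raise_for_status()` are assumptions added for illustration and are not part of the original notes.

```python
import requests

# Browser-like User-Agent copied from the notes above.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
    )
}


def fetch_page(url, timeout=10):
    """Hypothetical helper: GET a page with a browser-like User-Agent."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses (including 418)
    # instead of silently continuing with an error page.
    response.raise_for_status()
    response.encoding = "utf-8"  # assumes the target page is UTF-8
    return response
```

Calling `fetch_page(url)` then either returns a usable response or fails loudly, which makes a blocked request easier to spot than a printed status code.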
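# Specify the character encoding used to decode the response
response.encoding = 'utf-8'
# Hand the page source to BeautifulSoup
from bs4 import BeautifulSoup
main_page = BeautifulSoup(response.text, "html.parser")  # "html.parser" is the usual choice: Python's built-in HTML parser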
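BeautifulSoup:
find(tag, attrs={"attribute": "value"})      finds the first match
find_all(tag, attrs={"attribute": "value"})  finds all matches
# Analogy: find("Zhang San", attrs={"height": "180"})
<div class="a">a</div>
<div class="b">b</div>
main_page.find("div", attrs={"class": "b"})  # matches <div class="b">b</div>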
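To see the difference between `find` and `find_all` without touching the network, the two-div snippet can be parsed directly from a string. This is a small sketch; the variable names (`html`, `soup`, `first_b`, `all_divs`, `missing`) are chosen here for illustration.

```python
from bs4 import BeautifulSoup

# The two-div snippet from the notes above.
html = '<div class="a">a</div><div class="b">b</div>'
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
first_b = soup.find("div", attrs={"class": "b"})
print(first_b.text)  # b

# find_all() returns a list-like ResultSet of every matching tag.
all_divs = soup.find_all("div")
print([d.text for d in all_divs])  # ['a', 'b']

# No div has class "c", so find() returns None here.
missing = soup.find("div", attrs={"class": "c"})
print(missing)  # None
```

Because `find` can return `None`, it is worth checking the result before calling `.text` or `.get("src")` on it.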
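Running
with open("%s.jpg" % title, mode='wb') as f:
    f.write(requests.get(img.get("src")).content)
raises OSError: [Errno 22] Invalid argument: '\n萤火之森动漫图片 萤火之森卡通图片\n.jpg'
The scraped title begins and ends with a newline, which makes the file name invalid.
Fix: remove the newlines before building the file name:
title = title.replace('\n','')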
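Removing `\n` fixes this particular title, but scraped titles may also contain other characters that Windows rejects in file names, such as `/`, `:` or `?`. A slightly more defensive sketch is shown here; the helper name `safe_filename` and the `"untitled"` fallback are assumptions for illustration.

```python
import re


def safe_filename(title, default="untitled"):
    """Hypothetical helper: turn a scraped title into a usable file name."""
    # Drop characters Windows rejects in file names, plus control
    # characters such as \n and \t.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "", title)
    # Trim spaces and dots left at either end after the removal.
    cleaned = cleaned.strip(" .")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or default


title = "\n萤火之森动漫图片 萤火之森卡通图片\n"
print(safe_filename(title) + ".jpg")  # 萤火之森动漫图片 萤火之森卡通图片.jpg
```

The cleaned name can then be passed to `open(..., mode='wb')` exactly as before.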
After the fix, the download succeeds: