Web scraping with BeautifulSoup: TypeError: 'NoneType' object is not callable

Problem description:

I'm an absolute beginner. I'm trying to use BeautifulSoup to scrape a website. I do get the HTML, but now I want to get all divs with the class content_class.

Here is my attempt:

import requests 
from BeautifulSoup import BeautifulSoup 

#Request the page and parse the HTML 
url = 'mywebsite' 
response = requests.get(url) 
html = response.content 

#Beautiful Soup 
soup = BeautifulSoup(html) 
soup.find_all('div', class_="content_class")

However, this doesn't work. I get:

Traceback (most recent call last):
  File "scrape.py", line 11, in <module>
    soup.find_all('div', class_="content_class")
TypeError: 'NoneType' object is not callable

What am I doing wrong?

What does it print if you put 'print(soup.find_all)' on the second-to-last line? – unutbu

So I did 'soup = BeautifulSoup(html)' and then 'print(soup.find_all)', and it printed 'None'. –

Answer

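You are using BeautifulSoup version 3, but you appear to be following the documentation for BeautifulSoup version 4. The Element.find_all() method exists only in the latest major version (in version 3 it is called Element.findAll()). In version 3, accessing an unknown attribute on a tag is treated as a search for a child tag of that name, so soup.find_all looks for a <find_all> element, finds none and returns None; calling that None is what raises the TypeError.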
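To make the mechanism concrete without installing version 3 (which only runs on Python 2), here is a simplified, hypothetical stand-in class, OldStyleTag. It is not BeautifulSoup's code; it only mimics the attribute-lookup rule described above, which is enough to reproduce both the 'None' printed in the comments and the TypeError.

```python
class OldStyleTag:
    """Hypothetical, simplified stand-in for a BeautifulSoup 3 tag (not the real class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Depth-first search for the first descendant tag called `name`.
        for child in self.children:
            if child.name == name:
                return child
            found = child.find(name)
            if found is not None:
                return found
        return None

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails: like BS3,
        # treat `tag.something` as `tag.find("something")`.
        if attr.startswith("__"):
            raise AttributeError(attr)
        return self.find(attr)


soup = OldStyleTag("[document]", [OldStyleTag("html", [OldStyleTag("body", [OldStyleTag("div")])])])

print(soup.div.name)   # div  (a child-tag search that succeeds)
print(soup.find_all)   # None (no <find_all> tag exists)

try:
    soup.find_all("div")  # calling None
except TypeError as exc:
    print(exc)            # 'NoneType' object is not callable
```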

I strongly recommend upgrading:

pip install beautifulsoup4

and

from bs4 import BeautifulSoup

Version 3 stopped receiving updates in 2012; it is now badly outdated.
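For comparison, a minimal sketch of the question's parsing step written against bs4 might look like the following. To keep it runnable offline it parses a small inline sample instead of fetching 'mywebsite'; SAMPLE_HTML and content_divs are illustrative names, and in the real script you would pass response.content from requests.get(url) instead.

```python
import bs4
from bs4 import BeautifulSoup

# Illustrative sample standing in for response.content from requests.get(url).
SAMPLE_HTML = """
<html><body>
  <div class="content_class">First</div>
  <div class="other">Skip me</div>
  <div class="content_class">Second</div>
</body></html>
"""


def content_divs(html):
    """Return the stripped text of every <div> whose class includes content_class."""
    # Naming the parser explicitly avoids bs4's "no parser was explicitly specified" warning.
    soup = BeautifulSoup(html, "html.parser")
    # class_ has a trailing underscore because `class` is a reserved word in Python.
    return [div.get_text(strip=True) for div in soup.find_all("div", class_="content_class")]


print(bs4.__version__)             # should start with "4." if the upgrade worked
print(content_divs(SAMPLE_HTML))   # ['First', 'Second']
```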

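Thanks, I did that! But now I get: 'The code that caused this warning is on line 10 of the file scrape.py. To get rid of this warning, change code that looks like this: BeautifulSoup(YOUR_MARKUP}) to this: BeautifulSoup(YOUR_MARKUP, "html.parser") MARKUP_TYPE = MARKUP_TYPE))' –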

@GeorgeWelder, just follow the instructions in the warning. You can also simply ignore it. – ForceBru

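@GeorgeWelder: Yes, BeautifulSoup 4 used to pick a parser backend for you automatically, but that caused unexpected changes when you later installed lxml. You are now asked to make an explicit choice: 'soup = BeautifulSoup(html, "html.parser")', 'soup = BeautifulSoup(html, "lxml")' or 'soup = BeautifulSoup(html, "html5lib")'. –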
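A small sketch of what that warning is about, assuming bs4 with its built-in "html.parser" backend. The helper parser_warnings is a hypothetical name: it records the warnings raised while building a soup and keeps those that mention a parser, so you can compare leaving the parser out with naming it explicitly.

```python
import warnings

from bs4 import BeautifulSoup

HTML = "<div class='content_class'>Hello</div>"


def parser_warnings(*args):
    """Hypothetical helper: build a soup from args, return warning messages that mention a parser."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        BeautifulSoup(*args)
    return [str(w.message) for w in caught if "parser" in str(w.message).lower()]


# No parser named: bs4 guesses one and warns about the guess.
implicit = parser_warnings(HTML)
# Parser named explicitly: no parser warning is raised.
explicit = parser_warnings(HTML, "html.parser")

print(len(implicit) > 0, explicit)
```

Note that lxml and html5lib are separate installs (pip install lxml, pip install html5lib); if you name a backend that isn't installed, bs4 raises FeatureNotFound rather than silently falling back to another parser.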

Answer

You get this error because BeautifulSoup version 3, which you are using, has no 'find_all' method; the method is called 'findAll'. This code should help:

soup.findAll('div', {'class': 'content_class'})
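In bs4 the dictionary form from this answer still works with find_all (bs4 keeps findAll only as an older camelCase alias), alongside the class_ keyword and a CSS selector. A small sketch with made-up sample markup, showing that the three spellings pick out the same elements:

```python
from bs4 import BeautifulSoup

# Made-up markup: two matching divs plus a <p> with the same class that must be skipped.
html = """
<div class="content_class">A</div>
<div class="content_class">B</div>
<p class="content_class">not a div</p>
"""
soup = BeautifulSoup(html, "html.parser")

by_keyword = soup.find_all("div", class_="content_class")      # keyword argument
by_attrs = soup.find_all("div", {"class": "content_class"})    # attrs dict, as in the BS3 answer
by_css = soup.select("div.content_class")                      # CSS selector

texts = [[tag.get_text() for tag in result] for result in (by_keyword, by_attrs, by_css)]
print(texts)  # [['A', 'B'], ['A', 'B'], ['A', 'B']]
```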

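Thanks. I tried that and the error went away, but I get an empty list, '[]', even though I'm sure the class 'content_class' is present on several divs in the document. –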

You really shouldn't be using BeautifulSoup version 3 any more. It hasn't been maintained for more than 5 years. –
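If a bs4 search still comes back empty, it can help to know how bs4 matches classes: it treats class as a multi-valued attribute, so class_="content_class" matches an element whose class list contains that exact token, even alongside other classes, but not a different class that merely starts with the same text. A small sketch with made-up markup:

```python
from bs4 import BeautifulSoup

# Made-up markup: one div carries two classes, the other a similar-looking but different class.
html = '<div class="content_class wide">Hit</div><div class="content_classic">Miss</div>'
soup = BeautifulSoup(html, "html.parser")

divs = soup.find_all("div", class_="content_class")
print([d.get_text() for d in divs])  # ['Hit']
print(divs[0]["class"])              # ['content_class', 'wide']  (class is stored as a list)
```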
