BeautifulSoup: extracting the content inside an element

Question:

I want to extract the content "Hello world". Note that the page also contains multiple <table> and <td colspan="2"> elements.

I tried the following:

hello = soup.find(text='Name: ') 
hello.findPreviousSiblings

But it returns nothing.

Here is a snippet of the HTML:

<table border="0" cellspacing="2" width="800"> 
<tr> 
<td colspan="2"><b>Name: </b>Hello world</td> 
</tr> 
<tr>

I am also having trouble extracting "My home address" from the following:

<td><b>Address:</b></td> 

<td>My home address</td>

I am using the same approach of searching for text="Address:", but how do I navigate to the next line and extract the content of the <td>?

Answer

Use next instead:

>>> s = '<table border="0" cellspacing="2" width="800"><tr><td colspan="2"><b>Name: </b>Hello world</td></tr><tr>' 
>>> soup = BeautifulSoup(s) 
>>> hello = soup.find(text='Name: ') 
>>> hello.next 
u'Hello world'

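next and previous let you move through the document elements in the order the parser processed them, while the sibling methods work with the parse tree.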
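
The difference between document order and tree order can be sketched as follows. This is only an illustration under some assumptions: it uses the newer bs4 package and Python's built-in html.parser rather than the BeautifulSoup 3 API shown in the session above, and next_element and next_sibling are the bs4 names for these navigation attributes. The closing </table> tag is added here so the snippet is complete.

```python
from bs4 import BeautifulSoup

html = (
    '<table border="0" cellspacing="2" width="800">'
    '<tr><td colspan="2"><b>Name: </b>Hello world</td></tr>'
    '</table>'
)
soup = BeautifulSoup(html, "html.parser")

# Find the text node inside <b>; string= is the bs4 spelling of text=.
label = soup.find(string="Name: ")

# Document order: the node parsed right after "Name: " is "Hello world".
by_document_order = label.next_element

# Tree order: "Name: " is the only child of <b>, so it has no siblings.
by_sibling = label.next_sibling

# The <b> tag itself does have a following sibling inside the <td>.
by_parent_sibling = label.parent.next_sibling

print(repr(by_document_order))
print(repr(by_sibling))
print(repr(by_parent_sibling))
```

This also suggests why a sibling search starting from the 'Name: ' text node finds nothing: that node sits alone inside <b>, so only its parent has siblings.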

It returned nothing. hello = soup.find(text='Name: ') hello.next – ready 2011-05-14 02:35:07

Does 'Name: ' appear anywhere else in the document? – 2011-05-14 02:45:27

Sorry, that was my mistake. It works now. – ready 2011-05-14 03:04:27

Answer

The contents attribute works well for extracting text from <tag>text</tag>.

For example, with <td>My home address</td>:

s = '<td>My home address</td>' 
soup = BeautifulSoup(s) 
td = soup.find('td')  # <td>My home address</td>
td.contents  # a one-item list holding the text: My home address

For example, with <td><b>Address:</b></td>:

s = '<td><b>Address:</b></td>' 
soup = BeautifulSoup(s) 
td = soup.find('td').find('b')  # <b>Address:</b>
td.contents  # a one-item list holding the text: Address:
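
For the second part of the question, reaching the <td> that follows the Address: label, one possible approach is sketched below. It rests on a few assumptions: the bs4 package with Python's built-in html.parser, both cells sitting in the same <tr> (the surrounding <table> and <tr> are added here for illustration), and the label text matching "Address:" exactly. find_parent, find_next_sibling, and get_text are bs4 methods.

```python
from bs4 import BeautifulSoup

html = (
    "<table><tr>"
    "<td><b>Address:</b></td>"
    "<td>My home address</td>"
    "</tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

# Locate the label text, then climb to the <td> that encloses it.
label = soup.find(string="Address:")
label_cell = label.find_parent("td")

# The value lives in the next <td> at the same level of the tree.
value_cell = label_cell.find_next_sibling("td")

# .contents is a list of child nodes; get_text() returns a plain string.
children = value_cell.contents
address = value_cell.get_text(strip=True)

print(children)
print(address)
```

Climbing to the enclosing <td> first matters because the label text and the <b> tag have no siblings of their own; only the cell does.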
