Parsing with Beautiful Soup: finding a string (number) outside the span tags
Question:
I have successfully used BeautifulSoup to parse out the following data:
<span class="price-currency">$</span>200.00</span>, <span class="j-original-price">
<span class="price-currency">$</span>1,000.00</span>, <span class="j-original-price">
<span class="price-currency">$</span>1,300.00</span>, <span class="j-original-price">
<span class="price-currency">$</span>550.00 <span class="price-type price-type--negotiable">Negotiable</span></span>, <span class="j-original-price">
<span class="price-currency">$</span>450.00 <span class="price-type price-type--negotiable">Negotiable</span></span>, <span class="j-original-price">
<span class="price-currency">$</span>50.00 <span class="price-type price-type--negotiable">Negotiable</span></span>, <span class="j-original-price">
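Now I need to parse out the number in the middle of each line. I thought I could use nextSibling, but that failed. I also noticed that some of the numbers are followed by a closing tag, while others are followed by an opening span tag.
How can I parse out these numbers with BeautifulSoup? This is how I got the data above: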
span = soup("span", { "class" : "price-currency" })
Thanks
Answer
If the data is exactly as you have shown, then getting .next_sibling works for me:
In [1]: from bs4 import BeautifulSoup
In [2]: data = """
...: <span class="price-currency">$</span>200.00</span>, <span class="j-original-price">
...: <span class="price-currency">$</span>1,000.00</span>, <span class="j-original-price">
...: <span class="price-currency">$</span>1,300.00</span>, <span class="j-original-price">
...: <span class="price-currency">$</span>550.00 <span class="price-type price-type--negotiable">Negotiable</span></span>, <span class="j-original-price">
...: <span class="price-currency">$</span>450.00 <span class="price-type price-type--negotiable">Negotiable</span></span>, <span class="j-original-price">
...: <span class="price-currency">$</span>50.00 <span class="price-type price-type--negotiable">Negotiable</span></span>, <span class="j-original-price">
...: """
In [3]: soup = BeautifulSoup(data, "html.parser")
In [4]: for item in soup("span", {"class": "price-currency"}):
...:     print(item.next_sibling)
...:
200.00
1,000.00
1,300.00
550.00
450.00
50.00
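The values printed above are still strings, and some carry trailing whitespace or a thousands separator. If actual numbers are needed, a minimal sketch like the following might help. The helper name to_decimal and the sample list are hypothetical and only mirror the output shown above; Decimal is chosen here because these are money amounts.

```python
from decimal import Decimal


def to_decimal(raw):
    """Convert a price string such as '1,000.00 ' to a Decimal."""
    # Drop surrounding whitespace and thousands separators before converting.
    return Decimal(raw.strip().replace(",", ""))


# Hypothetical sample: strings like the ones item.next_sibling printed above.
samples = ["200.00", "1,000.00", "550.00 "]
prices = [to_decimal(s) for s in samples]
print(prices)
```

In the loop above, this would be called as to_decimal(item.next_sibling), assuming every sibling is a plain text node holding a price.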
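Answer
Try looping over your data and extracting the .price-currency tags from the soup:
for s in soup("span", {"class": "price-currency"}):
    s.extract()  # remove the currency-symbol span from the tree
Then get the desired price values from the remaining j-original-price spans:
list_price = soup("span", {"class": "j-original-price"})
print([pr.text for pr in list_price])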
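One caveat with pr.text: it concatenates the text of nested tags as well, so for items that contain a price-type span the result would presumably also include the word "Negotiable", plus any surrounding whitespace. A rough sketch for pulling out only the amount with a regular expression is below. The name extract_amount, the pattern, and the sample strings are all assumptions for illustration, not output captured from the real page.

```python
import re

# Assumed pattern: digits with optional thousands separators and decimals.
AMOUNT_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def extract_amount(text):
    """Return the first price-like substring in text, or None if absent."""
    match = AMOUNT_RE.search(text)
    return match.group(0) if match else None


# Hypothetical strings resembling what pr.text might return.
texts = ["\n1,000.00", "\n550.00 Negotiable", "\n"]
amounts = [extract_amount(t) for t in texts]
print(amounts)
```

Entries with no digits come back as None, so they can be filtered out before any further processing.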
This worked for me too, thanks! I foolishly wasn't iterating, and I was a bit confused about how to use next_sibling.