Python BeautifulSoup cannot get the content of a tag with a hidden attribute
Problem description:
<a id="ember1601" role="button" href="/carsearch/book?piid=AQAQAQRRg2INmYAyjZmAMwmKOGATj2qoYBQANIAVCeAZgB6fUEsAED&totalPriceShown=71.66&searchKey=-575257062&offerQualifiers=GreatDeal" data-book-button="book-EY-EC-Car" target="_self" class="ember-view btn btn-secondary btn-action"><span class="btn-label">
<span aria-hidden="true">
<span class="visuallyhidden">
Reserve Item 1, Economy from Economy Rent a Car Rental Company at $72 total
</span>Reserve
</span>
</span>
</a>
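Hello, I am new to Python. I cannot get the price $72 that sits under <span class="visuallyhidden">, and I would also like to know how to get the href link from the <a> tag on the first line. I am using the BeautifulSoup library; if another library would help, please let me know. Thanks.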
Answer
In [9]: soup = BeautifulSoup(html, 'lxml') # html is the code you posted
In [10]: soup.find("span", class_="visuallyhidden").text
Out[10]: '\n Reserve Item 1, Economy from Economy Rent a Car Rental Company at $72 total\n '
In [11]: soup.a["href"]
Out[11]: '/carsearch/book?piid=AQAQAQRRg2INmYAyjZmAMwmKOGATj2qoYBQANIAVCeAZgB6fUEsAED&totalPriceShown=71.66&searchKey=-575257062&offerQualifiers=GreatDeal'
If you need to extract part of the text from the string, you can use a regular expression:
In [12]: text = soup.find("span", class_="visuallyhidden").text
In [13]: import re
In [15]: re.search(r'\$\d+', text).group()
Out[15]: '$72'
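The steps above can be combined into one standalone script. This is only a minimal sketch: `HTML` is a trimmed copy of the markup from the question (assuming the real page has the same structure), `extract_offer` is an illustrative name rather than anything BeautifulSoup provides, and the built-in `html.parser` backend is used in place of `lxml` to avoid an extra dependency. Since the href in the question also carries a `totalPriceShown` parameter, the sketch reads that too.

```python
import re
from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup

# Trimmed copy of the markup posted in the question (assumed structure).
HTML = """
<a id="ember1601" role="button" href="/carsearch/book?piid=AQAQAQRRg2INmYAyjZmAMwmKOGATj2qoYBQANIAVCeAZgB6fUEsAED&totalPriceShown=71.66&searchKey=-575257062&offerQualifiers=GreatDeal" class="ember-view btn btn-secondary btn-action"><span class="btn-label">
<span aria-hidden="true">
<span class="visuallyhidden">
Reserve Item 1, Economy from Economy Rent a Car Rental Company at $72 total
</span>Reserve
</span>
</span>
</a>
"""


def extract_offer(html):
    """Return (label, displayed_price, href) for the first link, or None."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("a", href=True)
    if link is None:
        return None
    hidden = link.find("span", class_="visuallyhidden")
    if hidden is None:
        return None
    # strip=True drops the newlines and indentation around the text.
    label = hidden.get_text(strip=True)
    match = re.search(r"\$\d+(?:\.\d+)?", label)
    price = match.group() if match else None
    return label, price, link["href"]


label, price, href = extract_offer(HTML)

# The query string of the href holds the unrounded total as well.
query = parse_qs(urlparse(href).query)
exact_total = query["totalPriceShown"][0]

print(price)
print(exact_total)
print(href)
```

Returning `None` when either tag is missing keeps the script from raising `AttributeError` if the downloaded page does not contain the expected markup.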
Answer
BeautifulSoup can find a tag by its class name, like this:
bs_obj = BeautifulSoup(html, "html.parser")  # naming the parser explicitly avoids bs4's "no parser specified" warning
tag = bs_obj.find("span", class_="visuallyhidden")  # "class" is a reserved word in Python, so bs4 uses "class_"
s = tag.string  # the string inside the span
...
# you can then get "$72" from s with a regex
Also, BS lets you access a tag's attributes through the "[]" operator, like this:
print(bs_obj.a['href'])  # href belongs to the <a> tag, not to the span found above
You can find some simple examples in the BS documentation online.
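One caveat with `tag.string`: it returns text only when the tag has a single child, and gives `None` for a tag with mixed children. The sketch below illustrates this on an abbreviated stand-in for the question's markup (the `html` literal and its shortened href are assumptions made for brevity), and also shows `.get()` as an alternative to `[]` for attributes that may be absent.

```python
from bs4 import BeautifulSoup

# Abbreviated stand-in for the markup in the question.
html = (
    '<a href="/carsearch/book?totalPriceShown=71.66">'
    '<span aria-hidden="true">'
    '<span class="visuallyhidden">Reserve Item 1 at $72 total</span>Reserve'
    '</span></a>'
)
soup = BeautifulSoup(html, "html.parser")

inner = soup.find("span", class_="visuallyhidden")
outer = soup.find("span", attrs={"aria-hidden": "true"})

inner_string = inner.string  # single text child, so this is that text
outer_string = outer.string  # a tag plus loose text, so this is None
outer_text = outer.get_text(" ", strip=True)  # all nested text, space-joined

link = soup.find("a")
href = link["href"]          # raises KeyError if the attribute is missing
target = link.get("target")  # returns None instead of raising

print(inner_string)
print(outer_string)
print(outer_text)
```

When the structure of the tag is not known in advance, `get_text()` is usually the safer choice because it never returns `None`.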
Thanks for your reply, but I still cannot get the text. Maybe the website blocks crawlers? Please help check the website: https://www.orbitz.com/carsearch?date1=03%2F08%2F2017&date2=3%2F09%2F2017&loc2=lax&locn=lax&rdus=10&selCC=%5B%22economy%22%5D&vend=