Can't count empty tags with BeautifulSoup?
Question:
I've had a look around at other questions, but I can't find anything.
My HTML looks like this:
<div class="rating-input">
<i data-value="1" class="rating-active-star"></i>
<i data-value="2" class="rating-active-star"></i>
<i data-value="3" class="rating-active-star"></i>
<i data-value="4" class="rating-active-star"></i>
<i data-value="5" class="rating-inactive-star"></i>
</div>
and the line of mine that's failing is this:
details = [{"name": film.select('h2')[0].text.split('\n')[0],
"rating":len(film.select('div i.rating-inactive-star'))}
for film in detail_row]
It returns this:
[{'name': 'The LEGO Batman Movie', 'rating': 0},
{'name': 'Sing', 'rating': 0},
{'name': 'John Wick: Chapter 2', 'rating': 0},
{'name': 'Fifty Shades Darker', 'rating': 0},
{'name': 'The Great Wall', 'rating': 0},
{'name': 'Hidden Figures', 'rating': 0},
{'name': 'La La Land', 'rating': 0},
{'name': 'The Founder', 'rating': 0},
{'name': 'Hacksaw Ridge', 'rating': 0},
{'name': 'T2 Trainspotting', 'rating': 0},
{'name': 'Split', 'rating': 0},
{'name': 'Patriots Day', 'rating': 0}
]
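All of the ratings are zero. What I expected was the number of i elements with the class rating-active-star (e.g. 4 for the HTML above).
Whereas if I change my rating selector from 'div i.rating-active-star' to 'div i', every 'rating': 0 becomes 'rating': 5.
Here is my whole script (more or less an MVP):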
import requests
import bs4
from urllib.parse import parse_qsl

data = "si=1010841&sort=cin&max=0&bd=2017-02-23&css=cat-&mod=cinemapage_movie_list&attrs=2D%2C3D%2CIMAX%2CViP%2CVIP%2CDBOX%2C4DX%2CM4J%2CSS"
# Decode the query string into form fields; parse_qsl also turns %2C back
# into commas, so requests does not double-encode them as %252C.
info = dict(parse_qsl(data))
response = requests.post('https://www.cineworld.co.uk/pgm-list-byfeat', data=info)
soup = bs4.BeautifulSoup(response.text, "html.parser")

# One container per film listing.
detail_row = soup.select('div[id^=film_] div.row div.col-sm-10')
details = [{"name": film.select('h2')[0].text.split('\n')[0],
            # number of <i> elements carrying the active-star class
            "rating": len(film.select('div i.rating-active-star'))}
           for film in detail_row]
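One way to narrow this down (a diagnostic sketch, not from the thread; star_class_counts is a hypothetical helper) is to check which classes the downloaded HTML actually puts on the stars. If the i tags arrive without rating-active-star, for example because the page adds that class with JavaScript after loading (an assumption), the selector itself is not at fault.

```python
from collections import Counter

from bs4 import BeautifulSoup


def star_class_counts(html):
    """Tally the class names of every <i> inside a div.rating-input.

    Run this on response.text to see which classes the server actually
    sent, rather than the classes shown in a browser's element inspector.
    """
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for star in soup.select("div.rating-input i"):
        # class is multi-valued, so bs4 returns a list (or None if absent)
        counts.update(star.get("class") or ["<no class>"])
    return counts


# Hypothetical served markup in which the stars carry no class at all.
served = """
<div class="rating-input">
<i data-value="1"></i><i data-value="2"></i><i data-value="3"></i>
<i data-value="4"></i><i data-value="5"></i>
</div>
"""
print(star_class_counts(served))  # Counter({'<no class>': 5})
```

Calling star_class_counts(response.text) on the live response would show whether rating-active-star is present at all.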
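Why would the length of a list of empty tags be any different from that of non-empty tags? How can I fix this?
Answer
The problem is probably elsewhere. This snippet seems to work as expected: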
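For context, a CSS selector in BeautifulSoup matches on tag names and attributes only; whether an element contains text does not affect the result. A minimal check with made-up markup:

```python
from bs4 import BeautifulSoup

# One empty <i> and one with text; both carry the same class.
html = '<div><i class="star"></i><i class="star">x</i></div>'
soup = BeautifulSoup(html, "html.parser")

matches = soup.select("div i.star")
print(len(matches))                      # 2: content plays no part in matching
print([m.get_text() for m in matches])   # ['', 'x']
```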
from bs4 import BeautifulSoup
html = '''
<div class="rating-input">
<i data-value="1" class="rating-active-star"></i>
<i data-value="2" class="rating-active-star"></i>
<i data-value="3" class="rating-active-star"></i>
<i data-value="4" class="rating-active-star"></i>
<i data-value="5" class="rating-inactive-star"></i>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
print(len(soup.select('div i.rating-inactive-star')),
      len(soup.select('div i.rating-active-star')))
# prints: 1 4
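The same check can be extended to the question's full extraction logic. The fragment below is hypothetical, shaped only to fit the selector chain (the real Cineworld markup is an assumption). If this returns the expected count but the live response still gives 0, the difference lies in the HTML the server sends, not in the selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment matching the question's selector chain.
page = '''
<div id="film_1">
  <div class="row">
    <div class="col-sm-10">
      <h2>La La Land
      </h2>
      <div class="rating-input">
        <i data-value="1" class="rating-active-star"></i>
        <i data-value="2" class="rating-active-star"></i>
        <i data-value="3" class="rating-active-star"></i>
        <i data-value="4" class="rating-active-star"></i>
        <i data-value="5" class="rating-inactive-star"></i>
      </div>
    </div>
  </div>
</div>
'''
soup = BeautifulSoup(page, 'html.parser')
detail_row = soup.select('div[id^=film_] div.row div.col-sm-10')
details = [{"name": film.select('h2')[0].text.split('\n')[0],
            "rating": len(film.select('div i.rating-active-star'))}
           for film in detail_row]
print(details)  # [{'name': 'La La Land', 'rating': 4}]
```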
I can confirm your script works on my machine. – Pureferret
I'm confused. There are no values in the rating tags, so len(film.select('div i.rating-active-star')) is 0 and len(film.select('div i')) is 5. What are you expecting to see? – Batman
@Batman The number of tags with that class. I assumed empty != null. Is that wrong? – Pureferret
Related: http://stackoverflow.com/q/12336968/1075247, except that I want to get a class, not text. – Pureferret
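On the "empty != null" point: in BeautifulSoup an empty element is still a Tag object that selectors can find; only its text is missing. And since the goal is the class rather than the text, the class attribute can be read directly. A sketch reusing the question's markup:

```python
from bs4 import BeautifulSoup

html = '''
<div class="rating-input">
<i data-value="1" class="rating-active-star"></i>
<i data-value="2" class="rating-active-star"></i>
<i data-value="3" class="rating-active-star"></i>
<i data-value="4" class="rating-active-star"></i>
<i data-value="5" class="rating-inactive-star"></i>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
first = soup.find('i')

# An empty tag is still a Tag, not None; only its text is missing.
print(first is None)            # False
print(first.string is None)     # True: no text child
print(repr(first.get_text()))   # ''

# Read each star's class list instead of its (empty) text.
stars = soup.select('div.rating-input i')
active = sum('rating-active-star' in star.get('class', []) for star in stars)

# find_all with class_ gives the same count as the CSS class selector.
same = len(soup.find_all('i', class_='rating-active-star'))
print(active, same)             # 4 4
```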