When I read a tag using BeautifulSoup I always get None
Problem description:
I want to read some names and IDs, such as:
<a class="inst" href="loader.aspx?ParTree=151311&i=3823243780502959"
target="3823243780502959">رتكو</a>
i = 3823243780502959
and so on, from tsetmc.com. This is my code:
import requests
from bs4 import BeautifulSoup
url = 'http://www.tsetmc.com/Loader.aspx?ParTree=15131F'
page = requests.get(url)
soup = BeautifulSoup(page.content , 'html.parser')
first_names_Id = soup.find_all('a',class_='isnt')
print (first_names_Id)
But it returns None.
How can I read these tags? I have the same problem with other tags.
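Answer:
I used Selenium instead of requests to load the site you want to parse, and it gave me the result you are after.
I believe requests did not return the same HTML as Selenium because the site is rendered with JavaScript: requests only downloads the raw response and does not run scripts, whereas Selenium drives a real browser that does.
Also note that you have a typo in the class attribute value: it should be 'inst', not 'isnt'.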
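To see the effect of the typo in isolation, here is a minimal sketch that runs BeautifulSoup on a static copy of the tag from the question, so no network access or browser is involved. The variable names `misspelled` and `correct` are just for illustration. It also shows that `find_all` gives back an empty list when nothing matches, while `find` gives back None.

```python
from bs4 import BeautifulSoup

# Static sample copied from the question, so no request is made
html = ('<a class="inst" href="loader.aspx?ParTree=151311&i=3823243780502959" '
        'target="3823243780502959">رتكو</a>')
soup = BeautifulSoup(html, 'html.parser')

misspelled = soup.find_all('a', class_='isnt')  # typo: no tag has this class
correct = soup.find_all('a', class_='inst')     # matches the sample tag

print(len(misspelled))                # 0
print(len(correct))                   # 1
print(soup.find('a', class_='isnt'))  # None
```

On the live page the corrected class name alone would probably not be enough, since the tags have to be present in the HTML you parse in the first place.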
Code:
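from selenium import webdriver
from bs4 import BeautifulSoup
url = 'http://www.tsetmc.com/Loader.aspx?ParTree=15131F'
# Load the page in a real browser so that its JavaScript runs
driver = webdriver.Chrome()
driver.get(url)
# Parse the rendered HTML instead of the raw HTTP response
soup = BeautifulSoup(driver.page_source, 'html.parser')
# Close the browser once the page source has been captured
driver.quit()
first_names_Id = soup.find_all('a', class_='inst')
print(first_names_Id)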
Output:
[<a class="inst" href="loader.aspx?ParTree=151311&i=33541897671561960" target="33541897671561960">واتي</a>, <a class="inst" href="loader.aspx?ParTree=151311&i=33541897671561960" target="33541897671561960">سرمايه گذاري آتيه دماوند</a>, <a class="inst" href="loader.aspx?ParTree=151311&i=9093654036027968" target="9093654036027968">طپنا7002</a>, <a class="inst" href="loader.aspx?ParTree=151311&i=9093654036027968" target="9093654036027968">اختيارف رمپنا-7840-19/07/1396</a>, <a class="inst" href="loader.aspx?ParTree=151311&i=19004627894176375" target="19004627894176375">طپنا7003</a>, <a class="inst" href="loader.aspx?ParTree=151311&i=19004627894176375" target="19004627894176375">اختيارف رمپنا-8340-19/07/1396</a>, **etc**]
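Since the question asks for the names and IDs rather than the whole tags, here is a rough sketch of pulling the `i` value out of an href with the standard library. The helper name `instrument_id` is made up for this example, and the sample pairs are copied by hand from the output above.

```python
from urllib.parse import urlparse, parse_qs

def instrument_id(href):
    """Return the value of the `i` query parameter in href, or None if absent."""
    query = parse_qs(urlparse(href).query)
    values = query.get('i')
    return values[0] if values else None

# (href, link text) pairs copied from the output above
links = [
    ('loader.aspx?ParTree=151311&i=33541897671561960', 'واتي'),
    ('loader.aspx?ParTree=151311&i=9093654036027968', 'طپنا7002'),
]
for href, name in links:
    print(name, instrument_id(href))
```

With the tags found by the code above, you could pass `tag['href']` to the helper and take the name from `tag.get_text()`. In the sample output the `target` attribute seems to carry the same number, so `tag['target']` may be a shorter route if that holds for every link.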