Reading an HTML file with pandas, Python
Question:
I would like to read an .html file with pandas; see the HTML source below.
<html>
<head>
<title>Output File</title>
</head>
<body>
<pre>
<span style='color:black'>-----------------------------------------------------------------------------------------------------------------------------------</span>
<span style='color:black'>| Study Case: Case A_Lines | Annex: /1 |</span>
<span style='color:black'>-----------------------------------------------------------------------------------------------------------------------------------</span>
<span style='color:black'>| System Summary |</span>
<span style='color:black'>-----------------------------------------------------------------------------------------------------------------------------------</span>
<span style='color:black'>| System Average Interruption Frequency Index : SAIFI = 0.373016 1/Ca |</span>
<span style='color:black'>| Customer Average Interruption Frequency Index : CAIFI = 0.373016 1/Ca |</span>
<span style='color:black'>-----------------------------------------------------------------------------------------------------------------------------------</span>
<span style='color:black'></span>
</pre>
</body>
</html>
I want to read the most relevant information, the index and its value, as a table, for example:
SAIFI 0.373016 1/Ca
I have tried reading it directly with a number of options, but without success.
df = pd.read_html(path, match='=')
Please help!
Answer
I tried pandas, but it returns an error. Could you try BeautifulSoup instead?
In [20]: from bs4 import BeautifulSoup
In [21]: f = BeautifulSoup(open("file.html"))
In [22]: f.findAll("span")[5].text.split()[-3]
Out[22]: u'0.373016'
Of course, you can improve on the way I identify the value.
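One possible improvement, as a rough sketch only: instead of relying on the span position ([5]) and the token position ([-3]), match every row of the form "NAME = number unit" with a regular expression. The names ROW and parse_summary are made up for this illustration, and the pattern assumes that every row of interest looks like the two SAIFI/CAIFI rows in the question.

```python
import re

# Hypothetical pattern for rows such as
# "| ... Frequency Index : SAIFI = 0.373016 1/Ca |"
ROW = re.compile(
    r":\s*(?P<index>\w+)\s*=\s*"
    r"(?P<value>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
    r"\s+(?P<unit>\S+)\s*\|"
)


def parse_summary(text):
    """Return one (index, value, unit) tuple per 'NAME = number unit' row."""
    return [
        (m.group("index"), float(m.group("value")), m.group("unit"))
        for m in ROW.finditer(text)
    ]


# Two rows copied from the file in the question
sample = """\
<span style='color:black'>| System Average Interruption Frequency Index : SAIFI = 0.373016 1/Ca |</span>
<span style='color:black'>| Customer Average Interruption Frequency Index : CAIFI = 0.373016 1/Ca |</span>
"""

rows = parse_summary(sample)
for index, value, unit in rows:
    print(index, value, unit)
```

The function accepts either the raw file contents or the text extracted by BeautifulSoup. If a table is needed, the resulting list can be passed to pandas, e.g. pd.DataFrame(rows, columns=["index", "value", "unit"]).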
Thanks
Thanks! That works. – user56579
You're welcome. If you are happy with it, please accept the answer :) – Alberto