Reading an HTML file with pandas, Python

Problem description:

I would like to read an .html file with pandas; see the HTML source below.

<html> 
<head> 
<title>Output File</title> 
</head> 
<body> 
<pre> 
<span style='color:black'>-----------------------------------------------------------------------------------------------------------------------------------</span> 
<span style='color:black'>| Study Case: Case A_Lines                   | Annex:    /1 |</span> 
<span style='color:black'>-----------------------------------------------------------------------------------------------------------------------------------</span> 
<span style='color:black'>| System Summary                             |</span> 
<span style='color:black'>-----------------------------------------------------------------------------------------------------------------------------------</span> 
<span style='color:black'>| System Average Interruption Frequency Index   : SAIFI = 0.373016 1/Ca            |</span> 
<span style='color:black'>| Customer Average Interruption Frequency Index  : CAIFI = 0.373016 1/Ca            |</span> 
<span style='color:black'>-----------------------------------------------------------------------------------------------------------------------------------</span> 
<span style='color:black'></span> 
</pre> 
</body> 
</html> 

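From this file I would like to read the most relevant information, the index name and its value, as a table, for example: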

SAIFI 0.373016 1/Ca 

I tried reading it directly with a number of options, but without success.

import pandas as pd 
df = pd.read_html(path, match='=') 

Please help!

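I tried pandas, but it returns an error: read_html only looks for <table> elements, and this file has none. You can try BeautifulSoup instead: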

In [20]: from bs4 import BeautifulSoup 
In [21]: f = BeautifulSoup(open("file.html"), "html.parser") 
In [22]: f.find_all("span")[5].text.split()[-3] 
Out[22]: '0.373016' 

Of course, you can improve on the way I pick out the value.
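One possible way to make the lookup less dependent on the span position is to match the "NAME = value unit" pattern with a regular expression. The following is only a sketch under assumptions: the helper names extract_indices and to_frame, the column names, and the regular expression are illustrative, not part of the original answer, and the pattern assumes every row of interest looks like ": SAIFI = 0.373016 1/Ca" as in the sample file.

```python
import re
from html import unescape

# Assumed row shape: "... : SAIFI = 0.373016 1/Ca   |"
ROW_RE = re.compile(r":\s*(\w+)\s*=\s*([-+]?\d+(?:\.\d+)?)\s+(\S+)")
TAG_RE = re.compile(r"<[^>]+>")  # crude tag stripper, enough for this simple file


def extract_indices(html_text):
    """Return (name, value, unit) tuples for every line that matches ROW_RE."""
    rows = []
    for line in html_text.splitlines():
        text = unescape(TAG_RE.sub("", line))  # drop the <span> markup
        match = ROW_RE.search(text)
        if match:
            name, value, unit = match.groups()
            rows.append((name, float(value), unit))
    return rows


def to_frame(rows):
    """Hypothetical helper: build a DataFrame indexed by the index name."""
    import pandas as pd  # imported lazily so the parsing works without pandas
    return pd.DataFrame(rows, columns=["index", "value", "unit"]).set_index("index")


sample = """<pre>
<span style='color:black'>| Study Case: Case A_Lines                   | Annex:    /1 |</span>
<span style='color:black'>| System Average Interruption Frequency Index   : SAIFI = 0.373016 1/Ca            |</span>
<span style='color:black'>| Customer Average Interruption Frequency Index  : CAIFI = 0.373016 1/Ca            |</span>
</pre>"""

rows = extract_indices(sample)
print(rows)
```

For the real file, something like extract_indices(open("file.html").read()) followed by to_frame(...) should give one row per index, provided the file keeps the same layout.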

Thanks


Thanks! That works. – user56579


You're welcome. If you're happy with it, please accept the answer :) – Alberto