Reading Excel XML into a dictionary
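Question:
I want to read a simple Excel XML file into a dictionary. I tried xlrd 7.1, but it reports a format error. Now I am trying xml.etree.ElementTree, also without success. I cannot change the structure of the .xml file. Here is the file: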
<?xml version="1.0" encoding="UTF-8"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:html="http://www.w3.org/TR/REC-html40">
 <Styles>
  <Style ss:Name="Normal" ss:ID="Default">
   <Alignment ss:Vertical="Bottom"/>
   <Borders/>
   <Font ss:FontName="Verdana"/>
   <Interior/>
   <NumberFormat/>
   <Protection/>
  </Style>
  <Style ss:ID="s22">
   <NumberFormat ss:Format="General Date"/>
  </Style>
 </Styles>
 <Worksheet ss:Name="Linkfeed">
  <Table>
   <Row>
    <Cell>
     <Data ss:Type="String">ID</Data>
    </Cell>
    <Cell>
     <Data ss:Type="String">URL</Data>
    </Cell>
   </Row>
   <Row>
    <Cell>
     <Data ss:Type="String">22222</Data>
    </Cell>
    <Cell>
     <Data ss:Type="String">Hello there</Data>
    </Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>
And the code that reads it:
import xml.etree.ElementTree as etree

def xml_to_list(fname):
    with open(fname, "rb") as xml_file:
        tree = etree.parse(xml_file)
        for items in tree.iter(tag="Table"):
            for item in items:  # Items is None!
                print(item.text)
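A likely reason the loop above never yields anything: the Workbook declares a default namespace (xmlns="urn:schemas-microsoft-com:office:spreadsheet"), so ElementTree stores every tag as "{urn:schemas-microsoft-com:office:spreadsheet}Table" and a bare "Table" matches nothing. The sketch below demonstrates this on a trimmed copy of the file; `SAMPLE`, `SS` and `find_tables` are hypothetical names chosen for illustration.

```python
import io
import xml.etree.ElementTree as etree

# Default namespace URI, copied from the Workbook's xmlns attribute.
SS = "urn:schemas-microsoft-com:office:spreadsheet"

# Hypothetical, trimmed stand-in for the question's file.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Linkfeed">
  <Table>
   <Row><Cell><Data ss:Type="String">ID</Data></Cell></Row>
  </Table>
 </Worksheet>
</Workbook>"""


def find_tables(source):
    """Hypothetical helper: compare a bare tag lookup with a qualified one."""
    tree = etree.parse(source)
    # A bare "Table" only matches elements with no namespace, so this is empty.
    plain = list(tree.iter("Table"))
    # "{namespace}tag" (Clark notation) matches the real, namespaced element.
    qualified = list(tree.iter("{%s}Table" % SS))
    return plain, qualified


plain, qualified = find_tables(io.BytesIO(SAMPLE))
print(len(plain), len(qualified))  # 0 1
```

On Python 3.8 and later, `tree.findall(".//{*}Table")` should also match the element in any namespace.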
Update: it works now, but how do I exclude the garbage?
def xml_to_list(fname):
    with open(fname, "rb") as xml_file:
        tree = etree.iterparse(xml_file)
        for item in tree:
            print(item[1].text)
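If "garbage" means the empty items (as clarified in the comments), one way to skip them is to test the text before using it: self-closing elements such as <Borders/> have `text` set to None, and container elements such as <Row> hold only indentation whitespace. A minimal sketch under that assumption; `SAMPLE` and `non_empty_texts` are hypothetical names.

```python
import io
import xml.etree.ElementTree as etree

# Hypothetical, trimmed stand-in for the question's file.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet">
 <Styles><Style><Borders/></Style></Styles>
 <Worksheet>
  <Table>
   <Row>
    <Cell><Data>ID</Data></Cell>
    <Cell><Data>URL</Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>"""


def non_empty_texts(source):
    """Hypothetical helper: collect only the text that is not None or blank."""
    texts = []
    # iterparse yields (event, element) pairs; by default only "end" events.
    for _event, elem in etree.iterparse(source):
        # None for empty elements, whitespace-only for containers: skip both.
        if elem.text and elem.text.strip():
            texts.append(elem.text.strip())
    return texts


print(non_empty_texts(io.BytesIO(SAMPLE)))  # ['ID', 'URL']
```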
Answer
Exclude the “garbage” with an if statement:
import xml.etree.ElementTree as etree

def xml_to_list(fname):
    with open(fname, "rb") as xml_file:
        tree = etree.iterparse(xml_file)
        for item in tree:
            if item[1].text.strip() != '-':
                print(item[1].text)
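Since the stated goal is a dictionary, here is a sketch that reads each <Row> as a list of <Data> values and zips the data rows with the first row. It assumes the first row holds the column names and that cells are contiguous (SpreadsheetML cells that skip columns via ss:Index are not handled). `NS`, `SAMPLE` and `rows_as_dicts` are hypothetical names.

```python
import io
import xml.etree.ElementTree as etree

# Prefix -> URI map for ElementTree path lookups; the Workbook uses this URI
# for both its default namespace and the ss: prefix.
NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}

# Hypothetical copy of the question's file with the styles omitted.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Linkfeed">
  <Table>
   <Row>
    <Cell><Data ss:Type="String">ID</Data></Cell>
    <Cell><Data ss:Type="String">URL</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">22222</Data></Cell>
    <Cell><Data ss:Type="String">Hello there</Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>"""


def rows_as_dicts(source):
    """Hypothetical helper: map each data row to {header: value}."""
    root = etree.parse(source).getroot()
    rows = []
    for row in root.iterfind(".//ss:Table/ss:Row", NS):
        # One value per <Cell>; a cell with no <Data> child yields None.
        rows.append([cell.findtext("ss:Data", namespaces=NS)
                     for cell in row.iterfind("ss:Cell", NS)])
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, values)) for values in body]


print(rows_as_dicts(io.BytesIO(SAMPLE)))
# [{'ID': '22222', 'URL': 'Hello there'}]
```

Using the namespace map avoids any text-level cleanup: whitespace and style elements are never visited because only <Cell>/<Data> elements are selected.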
Thanks, that did it. What if I cleaned up the raw XML before parsing? – User
I would add an extra check: if item[1].text and item[1].text.strip() != '-':
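What “garbage” are you talking about? – Constantinius
Empty items in the tree. – User
Sorry, I still can't find your problem. Maybe you can clarify what is wrong. I can't find any syntax errors, and your use of 'etree' seems correct too. – Constantinius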