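Reading Excel XML into a dictionary

Problem description:

I want to read a simple Excel XML file into a dictionary. I tried xlrd 7.1, but it reports a format error. Now I am trying xml.etree.ElementTree, also without success. I cannot change the structure of the .xml file. Here is the XML and my code: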

<?xml version="1.0" encoding="UTF-8"?> 
-<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:html="http://www.w3.org/TR/REC-html40"> 
    -<Styles> 
    -<Style ss:Name="Normal" ss:ID="Default"> 
     <Alignment ss:Vertical="Bottom"/> 
     <Borders/> 
     <Font ss:FontName="Verdana"/> 
     <Interior/> 
     <NumberFormat/> 
     <Protection/> 
    </Style> -<Style ss:ID="s22"> 
     <NumberFormat ss:Format="General Date"/> 
    </Style> 
    </Styles> -<Worksheet ss:Name="Linkfeed"> 
    -<Table> 
     -<Row> 
     -<Cell> 
      <Data ss:Type="String">ID</Data> 
     </Cell> -<Cell> 
      <Data ss:Type="String">URL</Data> 
     </Cell> 
     </Row> -<Row> 
     -<Cell> 
      <Data ss:Type="String">22222</Data> 
     </Cell> -<Cell> 
      <Data ss:Type="String">Hello there</Data> 
     </Cell> 
     </Row> 
    </Table> 
    </Worksheet> 
</Workbook>

Reading it:

import xml.etree.cElementTree as etree 

def xml_to_list(fname): 
     with open(fname) as xml_file: 
       tree = etree.parse(xml_file) 

       for items in tree.getiterator(tag="Table"): 
         for item in items: # Items is None! 
           print item.text

Update: it works now, but how do I exclude the garbage?

def xml_to_list(fname): 
     with open(fname) as xml_file: 
       tree = etree.iterparse(xml_file) 
       for item in tree: 
         print item[1].text

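What "garbage" are you talking about? – Constantinius

The empty items in the tree – User

Sorry, I still can't see what your problem is. Maybe you can clarify what goes wrong. I can't find any syntax errors, and your use of 'etree' looks correct too. – Constantinius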

Answer

Exclude the "garbage" with an if statement:

def xml_to_list(fname):
    with open(fname) as xml_file:
        tree = etree.iterparse(xml_file)
        for item in tree:
            if item[1].text.strip() != '-':
                print item[1].text

Thanks, that did it. What if I cleaned up the raw XML before parsing it? – User
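On cleaning the raw XML first: a minimal sketch, assuming the "-" characters really are in the file and always sit directly in front of an opening tag (they look like the collapse markers a browser shows in its XML tree view). The names MARKER, strip_collapse_markers and parse_cleaned are made up for this example, and the code is written for Python 3.

```python
import re
import xml.etree.ElementTree as ET

# A "-" marker directly in front of an opening tag, either at the start of
# a line or right after another tag (as in "</Cell> -<Cell>").
# The lookahead requires "<" plus a letter, so a cell whose text is just
# "-" (followed by a closing tag "</...") is left alone.
MARKER = re.compile(r"(^|>)(\s*)-(?=<[A-Za-z])", re.MULTILINE)

def strip_collapse_markers(raw_xml):
    """Return raw_xml with the "-" markers removed (hypothetical helper)."""
    return MARKER.sub(r"\1\2", raw_xml)

def parse_cleaned(raw_xml):
    """Clean the text, then parse it; returns the root element."""
    return ET.fromstring(strip_collapse_markers(raw_xml))
```

This only touches a "-" that is glued to the next opening tag, so hyphens inside cell values such as "x-y" survive. If the markers in your file follow a different pattern, the regular expression has to be adjusted.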

I would add an extra check: if item[1].text and item[1].text.strip() != '-': –
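For the original goal (getting the rows into dictionaries), here is a sketch under a few assumptions: the file parses as well-formed XML, the first row of the table is the header, and every cell holds one Data element. The likely reason the lookup for "Table" in the question found nothing is that the Workbook element declares a default namespace, so ElementTree stores the tag as "{urn:schemas-microsoft-com:office:spreadsheet}Table". The names SS and rows_to_dicts are invented for this example, which targets Python 3.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns declaration on <Workbook>.
SS = "{urn:schemas-microsoft-com:office:spreadsheet}"

def rows_to_dicts(root):
    """Turn the first <Table> into a list of dicts keyed by the header row."""
    table = next(root.iter(SS + "Table"))
    rows = []
    for row in table.iter(SS + "Row"):
        # findtext returns None when a cell has no <Data>; use "" instead.
        values = [(cell.findtext(SS + "Data") or "").strip()
                  for cell in row.iter(SS + "Cell")]
        rows.append(values)
    header, body = rows[0], rows[1:]
    return [dict(zip(header, values)) for values in body]
```

Usage would be something like rows_to_dicts(ET.parse(fname).getroot()). Because only the text of the Data elements is read, stray text between the Row and Cell tags is ignored and no separate "garbage" filter is needed. The sketch does not handle an empty table or cells that skip columns.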
