Parsing XML with Python lxml tree.xpath

Question:

I am trying to parse a huge file; a sample is below. I am trying to extract <Name>, but I can't. My XPath only works when this string is not present:

<LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain" xmlns:i="http://www.w3.org/2001/XMLSchema-instance"> 

 

# bytes literal, since lxml may reject str input that carries an encoding declaration
xml2 = b'''<?xml version="1.0" encoding="UTF-8"?>
<PackageLevelLayout> 
<LevelLayouts> 
    <LevelLayout levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"> 
       <LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain" xmlns:i="http://www.w3.org/2001/XMLSchema-instance"> 
        <LevelLayoutSectionBase> 
         <LevelLayoutItemBase> 
          <Name>Tracking ID</Name> 
         </LevelLayoutItemBase> 
        </LevelLayoutSectionBase> 
       </LevelLayout> 
      </LevelLayout> 
    </LevelLayouts> 
</PackageLevelLayout>''' 

from lxml import etree
tree = etree.XML(xml2)
nodes = tree.xpath('/PackageLevelLayout/LevelLayouts/LevelLayout[@levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"]/LevelLayout/LevelLayoutSectionBase/LevelLayoutItemBase/Name')
print(nodes)

The nested LevelLayout element in your XML document uses a namespace. I would use:

tree.xpath('.//LevelLayout[@levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"]//*[local-name()="Name"]') 

to match the Name element with a shorter XPath expression (ignoring the namespace altogether).
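To make the local-name() approach concrete, here is a minimal self-contained sketch. It assumes Python 3 and uses a trimmed copy of the sample document; the variable names xml_doc, names and name_texts are only for illustration.

```python
from lxml import etree

# Trimmed copy of the sample; passed as bytes so lxml parses it directly.
xml_doc = b'''<PackageLevelLayout>
  <LevelLayouts>
    <LevelLayout levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432">
      <LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain">
        <LevelLayoutSectionBase>
          <LevelLayoutItemBase>
            <Name>Tracking ID</Name>
          </LevelLayoutItemBase>
        </LevelLayoutSectionBase>
      </LevelLayout>
    </LevelLayout>
  </LevelLayouts>
</PackageLevelLayout>'''

root = etree.XML(xml_doc)

# The outer LevelLayout has no namespace, so it can be named directly.
# local-name() compares only the part of the tag after the namespace,
# so the namespaced Name element matches without any prefix mapping.
names = root.xpath(
    './/LevelLayout[@levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"]'
    '//*[local-name()="Name"]'
)
name_texts = [el.text for el in names]
print(name_texts)  # ['Tracking ID']
```

Keep in mind that local-name() matches a Name element in any namespace, so it is less strict than the prefixed form.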

The alternative is to use a prefix-to-namespace mapping and use those prefixes on your tags:

nsmap = {'acd': 'http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain'} 

tree.xpath('/PackageLevelLayout/LevelLayouts/LevelLayout[@levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"]/acd:LevelLayout/acd:LevelLayoutSectionBase/acd:LevelLayoutItemBase/acd:Name', 
    namespaces=nsmap) 
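The following sketch contrasts the original unprefixed path with the prefixed one on a trimmed copy of the sample. It assumes Python 3; the names ACD_NS, outer, unprefixed and prefixed are made up for the example, and the prefix p is deliberately different from anything in the document to show that the prefix itself is arbitrary and only the namespace URI has to match.

```python
from lxml import etree

ACD_NS = 'http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain'

xml_doc = b'''<PackageLevelLayout>
  <LevelLayouts>
    <LevelLayout levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432">
      <LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain">
        <LevelLayoutSectionBase>
          <LevelLayoutItemBase>
            <Name>Tracking ID</Name>
          </LevelLayoutItemBase>
        </LevelLayoutSectionBase>
      </LevelLayout>
    </LevelLayout>
  </LevelLayouts>
</PackageLevelLayout>'''

root = etree.XML(xml_doc)

# Path to the outer, un-namespaced LevelLayout element.
outer = ('/PackageLevelLayout/LevelLayouts'
         '/LevelLayout[@levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"]')

# Unprefixed steps only match elements in no namespace, so this finds nothing.
unprefixed = root.xpath(
    outer + '/LevelLayout/LevelLayoutSectionBase/LevelLayoutItemBase/Name')

# Prefixed steps are resolved through the mapping passed as namespaces=.
prefixed = root.xpath(
    outer + '/p:LevelLayout/p:LevelLayoutSectionBase/p:LevelLayoutItemBase/p:Name',
    namespaces={'p': ACD_NS})

print(unprefixed)                  # []
print([n.text for n in prefixed])  # ['Tracking ID']
```

Every element below the xmlns declaration inherits the default namespace, which is why each step after the outer LevelLayout needs the prefix.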

Thank you very much! It seems I need to study XPath more deeply. – user2200260 2013-03-26 07:47:27

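lxml's xpath method has a namespaces parameter. You can pass it a dict that maps namespace prefixes to namespace URIs. You can then build XPath expressions that refer to elements using those prefixes: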

# bytes literal, since lxml may reject str input that carries an encoding declaration
xml2 = b'''<?xml version="1.0" encoding="UTF-8"?>
<PackageLevelLayout> 
<LevelLayouts> 
    <LevelLayout levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"> 
       <LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain" xmlns:i="http://www.w3.org/2001/XMLSchema-instance"> 
        <LevelLayoutSectionBase> 
         <LevelLayoutItemBase> 
          <Name>Tracking ID</Name> 
         </LevelLayoutItemBase> 
        </LevelLayoutSectionBase> 
       </LevelLayout> 
      </LevelLayout> 
    </LevelLayouts> 
</PackageLevelLayout>''' 

namespaces={'ns': 'http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain', 
      'i': 'http://www.w3.org/2001/XMLSchema-instance'} 

import lxml.etree as ET 
# This is an lxml.etree._Element, not a tree, so don't call it tree 
root = ET.XML(xml2) 

nodes = root.xpath(
    '''/PackageLevelLayout/LevelLayouts/LevelLayout[@levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"] 
     /ns:LevelLayout/ns:LevelLayoutSectionBase/ns:LevelLayoutItemBase/ns:Name''', namespaces = namespaces) 
print(nodes)

yields

[<Element {http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain}Name at 0xb74974dc>]
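The list above holds element objects whose tag is written in {namespace}localname form. As a rough sketch of how one might get at the actual values (assuming Python 3; NS, node, qname and texts are illustrative names, and the document is a trimmed version of the sample), the element's text, its namespace, and its local name can be read like this:

```python
from lxml import etree

NS = {'ns': 'http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain'}

xml_doc = b'''<PackageLevelLayout>
  <LevelLayouts>
    <LevelLayout levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432">
      <LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain">
        <LevelLayoutSectionBase>
          <LevelLayoutItemBase>
            <Name>Tracking ID</Name>
          </LevelLayoutItemBase>
        </LevelLayoutSectionBase>
      </LevelLayout>
    </LevelLayout>
  </LevelLayouts>
</PackageLevelLayout>'''

root = etree.XML(xml_doc)

# Take the first matching element and inspect it.
node = root.xpath('//ns:Name', namespaces=NS)[0]
qname = etree.QName(node)   # splits "{uri}Name" into its two parts
print(qname.namespace)      # the namespace URI
print(qname.localname)      # Name
print(node.text)            # Tracking ID

# Alternatively, let XPath return the text nodes directly.
texts = root.xpath('//ns:Name/text()', namespaces=NS)
print(texts)                # ['Tracking ID']
```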