通过python lxml解析xml tree.xpath
问题描述:
我尝试解析一个巨大的文件。样本如下。我尽量采取<Name>
,但我不能 它的工作原理只是没有这个字符串通过python lxml解析xml tree.xpath
<LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
xml2 = '''<?xml version="1.0" encoding="UTF-8"?>
<PackageLevelLayout>
<LevelLayouts>
<LevelLayout levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432">
<LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<LevelLayoutSectionBase>
<LevelLayoutItemBase>
<Name>Tracking ID</Name>
</LevelLayoutItemBase>
</LevelLayoutSectionBase>
</LevelLayout>
</LevelLayout>
</LevelLayouts>
</PackageLevelLayout>'''
from lxml import etree
tree = etree.XML(xml2)
nodes = tree.xpath('/PackageLevelLayout/LevelLayouts/LevelLayout[@levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"]/LevelLayout/LevelLayoutSectionBase/LevelLayoutItemBase/Name')
print nodes
答
你嵌套LevelLayout
XML文档使用命名空间。我会使用:
tree.xpath('.//LevelLayout[@levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"]//*[local-name()="Name"]')
到Name
元件具有较短XPath表达式(完全忽略的命名空间)相匹配。
另一种方法是使用一个前缀至名称空间的映射,并使用那些在您的标签:
nsmap = {'acd': 'http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain'}
tree.xpath('/PackageLevelLayout/LevelLayouts/LevelLayout[@levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"]/acd:LevelLayout/acd:LevelLayoutSectionBase/acd:LevelLayoutItemBase/acd:Name',
namespaces=nsmap)
答
lxml
的xpath
方法具有namespaces
parameter。您可以将它传递一个字典映射名称空间前缀给名称空间。然后,你可以参考建立XPath
s表示使用的命名空间前缀:
xml2 = '''<?xml version="1.0" encoding="UTF-8"?>
<PackageLevelLayout>
<LevelLayouts>
<LevelLayout levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432">
<LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<LevelLayoutSectionBase>
<LevelLayoutItemBase>
<Name>Tracking ID</Name>
</LevelLayoutItemBase>
</LevelLayoutSectionBase>
</LevelLayout>
</LevelLayout>
</LevelLayouts>
</PackageLevelLayout>'''
namespaces={'ns': 'http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain',
'i': 'http://www.w3.org/2001/XMLSchema-instance'}
import lxml.etree as ET
# This is an lxml.etree._Element, not a tree, so don't call it tree
root = ET.XML(xml2)
nodes = root.xpath(
'''/PackageLevelLayout/LevelLayouts/LevelLayout[@levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"]
/ns:LevelLayout/ns:LevelLayoutSectionBase/ns:LevelLayoutItemBase/ns:Name''', namespaces = namespaces)
print nodes
产生
[<Element {http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain}Name at 0xb74974dc>]
非常感谢您!似乎我需要更深入地学习xpath。 – user2200260 2013-03-26 07:47:27