Parsing child tags that match a specific string in XML with Python

Problem description:

I want to parse an XML string in which Topics is the parent tag and Topic1 and Topic2 are child tags.

<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory/Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation/Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership/Control</ParentTopic2></Topics></SignificantDevelopments> 

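I just want to parse this XML so that I can get the attribute values of each Topic tag, and I would like to do it in a for loop.

I have tried the following code: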

import xml.etree.ElementTree as ET

tree = ET.ElementTree(file='sample.xml')

# get the root element
root = tree.getroot()
namespace = {'xmlns': 'urn:reuterscompanycontent:significantdevelopments03'}

for devs in root.findall('xmlns:Topics', namespace):
    for child_tags in devs.findall('xmlns:./', namespace):
        print('child: ', child_tags.tag)

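I just want to add some kind of wildcard, like Topic\d, on the second-to-last line so that I can parse every tag that matches Topic.

You can check whether each element's tag starts with the namespace followed by the prefix Topic, for example: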

from xml.etree import ElementTree as ET
root = ET.fromstring('<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory/Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation/Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership/Control</ParentTopic2></Topics></SignificantDevelopments>') 
topics = [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')] 
for topic in topics: 
    print(topic.text)

Or, more concisely:

from xml.etree import ElementTree as ET
root = ET.fromstring('<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory/Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation/Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership/Control</ParentTopic2></Topics></SignificantDevelopments>') 

for topic in [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]: 
    print(topic.text)

Or put the check in an if statement inside the for loop.
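As a rough sketch of that last variant, the check can sit in an if statement inside the loop. Since the question asks for attribute values and a Topic\d style wildcard, this version reads the Code attribute and uses a regular expression instead of startswith. The names `XML`, `NS`, `TOPIC_TAG`, and `topic_codes` are made up for illustration and are not part of the original answer.

```python
import re
import xml.etree.ElementTree as ET

# The sample document from the question, split across lines for readability.
XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<SignificantDevelopments Major="3" Minor="0" Revision="1" '
    'xmlns="urn:reuterscompanycontent:significantdevelopments03">'
    '<Topics>'
    '<Topic1 Code="254">Regulatory/Company Investigation</Topic1>'
    '<Topic2 Code="207">Mergers &amp; Acquisitions</Topic2>'
    '<ParentTopic1 Code="6">Litigation/Regulatory</ParentTopic1>'
    '<ParentTopic2 Code="4">Ownership/Control</ParentTopic2>'
    '</Topics></SignificantDevelopments>'
)

NS = 'urn:reuterscompanycontent:significantdevelopments03'

# ElementTree stores namespaced tags as "{namespace}LocalName", so the
# pattern matches "{namespace}Topic" followed by one or more digits.
TOPIC_TAG = re.compile(r'\{' + re.escape(NS) + r'\}Topic\d+')


def topic_codes(xml_text):
    """Return (local tag name, Code attribute, text) for each TopicN element."""
    root = ET.fromstring(xml_text)
    results = []
    for el in root.findall('*/*'):
        # fullmatch keeps ParentTopic1 and ParentTopic2 out of the results.
        if TOPIC_TAG.fullmatch(el.tag):
            local_name = el.tag.split('}', 1)[1]
            results.append((local_name, el.get('Code'), el.text))
    return results


for name, code, text in topic_codes(XML):
    print(name, code, text)
```

If ParentTopic1 and ParentTopic2 are wanted as well, using `TOPIC_TAG.search` with a pattern that is not anchored to the namespace prefix would be one way to loosen the match.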

Thank you so much, it works! :) Could you please suggest some resources I can use to build my Python skills? – ggupta

http://stackoverflow.com/tags/python/info has a list of links to tutorials, courses, and articles. –

Thanks a ton – ggupta