Parsing child tags that match a specific string in XML using Python
Question:
I want to parse an XML string in which Topics is the parent tag and Topic1, Topic2 are child tags.
<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory/Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation/Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership/Control</ParentTopic2></Topics></SignificantDevelopments>
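I just want to parse this XML so that I can get the attribute value of every Topic tag, and I would like to do it in a for loop.
I have tried the following code: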
import xml.etree.ElementTree as ET

tree = ET.ElementTree(file='sample.xml')
# get the root element
root = tree.getroot()
namespace = {'xmlns': 'urn:reuterscompanycontent:significantdevelopments03'}
for devs in root.findall('xmlns:Topics', namespace):
    for child_tags in devs.findall('xmlns:./', namespace):
        print('child:', child_tags.tag)
I just want to add a wildcard such as Topic/d in the second-to-last line, so that I can parse every tag that matches Topic.
Answer:
You can check whether the tag attribute starts with the namespace followed by the prefix Topic, for example:
import xml.etree.ElementTree as ET

root = ET.fromstring('<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory/Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation/Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership/Control</ParentTopic2></Topics></SignificantDevelopments>')
# keep only the grandchildren of the root whose qualified tag starts with "{namespace}Topic"
topics = [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]
for topic in topics:
    print(topic.text)
Or, more concisely:
import xml.etree.ElementTree as ET

root = ET.fromstring('<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory/Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation/Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership/Control</ParentTopic2></Topics></SignificantDevelopments>')
# filter and iterate in a single statement
for topic in [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]:
    print(topic.text)
Alternatively, move the check into an if statement inside the for loop.
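As a rough sketch of that last variant, the check can sit in an if statement inside the loop, and the same loop can read the Code attribute the question asks for. The names XML, TOPIC_PREFIX and topic_codes are made up for this illustration, and the XML declaration is left out of the sample string for brevity; it assumes the Topic elements are always grandchildren of the root, as in the sample.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, split over several lines for readability.
XML = (
    '<SignificantDevelopments Major="3" Minor="0" Revision="1" '
    'xmlns="urn:reuterscompanycontent:significantdevelopments03">'
    '<Topics>'
    '<Topic1 Code="254">Regulatory/Company Investigation</Topic1>'
    '<Topic2 Code="207">Mergers &amp; Acquisitions</Topic2>'
    '<ParentTopic1 Code="6">Litigation/Regulatory</ParentTopic1>'
    '<ParentTopic2 Code="4">Ownership/Control</ParentTopic2>'
    '</Topics>'
    '</SignificantDevelopments>'
)

# ElementTree stores namespaced tags as "{namespace-uri}LocalName".
TOPIC_PREFIX = '{urn:reuterscompanycontent:significantdevelopments03}Topic'


def topic_codes(xml_text):
    """Return (local tag name, Code attribute, text) for each Topic* grandchild of the root."""
    root = ET.fromstring(xml_text)
    results = []
    for el in root.findall('*/*'):
        # The check lives inside the loop instead of a list comprehension.
        if el.tag.startswith(TOPIC_PREFIX):
            local_name = el.tag.split('}', 1)[1]  # drop the "{namespace-uri}" part
            results.append((local_name, el.get('Code'), el.text))
    return results


for name, code, text in topic_codes(XML):
    print(name, code, text)
```

Because the check is a plain startswith on the qualified tag, ParentTopic1 and ParentTopic2 are skipped; only tags whose local name begins with Topic are returned.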
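Thank you so much, it works! :) Could you suggest some resources I can use to build my Python skills? – ggupta
http://stackoverflow.com/tags/python/info has a list of links to tutorials, courses and articles. –
Thanks a ton – ggupta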