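Parsing XML files and creating a file list
Question:
I have a set of info.xml files, one under each /var/packs/{many folders}/info.xml. Each one lives in a different directory, and each info.xml describes the files belonging to its directory.
A file is of type "config" when the text inside its Type tag is "config". I need to parse the info.xml in each of the {many folders} and build a list of the file paths found inside the Path tags of those files.
The info.xml file looks like this: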
<Files>
  <File>
    <Path>usr/share/doc/dialog/samples/form1</Path>
    <Type>doc</Type>
    <Size>1222</Size>
    <Uid>0</Uid>
    <Gid>0</Gid>
    <Mode>0755</Mode>
    <Hash>49744d73e8667d0e353923c0241891d46ebb9032</Hash>
  </File>
  <File>
    <Path>usr/share/doc/dialog/samples/form3</Path>
    <Type>config</Type>
    <Size>1294</Size>
    <Uid>0</Uid>
    <Gid>0</Gid>
    <Mode>0755</Mode>
    <Hash>f30277f73e468232c59a526baf3a5ce49519b959</Hash>
  </File>
</Files>
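Answer:
Here is a very simple example with no error handling. It only works with very strictly defined XML files, but you should take it as a starting point and continue with the following links: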
- http://docs.python.org/library/xml.dom.html
- http://docs.python.org/library/xml.dom.minidom.html
- http://docs.python.org/library/os.path.html
- http://docs.python.org/library/os.html
Code:
import os
import os.path
from xml.dom.minidom import parse


def parse_file(path):
    """Return the Path of every File entry whose Type is 'config'."""
    files = []
    dom = parse(path)
    try:
        for filetag in dom.getElementsByTagName('File'):
            file_type = filetag.getElementsByTagName('Type')[0].firstChild.data
            if file_type == 'config':
                file_path = filetag.getElementsByTagName('Path')[0].firstChild.data
                files.append(file_path)
    finally:
        # release the DOM tree even if a tag is missing
        dom.unlink()
    return files


def main():
    config_files = []
    # visit every directory below /var/packs and parse its info.xml, if any
    for root, dirs, filenames in os.walk('/var/packs'):
        if 'info.xml' in filenames:
            config_files += parse_file(os.path.join(root, 'info.xml'))
    print('The list of desired files: %s' % config_files)


if __name__ == '__main__':
    main()
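The example above deliberately skips error handling. As a rough sketch of what tolerating malformed files or missing tags might look like: the helper names text_of and config_paths are made up for illustration, and the sketch parses a string instead of a file only to stay self-contained.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def text_of(element, tag_name):
    """Return the stripped text of the first <tag_name> descendant, or None."""
    nodes = element.getElementsByTagName(tag_name)
    if not nodes:
        return None
    first = nodes[0].firstChild
    # an empty tag has no child; a nested element is not a text node
    if first is None or first.nodeType != first.TEXT_NODE:
        return None
    return first.data.strip()


def config_paths(xml_text):
    """Return the Path of every File whose Type is 'config'.

    Malformed XML yields an empty list; File entries without a usable
    Type or Path are skipped instead of raising.
    """
    try:
        dom = parseString(xml_text)
    except ExpatError:
        return []
    paths = []
    try:
        for file_tag in dom.getElementsByTagName('File'):
            if text_of(file_tag, 'Type') != 'config':
                continue
            path = text_of(file_tag, 'Path')
            if path is not None:
                paths.append(path)
    finally:
        dom.unlink()
    return paths
```

Whether a broken info.xml should be skipped silently, logged, or treated as fatal depends on your setup; returning an empty list is just one possible choice.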
Answer:
Writing this off the top of my head, but here goes. We'll use os.path.walk to recursively descend into your directories and minidom to do the parsing.
import os
from xml.dom import minidom


# opens a given info.xml file and prints out the contents of each "Path"
def parseInfoXML(filename):
    doc = minidom.parse(filename)
    for fileNode in doc.getElementsByTagName("File"):
        # warning: we assume the existence of a Path node, and that it contains a Text node
        print(fileNode.getElementsByTagName("Path")[0].childNodes[0].data)
    doc.unlink()


# visitor function: os.path.walk calls it once per directory
def checkDirForInfoXML(arg, dirname, names):
    if "info.xml" in names:
        parseInfoXML(os.path.join(dirname, "info.xml"))


# recursively walk the directory tree, calling our visitor function to check for info.xml in each dir
# this will include packs as well, so be sure that there's no info.xml in there
os.path.walk("/var/packs", checkDirForInfoXML, None)
Not the most efficient way to do it, I'm sure, but if you aren't expecting any errors or the like, it'll do.
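Since os.path.walk exists only in Python 2, the same directory traversal could be sketched with os.walk, which works in both Python 2 and 3. The generator name find_info_xml is hypothetical, not part of the answer above:

```python
import os


def find_info_xml(top):
    """Yield the path of every info.xml found under top, recursively."""
    for dirpath, dirnames, filenames in os.walk(top):
        if 'info.xml' in filenames:
            yield os.path.join(dirpath, 'info.xml')
```

Each yielded path can then be handed to a parsing function such as parseInfoXML, for example `for p in find_info_xml('/var/packs'): parseInfoXML(p)`.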
Answer:
Using lxml.etree and XPath:
import os

import lxml.etree

files = []
for root, dirnames, filenames in os.walk('/var/packs'):
    for filename in filenames:
        if filename != 'info.xml':
            continue
        tree = lxml.etree.parse(os.path.join(root, filename))
        # select the Path text of every File whose Type text is "config"
        files.extend(tree.getroot().xpath('//File[Type[text()="config"]]/Path/text()'))
If lxml is not available, you can alternatively use the etree API from the standard library:
import os
import xml.etree.ElementTree

files = []
for root, dirnames, filenames in os.walk('/var/packs'):
    for filename in filenames:
        if filename != 'info.xml':
            continue
        tree = xml.etree.ElementTree.parse(os.path.join(root, filename))
        # File elements are direct children of the Files root
        for file_node in tree.findall('File'):
            type_node = file_node.find('Type')
            if type_node is not None and type_node.text == 'config':
                path_node = file_node.find('Path')
                if path_node is not None:
                    files.append(path_node.text)
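As a side sketch: the standard library's ElementTree (version 1.3, shipped with Python 2.7 and later) supports a limited XPath subset that includes the [tag='text'] predicate, so the nested checks above can probably be condensed into a single findall. The function name config_paths_etree and the trimmed-down SAMPLE string are illustrative only:

```python
import xml.etree.ElementTree as ET

# shortened copy of the info.xml sample from the question
SAMPLE = """<Files>
  <File>
    <Path>usr/share/doc/dialog/samples/form1</Path>
    <Type>doc</Type>
  </File>
  <File>
    <Path>usr/share/doc/dialog/samples/form3</Path>
    <Type>config</Type>
  </File>
</Files>"""


def config_paths_etree(root):
    """Return the Path text of every File child of root whose Type is 'config'."""
    # [Type='config'] keeps only File elements that have a Type child
    # whose complete text equals 'config'
    return [node.text for node in root.findall("./File[Type='config']/Path")]


root = ET.fromstring(SAMPLE)
print(config_paths_etree(root))
```

Unlike the lxml expression, this predicate compares the complete text of Type, so surrounding whitespace inside the tag would prevent a match.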
Just a side note: os.path.walk is deprecated and has been removed in 3.0 in favor of os.walk(). http://docs.python.org/library/os.path.html#os.path.walk – dmedvinsky 2010-07-06 19:38:48
Aha, thanks. Unfortunately I'm still living in the Python 2.6 stone age, heh. – Faisal 2010-07-07 04:04:28