什么是从Python输出XML最简单的非内存密集型方法?

问题描述:

基本上,类似于System.Xml.XmlWriter的东西 - 一种流式XML编写器,不会产生太多的内存开销。这样就排除了xml.dom和xml.dom.minidom。建议?什么是从Python输出XML最简单的非内存密集型方法?

xml.etree.cElementTree,包含在2.5以来的CPython的默认分配。阅读和编写XML都闪电般快速。

+4

尽管如此,cElementTree不是流式写入器,因此会使用内存线性来创建您所创建的XML树的大小,尽管远小于xml.dom。 – 2008-09-20 19:09:54

我想我有你的毒药:

http://sourceforge.net/projects/xmlite

干杯

我一直有良好的结果lxml。这是一个痛苦的安装,因为它主要是围绕libxml2包装,但lxml.etree树对象有一个.write()方法,需要一个类文件对象流。

from lxml.etree import XML 

tree = XML('<root><a><b/></a></root>') 
tree.write(your_file_object) 

几年前,我用MarkupWriter4suite

General-purpose utility class for generating XML (may eventually be 
expanded to produce more output types) 

Sample usage: 

from Ft.Xml import MarkupWriter 
writer = MarkupWriter(indent=u"yes") 
writer.startDocument() 
writer.startElement(u'xsa') 
writer.startElement(u'vendor') 
#Element with simple text (#PCDATA) content 
writer.simpleElement(u'name', content=u'Centigrade systems') 
#Note writer.text(content) still works 
writer.simpleElement(u'email', content=u"[email protected]") 
writer.endElement(u'vendor') 
#Element with an attribute 
writer.startElement(u'product', attributes={u'id': u"100\u00B0"}) 
#Note writer.attribute(name, value, namespace=None) still works 
writer.simpleElement(u'name', content=u"100\u00B0 Server") 
#XML fragment 
writer.xmlFragment('<version>1.0</version><last-release>20030401</last-release>') 
#Empty element 
writer.simpleElement(u'changes') 
writer.endElement(u'product') 
writer.endElement(u'xsa') 
writer.endDocument() 

Note on the difference between 4Suite writers and printers 
Writer - module that exposes a broad public API for building output 
      bit by bit 
Printer - module that simply takes a DOM and creates output from it 
      as a whole, within one API invokation 

最近我听到了很多关于如何lxml是伟大的,但我没有第一手的经验,和我与gnosis工作有一些乐趣。

+0

恐怕4suite不再在线。至于lxml,这很棒,但没有一个令人窒息的设施AFAIK。 – rds 2012-01-06 14:04:11

+0

@rds 4suite仍然可以在http://pypi.python.org/pypi/4Suite-XML – 2012-01-09 17:40:21

我想你会从xml.sax.saxutils找到XMLGenerator是最接近你想要的东西。

 
import time 
from xml.sax.saxutils import XMLGenerator 
from xml.sax.xmlreader import AttributesNSImpl 

LOG_LEVELS = ['DEBUG', 'WARNING', 'ERROR'] 


class xml_logger: 
    def __init__(self, output, encoding): 
     """ 
     Set up a logger object, which takes SAX events and outputs 
     an XML log file 
     """ 
     logger = XMLGenerator(output, encoding) 
     logger.startDocument() 
     attrs = AttributesNSImpl({}, {}) 
     logger.startElementNS((None, u'log'), u'log', attrs) 
     self._logger = logger 
     self._output = output 
     self._encoding = encoding 
     return 

    def write_entry(self, level, msg): 
     """ 
     Write a log entry to the logger 
     level - the level of the entry 
     msg - the text of the entry. Must be a Unicode object 
     """ 
     #Note: in a real application, I would use ISO 8601 for the date 
     #asctime used here for simplicity 
     now = time.asctime(time.localtime()) 
     attr_vals = { 
      (None, u'date'): now, 
      (None, u'level'): LOG_LEVELS[level], 
      } 
     attr_qnames = { 
      (None, u'date'): u'date', 
      (None, u'level'): u'level', 
      } 
     attrs = AttributesNSImpl(attr_vals, attr_qnames) 
     self._logger.startElementNS((None, u'entry'), u'entry', attrs) 
     self._logger.characters(msg) 
     self._logger.endElementNS((None, u'entry'), u'entry') 
     return 

    def close(self): 
     """ 
     Clean up the logger object 
     """ 
     self._logger.endElementNS((None, u'log'), u'log') 
     self._logger.endDocument() 
     return 

if __name__ == "__main__": 
    #Test it out 
    import sys 
    xl = xml_logger(sys.stdout, 'utf-8') 
    xl.write_entry(2, u"Vanilla log entry") 
    xl.close() 

你可能会想看看本文的其余部分我是从http://www.xml.com/pub/a/2003/03/12/py-xml.html

ElementTree的第二票(cElementTree是一个C实现,有点快,就像cPickle vs pickle)。这里有一些简短的示例代码,你可以看看它们,让你知道它是如何工作的:http://effbot.org/zone/element-index.htm (这是Fredrik Lundh,他首先编写了这个模块,它非常好用2.5编写的标准库:-))