Groovy's XmlParser ignores CR/LF in CDATA

Question:

I want to parse an XML log generated by log4j. The XML contains a node with the throwable (if there is one). This (multi-line, tabbed) text is wrapped in a CDATA section.

Here is an excerpt from the file:

<log4j:event logger="org.codehaus.groovy.grails.web.errors.GrailsExceptionResolver" timestamp="1330083921521" level="ERROR" thread="http-8080-1"> 
<log4j:message><![CDATA[Exception occurred when processing request: [GET] /test/log/show 
Stacktrace follows:]]></log4j:message> 
<log4j:throwable><![CDATA[org.xml.sax.SAXParseException: XML document structures must start and end within the same entity. 
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231) 
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522) 
    at test.LogController$_closure2.doCall(LogController.groovy:21) 
    at test.LogController$_closure2.doCall(LogController.groovy) 
    at java.lang.Thread.run(Thread.java:662) 
]]></log4j:throwable> 
</log4j:event> 

I parse it with Groovy's XmlParser:

def parser = new XmlParser(false, false).parse(new File("stack.log")) 

return parser.'log4j:event'.collect { l -> 
    LogEntry entry = new LogEntry() 
    entry.with { 
     level = l.'@level' 
     message = l.'log4j:message'.text() 
     thread = l.'@thread' 
     logger = l.'@logger' 
     timestamp = new Date(l.'@timestamp' as long) 
     throwable = l.'log4j:throwable'?.text() ?: '' 
    } 
    entry 
} 

The 'throwable' field contains all of the text, but without the CR/LF.

Does anyone know how to deal with this?

Thanks in advance...

Do you have a small example of the XML? – 2012-02-24 14:47:33

I edited the post to show a small example... – matcauthon 2012-02-24 16:27:07

I hate to just throw code at you, but this seems to work as expected and returns the CRLFs:

def xml = '''<log> 
      | <log4j:event logger="org.codehaus.groovy.grails.web.errors.GrailsExceptionResolver" timestamp="1330083921521" level="ERROR" thread="http-8080-1"> 
      | <log4j:message><![CDATA[Exception occurred when processing request: [GET] /test/log/show 
      |Stacktrace follows:]]></log4j:message> 
      | <log4j:throwable><![CDATA[org.xml.sax.SAXParseException: XML document structures must start and end within the same entity. 
      | at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231) 
      | at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522) 
      | at test.LogController$_closure2.doCall(LogController.groovy:21) 
      | at test.LogController$_closure2.doCall(LogController.groovy) 
      | at java.lang.Thread.run(Thread.java:662) 
      |]]></log4j:throwable> 
      | </log4j:event> 
      |</log>'''.stripMargin() 


class LogEntry { 
    def level 
    def message 
    def thread 
    def logger 
    def timestamp 
    def throwable 

    String toString() { 
    """EVENT: 
     | level : $level 
     | message : $message 
     | thread : $thread 
     | logger : $logger 
     | ts  : $timestamp 
     | thrown : $throwable""".stripMargin() 
    } 
} 

def parser = new XmlParser(false, false).parseText(xml) 
def entries = parser.'log4j:event'.collect { event -> 
    new LogEntry().with { 
    level  = event.@level 
    message = event.'log4j:message'.text() 
    thread = event.@thread 
    logger = event.@logger 
    timestamp = new Date(event.@timestamp as long) 
    throwable = event.'log4j:throwable'?.text() ?: '' 
    it 
    } 
} 

entries.each { 
    println it 
} 

This prints:

EVENT: 
    level : ERROR 
    message : Exception occurred when processing request: [GET] /test/log/show 
Stacktrace follows: 
    thread : http-8080-1 
    logger : org.codehaus.groovy.grails.web.errors.GrailsExceptionResolver 
    ts  : Fri Feb 24 11:45:21 GMT 2012 
    thrown : org.xml.sax.SAXParseException: XML document structures must start and end within the same entity. 
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231) 
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522) 
    at test.LogController$_closure2.doCall(LogController.groovy:21) 
    at test.LogController$_closure2.doCall(LogController.groovy) 
    at java.lang.Thread.run(Thread.java:662) 

This has the CRLF characters in it where they should be...

This is with Groovy 1.8.6, by the way... What version are you using? Can you upgrade and try again?

Hmm. Yep, I'm using 1.7.10 (on Grails). I tested with 1.8.6 and it works as expected. – matcauthon 2012-02-25 11:56:57

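OK. It seems I missed converting the tabs etc. between my controller and the view... – matcauthon 2012-02-28 12:46:46

The XML standard requires whitespace to be normalized during parsing.

I'm not sure, but the parser may have a setting to override this behavior. Otherwise, you could preprocess the file, replacing the line endings inside the CDATA sections with their XML entities, and then parse it.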

-1. Except in attributes, the XML standard does not call for whitespace to be normalized. – lavinio 2012-02-24 15:14:28
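
The distinction made in the comment above can be checked with a small script. This is only a sketch: the `entry` element and its `note` attribute are made up for illustration and are not part of the log4j format. It puts a literal line break both in an attribute value and in a CDATA section, then compares what XmlParser returns for each.

```groovy
// Hypothetical element: a literal line break inside an attribute value
// and inside a CDATA section ('\n' and '\t' are expanded by Groovy here).
def xml = '<entry note="first\nsecond"><![CDATA[line one\n\tat line two]]></entry>'

def entry = new XmlParser().parseText(xml)

// Attribute values are normalized: the line break is read as a single space.
assert entry.@note == 'first second'

// Element content, CDATA included, keeps its line breaks and tabs.
assert entry.text() == 'line one\n\tat line two'
```

If both assertions pass, the parser is handing the stack trace back with its line breaks intact, so any loss of formatting would have to happen later, for example when the text is rendered.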