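Groovy's XmlParser ignores CDATA CR/LF
I want to parse an XML log generated by log4j. The XML contains a node with the throwable (if there is one). This (multi-line, tab-indented) text is wrapped in a CDATA section.
Here is an excerpt of the whole file: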
<log4j:event logger="org.codehaus.groovy.grails.web.errors.GrailsExceptionResolver" timestamp="1330083921521" level="ERROR" thread="http-8080-1">
<log4j:message><![CDATA[Exception occurred when processing request: [GET] /test/log/show
Stacktrace follows:]]></log4j:message>
<log4j:throwable><![CDATA[org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at test.LogController$_closure2.doCall(LogController.groovy:21)
at test.LogController$_closure2.doCall(LogController.groovy)
at java.lang.Thread.run(Thread.java:662)
]]></log4j:throwable>
</log4j:event>
I parse it with Groovy's XmlParser:
// Non-validating, non-namespace-aware parser, so element names keep their "log4j:" prefix
def parser = new XmlParser(false, false).parse(new File("stack.log"))
return parser.'log4j:event'.collect { l ->
    LogEntry entry = new LogEntry()
    entry.with {
        // Attributes of <log4j:event>
        level = l.'@level'
        thread = l.'@thread'
        logger = l.'@logger'
        timestamp = new Date(l.'@timestamp' as long)
        // Text of the CDATA-wrapped child elements
        message = l.'log4j:message'.text()
        throwable = l.'log4j:throwable'?.text() ?: ''
    }
    entry
}
The 'throwable' field contains all of the text, but without the CR/LF.
Does anyone know how to deal with this?
Thanks in advance...
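A side note on reading the file itself: log4j's XMLLayout writes a series of log4j:event elements without a single enclosing root element, so a file containing several events is not a well-formed XML document on its own (the answer below wraps its sample in a <log> element for the same reason). The sketch below is a minimal illustration, assuming stack.log holds only raw XMLLayout output; the helper name parseLogEvents is hypothetical, and it returns plain maps instead of the question's LogEntry class so that it stays self-contained.

```groovy
// Hypothetical helper: wrap raw log4j XMLLayout output in a synthetic <log>
// root element so that XmlParser sees a single well-formed document.
def parseLogEvents(String raw) {
    // Non-validating, non-namespace-aware parser, as in the question, so the
    // undeclared "log4j:" prefix is simply part of the element names.
    def root = new XmlParser(false, false).parseText("<log>${raw}</log>")
    root.'log4j:event'.collect { event ->
        [
            level    : event.'@level',
            thread   : event.'@thread',
            throwable: event.'log4j:throwable'?.text() ?: ''
        ]
    }
}

// Usage against the file from the question (assumed to be UTF-8):
// def events = parseLogEvents(new File('stack.log').getText('UTF-8'))
```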
I hate to just throw code at you, but this seems to work as expected and returns the CRLFs:
def xml = '''<log>
| <log4j:event logger="org.codehaus.groovy.grails.web.errors.GrailsExceptionResolver" timestamp="1330083921521" level="ERROR" thread="http-8080-1">
| <log4j:message><![CDATA[Exception occurred when processing request: [GET] /test/log/show
|Stacktrace follows:]]></log4j:message>
| <log4j:throwable><![CDATA[org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
| at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
| at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
| at test.LogController$_closure2.doCall(LogController.groovy:21)
| at test.LogController$_closure2.doCall(LogController.groovy)
| at java.lang.Thread.run(Thread.java:662)
|]]></log4j:throwable>
| </log4j:event>
|</log>'''.stripMargin()
class LogEntry {
    def level
    def message
    def thread
    def logger
    def timestamp
    def throwable

    String toString() {
        """EVENT:
          | level : $level
          | message : $message
          | thread : $thread
          | logger : $logger
          | ts : $timestamp
          | thrown : $throwable""".stripMargin()
    }
}

// Same non-validating, non-namespace-aware parser as in the question
def parser = new XmlParser(false, false).parseText(xml)
def entries = parser.'log4j:event'.collect { event ->
    new LogEntry().with {
        // event.@name reads an attribute of the <log4j:event> node
        level = event.@level
        message = event.'log4j:message'.text()
        thread = event.@thread
        logger = event.@logger
        timestamp = new Date(event.@timestamp as long)
        throwable = event.'log4j:throwable'?.text() ?: ''
        it // return the populated LogEntry from with { }
    }
}

entries.each {
    println it
}
which prints:
EVENT:
level : ERROR
message : Exception occurred when processing request: [GET] /test/log/show
Stacktrace follows:
thread : http-8080-1
logger : org.codehaus.groovy.grails.web.errors.GrailsExceptionResolver
ts : Fri Feb 24 11:45:21 GMT 2012
thrown : org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at test.LogController$_closure2.doCall(LogController.groovy:21)
at test.LogController$_closure2.doCall(LogController.groovy)
at java.lang.Thread.run(Thread.java:662)
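which has the CRLF characters in it where they should be...
This is with Groovy 1.8.6, by the way... What version are you using? Can you upgrade and try again?
Hmm. Yep, I'm using 1.7.10 (on Grails). I tested with 1.8.6 and it works as expected. – matcauthon 2012-02-25 11:56:57
OK. It seems I was missing the translation of tabs etc. between my controller and my view... – matcauthon 2012-02-28 12:46:46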
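The last comment suggests the line breaks were being lost between the controller and the view rather than in XmlParser. Browsers collapse newlines and tabs in ordinary HTML text, so one common option is to HTML-escape the stack trace and render it inside a <pre> element. The comment does not say exactly what "translation" was needed, so treat this as one possible approach; the helper name toHtmlBlock is hypothetical, and inside Grails you would more likely use the built-in encodeAsHTML() codec than hand-written escaping.

```groovy
// Hypothetical helper: escape a multi-line stack trace for HTML and keep its
// layout by wrapping it in <pre>, where browsers preserve newlines and tabs.
String toHtmlBlock(String text) {
    def escaped = text
        .replace('&', '&amp;')   // must come first so the entities below are not double-escaped
        .replace('<', '&lt;')
        .replace('>', '&gt;')
        .replace('"', '&quot;')
    "<pre>${escaped}</pre>"
}
```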
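The XML standard requires whitespace to be normalized during parsing.
I'm not sure, but the parser may have a setting to override this behavior. Otherwise, you could preprocess the file, replacing the line endings in the CDATA sections with their XML entities, and then parse it.
-1. Except in attributes, the XML standard does not call for whitespace to be normalized. – lavinio 2012-02-24 15:14:28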
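lavinio's correction can be checked directly. XML 1.0 normalizes attribute values (each tab or newline becomes a space) and converts CRLF line endings to a single LF everywhere, but otherwise leaves element and CDATA content alone. A minimal sketch, assuming a current Groovy version; note that XmlParser trims leading and trailing whitespace from text by default, so the sample keeps its line break in the middle of the text.

```groovy
// Attribute value containing a tab and a newline, and CDATA content with a CRLF and a tab.
def xml = '<event note="one\ttwo\nthree"><msg><![CDATA[first\r\n\tsecond]]></msg></event>'
def event = new XmlParser(false, false).parseText(xml)

// Attribute-value normalization: each tab/newline is replaced by a single space.
def note = event.'@note'
// CDATA content keeps its line break and tab; only CRLF is normalized to LF.
def msg = event.msg.text()

println "note = ${note.inspect()}"
println "msg  = ${msg.inspect()}"
```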
Do you have a small example of the XML? – 2012-02-24 14:47:33
I edited the post to show a small example... – matcauthon 2012-02-24 16:27:07