SAX parser ignores CDATA - HTML tags

Question:

I have a simple Android RSS reader application in which I use a SAX parser to fetch the data. All records are extracted correctly except the "desc" element. The XML structure is as follows.

<item> 
<title>Boilermaker Jazz Band</title> 
<link>http://eventur.sis.pitt.edu/event.jsp?e_id=1805</link> 
<type>Music Concerts</type> 
<s_time>09-02-2010 05:00 PM&nbsp;</s_time> 
<venue>Backstage Bar at Theater Square</venue> 
<venue_addr/> 
<desc> 
<p><span style="font-family: arial, geneva, sans-serif; font-size: 11px;"> 
<p style="font-family: Arial, Helvetica, sans-serif; max-width: 600px; margin-top: 8px; margin-right: 0px; margin-bottom: 8px; margin-left: 0px; font-size: 9pt; vertical-align: top;">Authentic American Jazz, Ragtime and Swing The Boilermaker Jazz Band is an ecstatically fun band performing authentic hot jazz, ragtime, and swing. The group has ....</desc> 
<img_link> 
http://eventur.sis.pitt.edu/images/Boilheadshot1.jpg 
</img_link> 
</item>

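The data from all the other fields is extracted in full. But when it comes to <desc>, the characters() method extracts only "<" and ignores the rest. Could someone suggest what can be done?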

Answer

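Your <desc> element contains another (invalid) XML structure. In your example, startElement() will fire for <p>, then for <span>, then for <p> again. If you only want to extract the text, you can concatenate whatever characters() delivers for all the children of <desc>, until endElement() signals the end of the <desc> element.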

Something like:

private boolean isDescStarted = false;

private StringBuilder textDesc = new StringBuilder();

public void startElement(String uri, String name, String qName, Attributes atts) {
    if (name.equals("desc")) {
        isDescStarted = true;
        textDesc.setLength(0); // start with an empty buffer for each <desc>
    }
}

public void endElement(String uri, String name, String qName) {
    if (name.equals("desc")) {
        isDescStarted = false;
        String fullTextDesc = textDesc.toString(); // do whatever you want with this string now
    }
}

public void characters(char[] buf, int offset, int length) {
    // May be called several times per element, so append rather than assign
    if (isDescStarted) {
        textDesc.append(buf, offset, length);
    }
}
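
For anyone who wants to try the idea outside of Android, here is a minimal, self-contained sketch of the same approach using the desktop JAXP SAX parser. The class name DescTextExtractor, the helper extractDesc() and the sample XML are all made up for illustration, and the sample is a well-formed simplification of the feed in the question. It matches on qName because the default JAXP factory is not namespace-aware; which of localName/qName is populated depends on the parser's namespace settings, so on another parser you may need to match on the local name instead.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original answer.
class DescTextExtractor {

    /** Collects every piece of character data found between <desc> and </desc>. */
    static class DescHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inDesc = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Nested <p>, <span>, ... also arrive here; only <desc> flips the flag.
            if ("desc".equals(qName)) {
                inDesc = true;
                text.setLength(0);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("desc".equals(qName)) {
                inDesc = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Called once per text chunk, so append instead of overwriting.
            if (inDesc) {
                text.append(ch, start, length);
            }
        }

        String getText() {
            return text.toString().trim();
        }
    }

    /** Parses the given XML string and returns the concatenated text of <desc>. */
    static String extractDesc(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            DescHandler handler = new DescHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.getText();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample input: a shortened, well-formed version of the feed item.
        String xml = "<item><title>Boilermaker Jazz Band</title>"
                + "<desc><p><span style=\"font-size: 11px;\">Authentic American Jazz, "
                + "<b>Ragtime</b> and Swing</span></p></desc></item>";
        System.out.println(extractDesc(xml)); // prints: Authentic American Jazz, Ragtime and Swing
    }
}
```

If the feed instead wraps the HTML in a CDATA section, the same handler should still work: the markup inside the CDATA is delivered to characters() as plain text rather than as nested startElement() calls.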

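Hi Damien, I understood what you just said. But I am not sure how to use characters() and endElement() for the tag. Could you elaborate a bit? I would really appreciate it. – 2010-09-05 14:38:27

@Abdul I have edited my answer to add a code snippet. Hope it helps. – Damien 2010-09-05 18:01:50

Thank you very much, Damien ... – 2010-09-06 08:50:34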
