Parsing an RSS feed in an Android application

Problem description:

I am trying to retrieve data from an RSS feed. My program works well, with one exception. The items in the feed have the following structure:

<title></title> 
<link></link> 
<description></description> 

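I can retrieve the data, but when a title contains the "&" character, the returned string stops at the character before it. So, for example, for this title: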

<title>A&amp;T To Play Four Against Bears</title> 

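I only get back "A", when I expect to get back "A&T To Play Four Against Bears".

Can anyone tell me how I can modify my existing RSSReader class to account for the presence of the &amp;: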

import android.util.Log;

import java.net.URL;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RSSReader {

private static RSSReader instance = null; 

private RSSReader() { 
} 

public static RSSReader getInstance() { 
    if (instance == null) { 
     instance = new RSSReader(); 
    } 
    return instance; 
} 

public ArrayList<Story> getStories(String address) { 
    ArrayList<Story> stories = new ArrayList<Story>(); 
    try { 
     DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder(); 
     URL u = new URL(address); 
     Document doc = builder.parse(u.openStream()); 
     NodeList nodes = doc.getElementsByTagName("item"); 
     for (int i = 0; i < nodes.getLength(); i++) { 
      Element element = (Element) nodes.item(i); 
      Story currentStory = new Story(getElementValue(element, "title"), 
        getElementValue(element, "description"), 
        getElementValue(element, "link"), 
        getElementValue(element, "pubDate")); 
      stories.add(currentStory); 
     }//for 
    }//try 
    catch (Exception ex) { 
     if (ex instanceof java.net.ConnectException) { 
     } 
    } 
    return stories; 
} 

private String getCharacterDataFromElement(Element e) { 
    try { 
     Node child = e.getFirstChild(); 
     if (child instanceof CharacterData) { 
      CharacterData cd = (CharacterData) child; 
      return cd.getData(); 
     } 
    } catch (Exception ex) { 
     Log.i("myTag2", ex.toString()); 
    } 
    return ""; 
} //private String getCharacterDataFromElement 

protected float getFloat(String value) { 
    if (value != null && !value.equals("")) { 
     return Float.parseFloat(value); 
    } else { 
     return 0; 
    } 
} 

protected String getElementValue(Element parent, String label) { 
    return getCharacterDataFromElement((Element) parent.getElementsByTagName(label).item(0)); 
} 

}

Any ideas on how to solve this?

+0

Can you give the link to the RSS feed? – 2012-04-21 18:23:51

+0

Sure: http://www.ncataggies.com/rss.dbml?db_oem_id=24500&RSS_SPORT_ID=74515&media=news – 2012-04-21 18:25:32

+0

I'm thinking it might be the way the data is being extracted from the title tag. – 2012-04-21 18:26:26

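I tested the RSS feed with the parser I use, and it parsed as shown below. The feed is parseable, but as I wrote in the comments, it both wraps text in CDATA and escapes it, so some text comes through as "A&amp;T". You can replace those occurrences after parsing the XML.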

D/*** TITLE  : A&T To Play Four Against Longwood 
D/*** DESCRIPTION: A&amp;T baseball takes a break from conference play this weekend. 
D/*** TITLE  : Wilkerson Named MEAC Rookie of the Week 
D/*** DESCRIPTION: Wilkerson was 6-for-14 for the week of April 9-15. 
D/*** TITLE  : Lights, Camera, Action 
D/*** DESCRIPTION: A&amp;T baseball set to play nationally televised game on ESPNU. 
D/*** TITLE  : Resilient Aggies Fall To USC Upstate 
D/*** DESCRIPTION: Luke Tendler extends his hitting streak to 10 games. 
D/*** TITLE  : NCCU Defeats A&T In Key Conference Matchup 
D/*** DESCRIPTION: Kelvin Freeman leads the Aggies with three hits. 

Here is most of the RSS feed parser I use, so you can compare it with yours and see where they differ.

XmlPullFeedParser.java

package com.nesim.test.rssparser; 

import java.util.ArrayList; 
import java.util.List; 

import org.xmlpull.v1.XmlPullParser; 

import android.util.Log; 
import android.util.Xml; 

public class XmlPullFeedParser extends BaseFeedParser { 

    public XmlPullFeedParser(String feedUrl) { 
    super(feedUrl); 
    } 

    public List<Message> parse() { 
    List<Message> messages = null; 
    XmlPullParser parser = Xml.newPullParser(); 
    try { 
     // auto-detect the encoding from the stream 
     parser.setInput(this.getInputStream(), null); 
     int eventType = parser.getEventType(); 
     Message currentMessage = null; 
     boolean done = false; 
     while (eventType != XmlPullParser.END_DOCUMENT && !done){ 
     String name = null; 
     switch (eventType){ 
      case XmlPullParser.START_DOCUMENT: 
      messages = new ArrayList<Message>(); 
      break; 
      case XmlPullParser.START_TAG: 
      name = parser.getName(); 
      if (name.equalsIgnoreCase(ITEM)){ 
       currentMessage = new Message(); 
      } else if (currentMessage != null){ 
       if (name.equalsIgnoreCase(LINK)){ 
       currentMessage.setLink(parser.nextText()); 
       } else if (name.equalsIgnoreCase(DESCRIPTION)){ 
       currentMessage.setDescription(parser.nextText()); 
       } else if (name.equalsIgnoreCase(PUB_DATE)){ 
       currentMessage.setDate(parser.nextText()); 
       } else if (name.equalsIgnoreCase(TITLE)){ 
       currentMessage.setTitle(parser.nextText()); 
       } else if (name.equalsIgnoreCase(DATES)){ 
       currentMessage.setDates(parser.nextText()); 
       } 
      } 
      break; 
      case XmlPullParser.END_TAG: 
      name = parser.getName(); 
      if (name.equalsIgnoreCase(ITEM) && currentMessage != null){ 
       messages.add(currentMessage); 
      } else if (name.equalsIgnoreCase(CHANNEL)){ 
       done = true; 
      } 
      break; 
     } 
     eventType = parser.next(); 
     } 
    } catch (Exception e) { 
     Log.e("AndroidNews::PullFeedParser", e.getMessage(), e); 
     throw new RuntimeException(e); 
    } 
    return messages; 
    } 
} 

BaseFeedParser.java

package com.nesim.test.rssparser; 

import java.io.IOException; 
import java.io.InputStream; 
import java.net.MalformedURLException; 
import java.net.URL; 

public abstract class BaseFeedParser implements FeedParser { 

    // names of the XML tags 
    static final String CHANNEL = "channel"; 
    static final String PUB_DATE = "pubDate"; 
    static final String DESCRIPTION = "description"; 
    static final String LINK = "link"; 
    static final String TITLE = "title"; 
    static final String ITEM = "item"; 
    static final String DATES = "dates"; 
    private final URL feedUrl; 

    protected BaseFeedParser(String feedUrl){ 
    try { 
     this.feedUrl = new URL(feedUrl); 
    } catch (MalformedURLException e) { 
     throw new RuntimeException(e); 
    } 
    } 

    protected InputStream getInputStream() { 
    try { 
     return feedUrl.openConnection().getInputStream(); 
    } catch (IOException e) { 
     throw new RuntimeException(e); 
    } 
    } 
} 

FeedParser.java

package com.nesim.test.rssparser; 

import java.util.List; 

public interface FeedParser { 
    List<Message> parse(); 
} 
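
The key call in the pull parser above is nextText(), which returns the complete text of an element with entities such as &amp; already decoded. Android's XmlPullParser is not available on a desktop JVM, so here is a rough analogue using StAX (javax.xml.stream), where getElementText() plays the same role. This is only a sketch of the same idea, not the parser from this answer; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: collects the <title> text of every <item> in a feed.
final class StaxTitleSketch {

    static List<String> itemTitles(String xml) {
        List<String> titles = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            boolean inItem = false;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equalsIgnoreCase("item")) {
                        inItem = true;
                    } else if (inItem && name.equalsIgnoreCase("title")) {
                        // Returns the whole text content (plain text and CDATA),
                        // with &amp; decoded to "&".
                        titles.add(reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equalsIgnoreCase("item")) {
                    inItem = false;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
        return titles;
    }
}
```

Calling StaxTitleSketch.itemTitles(feedXml) on a feed string returns one entry per item. The channel-level title is skipped because it sits outside any item element.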

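It seems you did not change your code to what I provided. If you insist on parsing the feed your way, you first need to fetch the XML as text and preprocess it so that it parses correctly. I have also included a class for fetching the XML text at the end of this post. Please change your code, try it, and post the result.

If you change the following lines, it will work.

Remove these lines from the getStories function: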

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder(); 
URL u = new URL(address); 
Document doc = builder.parse(u.openStream()); 

In place of the removed lines, add these:

WebRequest response = new WebRequest("http://www.ncataggies.com/rss.dbml?db_oem_id=24500&RSS_SPORT_ID=74515&media=news",PostType.GET); 
String htmltext = response.Get(); 

int firtItemIndex = htmltext.indexOf("<item>"); 
String htmltextHeader = htmltext.substring(0,firtItemIndex); 
String htmltextBody = htmltext.substring(firtItemIndex); 

htmltextBody = htmltextBody.replace("<title>", "<title><![CDATA[ "); 
htmltextBody = htmltextBody.replace("</title>", "]]></title>"); 

htmltextBody = htmltextBody.replace("<link>", "<link><![CDATA[ "); 
htmltextBody = htmltextBody.replace("</link>", "]]></link>"); 

htmltextBody = htmltextBody.replace("<guid>", "<guid><![CDATA[ "); 
htmltextBody = htmltextBody.replace("</guid>", "]]></guid>"); 
htmltextBody = htmltextBody.replace("&amp;", "&"); 
htmltext = htmltextHeader + htmltextBody; 

Document doc = XMLfunctions.XMLfromString(htmltext); 
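
The XMLfunctions class used in the last line is not shown in this post. Assuming XMLfromString simply parses a string into a DOM Document, a minimal stand-in could look like the sketch below. The class name XMLfunctionsSketch is made up, and the real helper may behave differently.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the XMLfunctions helper referenced above.
final class XMLfunctionsSketch {

    // Parses XML held in a String into a DOM Document.
    static Document XMLfromString(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // InputSource + StringReader avoids guessing a byte encoding.
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML text", e);
        }
    }
}
```

Note that the replacement strings above insert a space after "<![CDATA[", so every wrapped title, link and guid starts with a space. Calling trim() on the extracted value removes it.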

WebRequest.java

package com.nesim.test; 

import java.io.BufferedReader; 
import java.io.IOException; 
import java.io.InputStream; 
import java.io.InputStreamReader; 
import java.net.UnknownHostException; 
import java.nio.charset.Charset; 

import org.apache.http.HttpResponse; 
import org.apache.http.client.CookieStore; 
import org.apache.http.client.HttpClient; 
import org.apache.http.client.methods.HttpGet; 
import org.apache.http.client.methods.HttpPost; 
import org.apache.http.client.protocol.ClientContext; 
import org.apache.http.impl.client.BasicCookieStore; 
import org.apache.http.impl.client.DefaultHttpClient; 
import org.apache.http.protocol.BasicHttpContext; 
import org.apache.http.protocol.HttpContext; 


public class WebRequest { 
    public enum PostType{ 
    GET, POST; 
    } 

    public String _url; 
    public String response = ""; 
    public PostType _postType; 
    CookieStore _cookieStore = new BasicCookieStore(); 

    public WebRequest(String url) { 
    _url = url; 
    _postType = PostType.POST; 
    } 

    public WebRequest(String url, CookieStore cookieStore) { 
    _url = url; 
    _cookieStore = cookieStore; 
    _postType = PostType.POST; 
    } 

    public WebRequest(String url, PostType postType) { 
    _url = url; 
    _postType = postType; 
    } 

    public String Get() { 
    HttpClient httpclient = new DefaultHttpClient(); 

    try { 




     // Create local HTTP context 
     HttpContext localContext = new BasicHttpContext(); 

     // Bind custom cookie store to the local context 
     localContext.setAttribute(ClientContext.COOKIE_STORE, _cookieStore); 

     HttpResponse httpresponse; 
     if (_postType == PostType.POST) 
     { 
     HttpPost httppost = new HttpPost(_url); 
     httpresponse = httpclient.execute(httppost, localContext); 
     } 
     else 
     { 
     HttpGet httpget = new HttpGet(_url); 
     httpresponse = httpclient.execute(httpget, localContext); 
     } 

     StringBuilder responseString = inputStreamToString(httpresponse.getEntity().getContent()); 

     response = responseString.toString(); 
    } 
    catch (UnknownHostException e) { 
     e.printStackTrace(); 
    } 
    catch (Exception e) { 
     e.printStackTrace(); 
    } 
    finally { 
     // When HttpClient instance is no longer needed, 
     // shut down the connection manager to ensure 
     // immediate deallocation of all system resources 
     httpclient.getConnectionManager().shutdown(); 
    } 

    return response; 
    } 

    private StringBuilder inputStreamToString(InputStream is) throws IOException { 
    String line = ""; 
    StringBuilder total = new StringBuilder(); 

    // Wrap a BufferedReader around the InputStream 
    BufferedReader rd = new BufferedReader(new InputStreamReader(is,Charset.forName("iso-8859-9"))); 
    // Read response until the end 
    while ((line = rd.readLine()) != null) { 
     total.append(line); 
    } 

    // Return full string 
    return total; 
    } 
} 
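
WebRequest depends on the Apache HttpClient classes (org.apache.http), which were removed from the Android SDK in API level 23. If they are not available in your project, a plain HttpURLConnection can fetch the feed text instead. The sketch below is one possible replacement under that assumption, not part of the original answer; the class and method names are made up. Unlike inputStreamToString above, it keeps line breaks and lets the caller choose the charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;

// Hypothetical helper: fetches a URL body as text with HttpURLConnection.
final class FeedFetchSketch {

    // Reads the whole stream, then decodes it once with the given charset.
    static String readAll(InputStream in, Charset charset) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Performs an HTTP GET and returns the response body as a String.
    static String get(String address, Charset charset) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(address).toURL().openConnection();
            conn.setRequestMethod("GET");
            try (InputStream in = conn.getInputStream()) {
                return readAll(in, charset);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

With this in place, the WebRequest call could become String htmltext = FeedFetchSketch.get(address, Charset.forName("iso-8859-9")); using the same charset as the class above. On Android the call has to run off the main thread.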

Important:

Don't forget to change the package name on the first line of WebRequest.java:

package com.nesim.test;

Result:

After these changes, you will get this:

D/title: Two Walk-Off Moments Lead To Two A&T Losses 
D/description: The Lancers win in their last at-bat in both games of Saturday&#39;s doubleheader. 
D/title: A&T To Play Four Against Longwood 
D/description: A&T baseball takes a break from conference play this weekend. 
D/title: Wilkerson Named MEAC Rookie of the Week 
D/description: Wilkerson was 6-for-14 for the week of April 9-15. 
D/title: Lights, Camera, Action 
D/description: A&T baseball set to play nationally televised game on ESPNU. 
D/title: Resilient Aggies Fall To USC Upstate 
D/description: Luke Tendler extends his hitting streak to 10 games. 

Your parser returned this:

D/title : Two Walk-Off Moments Lead To Two A 
D/description: The Lancers win in their last at-bat in both games of Saturday&#39;s doubleheader. 
D/title : A 
D/description: A&amp;T baseball takes a break from conference play this weekend.
D/title : Wilkerson Named MEAC Rookie of the Week 
D/description: Wilkerson was 6-for-14 for the week of April 9-15. 
D/title : Lights, Camera, Action 
D/description: A&amp;T baseball set to play nationally televised game on ESPNU. 
D/title : Resilient Aggies Fall To USC Upstate 
D/description: Luke Tendler extends his hitting streak to 10 games.
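
The truncated titles fit the way getCharacterDataFromElement works: it reads only the first child node of the element. A likely explanation is that the DOM parser on Android splits the text into several nodes at the &amp; entity, so everything after the first node is lost. If that is the cause, concatenating all text and CDATA children should also fix it without preprocessing the XML. The sketch below shows that idea on a desktop JVM; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical replacement for getCharacterDataFromElement.
final class ElementTextSketch {

    // Joins every text and CDATA child instead of returning only the first.
    static String allText(Element e) {
        if (e == null) {
            return "";
        }
        StringBuilder text = new StringBuilder();
        for (Node child = e.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        return text.toString();
    }

    // Small parsing helper so the sketch can be tried on a literal string.
    static Element firstElement(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagName(tag).item(0);
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }
}
```

In RSSReader, this would mean having getElementValue call a method like allText instead of getCharacterDataFromElement. Calling element.normalize() before reading, or using getTextContent() where it is available, are other options worth trying.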