"Content is not allowed in prolog" while parsing an RSS feed with ROME

Problem description:

While using the ROME API to parse an RSS feed, I am getting a "Content is not allowed in prolog" error:

com.sun.syndication.io.ParsingFeedException: Invalid XML 
    at com.sun.syndication.io.WireFeedInput.build(WireFeedInput.java:210) 

The code is as follows:

public static void main(String[] args) { 
    URL url; 
    XmlReader reader = null; 
    SyndFeed feed; 

    try { 
     url = new URL("https://www.democracynow.org/podcast.xml"); 
     reader = new XmlReader(url); 
     feed = new SyndFeedInput().build(reader); 
     for (Iterator<SyndEntry> i =feed.getEntries().iterator(); i.hasNext();) { 
      SyndEntry entry = i.next(); 
      System.out.println(entry.getPublishedDate()+" Title "+entry.getTitle()); 

     } 
    } 
    catch (Exception e) { 
     e.printStackTrace(); 
    } 
} 

I checked some links, such as:

http://old.nabble.com/Invalid-XML:-Error-on-line-1:-Content-is-not-allowed-in-prolog.-td21258868.html

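where the problem is probably the character set, but I can't figure out a way to handle it. Any help or guidance would be much appreciated.

Thanks and regards,

Vaibhav Goswami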


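I tried implementing my functionality with the Jakarta Feed Parser, and it can parse this URL. I think the Jakarta Feed Parser can handle more types of feeds than just RSS. – vaibhav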

I am using ROME as well, and I was able to get the published date and title.

My code is as follows:

URL feedUrl = new URL("http://www.bloomberg.com/tvradio/podcast/cat_markets.xml"); 

SyndFeedInput input = new SyndFeedInput(); 
SyndFeed feed = input.build(new XmlReader(feedUrl)); 

for (Iterator i = feed.getEntries().iterator(); i.hasNext();) 
{ 
    SyndEntry entry = (SyndEntry) i.next(); 
    // Print the entry title followed by its published date
    System.out.println("title | " + entry.getTitle() + " - timeStamp " + entry.getPublishedDate() + "\n"); 
} 

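This works. I used the Bloomberg URL only because it gave me XML.

If your question is about something else, do let me know :)

You can use SyndFeed and SyndEntry to parse the XML.

You also need to check whether the XML is valid.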

URL url = new URL("http://feeds.feedburner.com/javatipsfeed"); 
XmlReader reader = null; 
try { 
    reader = new XmlReader(url); 
    SyndFeed feeder = new SyndFeedInput().build(reader); 
    // Print the feed author, then the title of each entry
    System.out.println("Feed Author: " + feeder.getAuthor()); 
    for (Iterator i = feeder.getEntries().iterator(); i.hasNext();) { 
        SyndEntry syndEntry = (SyndEntry) i.next(); 
        System.out.println(syndEntry.getTitle()); 
    } 
} finally { 
    // Always release the underlying stream
    if (reader != null) 
        reader.close(); 
} 
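The validity check mentioned above could be sketched roughly as follows. This is only a minimal illustration using the JDK's built-in XML parser, not part of ROME; the class name `XmlWellFormedCheck` and the method `isWellFormed` are made up for this example. It only tests well-formedness (which is what "Content is not allowed in prolog" is about), not validity against a schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named for illustration only.
class XmlWellFormedCheck {

    // Returns true if the given text parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for malformed input, e.g. stray text before the root element
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not expected for an in-memory string with the default parser
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<rss version=\"2.0\"><channel/></rss>"));
        System.out.println(isWellFormed("junk <rss/>"));
    }
}
```

Running a check like this on the downloaded text before handing it to SyndFeedInput makes it easier to tell whether the feed itself is broken or whether something went wrong while reading it.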

This is caused by the Byte Order Mark (BOM) problem. Here is a JUnit test case that demonstrates the problem and the fix:

package rss; 

import org.xml.sax.InputSource; 

import java.io.*; 
import java.net.*; 

import com.sun.syndication.io.*; 

import org.apache.commons.io.IOUtils; 
import org.apache.commons.io.input.BOMInputStream; 
import org.junit.Test; 

public class RssEncodingTest { 

    String url = "http://www.moneydj.com/KMDJ/RssCenter.aspx?svc=NH&fno=1&arg=X0000000"; 

    // This works because we build the InputSource directly from the UrlConnection's InputStream 

    @Test 
    public void test01() throws MalformedURLException, IOException, 
      IllegalArgumentException, FeedException { 
     try (InputStream is = new URL(url).openConnection().getInputStream()) { 
      InputSource source = new InputSource(is); 
      System.out.println("description: " 
        + new SyndFeedInput().build(source).getDescription()); 
     } 
    } 

    // But a String input fails because of the byte order mark problem 

    @Test 
    public void test02() throws MalformedURLException, IOException, 
      IllegalArgumentException, FeedException { 
     String html = IOUtils.toString(new URL(url).openConnection() 
       .getInputStream()); 
     Reader reader = new StringReader(html); 
     System.out.println("description: " 
       + new SyndFeedInput().build(reader).getDescription()); 
    } 

    // We can use Apache Commons IO to fix the byte order mark 

    @Test 
    public void test03() throws MalformedURLException, IOException, 
      IllegalArgumentException, FeedException { 
     String html = IOUtils.toString(new URL(url).openConnection() 
       .getInputStream()); 
     try (BOMInputStream bomIn = new BOMInputStream(
       IOUtils.toInputStream(html))) { 
      String f = IOUtils.toString(bomIn); 
      Reader reader = new StringReader(f); 
      System.out.println("description: " 
        + new SyndFeedInput().build(reader).getDescription()); 
     } 
    } 

}
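If adding Apache Commons IO is not an option, the same idea can be sketched with the JDK alone. The snippet below is only an illustration under the assumption that the feed is UTF-8 encoded; the class `Utf8BomSkipper` and its methods are hypothetical names, not part of ROME or Commons IO. It peeks at the first three bytes and drops them only if they are the UTF-8 BOM (EF BB BF).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for illustration only.
class Utf8BomSkipper {

    // Returns a stream positioned after a leading UTF-8 BOM, if one is present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() may return fewer
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: put the bytes back so the caller sees the full content
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    // Decodes the bytes as UTF-8 after dropping a leading BOM (assumes UTF-8 input).
    static String decodeWithoutBom(byte[] raw) throws IOException {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(raw))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'r', 's', 's', '/', '>'};
        System.out.println(decodeWithoutBom(withBom));
    }
}
```

The stream returned by `skipUtf8Bom` could then be wrapped in an InputStreamReader or an InputSource before being passed to `SyndFeedInput.build`, in the same way as the tests above.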