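Getting "Content is not allowed in prolog" while parsing an RSS feed with Rome
Problem description:
While using the Rome API to parse an RSS feed, I get this error (content is not allowed in prolog):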
com.sun.syndication.io.ParsingFeedException: Invalid XML
at com.sun.syndication.io.WireFeedInput.build(WireFeedInput.java:210)
The code is as follows:
public static void main(String[] args) {
    URL url;
    XmlReader reader = null;
    SyndFeed feed;
    try {
        url = new URL("https://www.democracynow.org/podcast.xml");
        reader = new XmlReader(url);
        feed = new SyndFeedInput().build(reader);
        for (Iterator<SyndEntry> i = feed.getEntries().iterator(); i.hasNext();) {
            SyndEntry entry = i.next();
            System.out.println(entry.getPublishedDate() + " Title " + entry.getTitle());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
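I have looked through some related links.
The problem is probably the character set, but I cannot figure out how to deal with it. Any help or guidance would be much appreciated.
Thanks and regards,
Vaibhav Goswami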
Answer
I used the same library and was able to get the published date and title.
My code is as follows:
URL feedUrl = new URL("http://www.bloomberg.com/tvradio/podcast/cat_markets.xml");
SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new XmlReader(feedUrl));
for (Iterator i = feed.getEntries().iterator(); i.hasNext();) {
    SyndEntry entry = (SyndEntry) i.next();
    // Print the title followed by the publication timestamp
    System.out.println("title | " + entry.getTitle() + " - timeStamp " + entry.getPublishedDate() + "\n");
}
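This works. I used the Bloomberg URL only because it gave me an XML feed.
If your question is about something else, do let me know :)
Answer
You can use SyndFeed and SyndEntry to parse the XML.
You also need to check whether the XML is valid.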
URL url = new URL("http://feeds.feedburner.com/javatipsfeed");
XmlReader reader = null;
try {
    reader = new XmlReader(url);
    SyndFeed feeder = new SyndFeedInput().build(reader);
    // Print the feed title, then the title of every entry
    System.out.println("Feed Title: " + feeder.getTitle());
    for (Iterator i = feeder.getEntries().iterator(); i.hasNext();) {
        SyndEntry syndEntry = (SyndEntry) i.next();
        System.out.println(syndEntry.getTitle());
    }
} finally {
    // Always release the underlying connection
    if (reader != null)
        reader.close();
}
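To act on the "check whether the XML is valid" advice without involving Rome at all, a minimal sketch using only the JDK's built-in parser might look like the following. The names `XmlWellFormedCheck` and `isWellFormed` are made up for illustration, and the check only covers well-formedness, not whether the document is a proper RSS or Atom feed. Text that starts with a byte order mark character (U+FEFF) is rejected by it, which is a common cause of the prolog error.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of Rome.
class XmlWellFormedCheck {

    // Returns true if the text parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = "<rss version=\"2.0\"><channel/></rss>";
        String withBom = "\uFEFF" + good; // same feed with a leading BOM character
        System.out.println("plain feed: " + isWellFormed(good));
        System.out.println("feed with leading BOM: " + isWellFormed(withBom));
    }
}
```

Running this kind of check on the downloaded text before handing it to `SyndFeedInput` helps separate a broken feed from a problem in how the feed was read.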
Answer
This is due to the byte order mark (BOM) problem. Here is a JUnit test case that demonstrates the problem and the fix:
package rss;

import org.xml.sax.InputSource;
import java.io.*;
import java.net.*;
import com.sun.syndication.io.*;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.input.BOMInputStream;
import org.junit.Test;

public class RssEncodingTest {

    String url = "http://www.moneydj.com/KMDJ/RssCenter.aspx?svc=NH&fno=1&arg=X0000000";

    // This works because we use an InputSource built directly from the URLConnection's InputStream
    @Test
    public void test01() throws MalformedURLException, IOException,
            IllegalArgumentException, FeedException {
        try (InputStream is = new URL(url).openConnection().getInputStream()) {
            InputSource source = new InputSource(is);
            System.out.println("description: "
                    + new SyndFeedInput().build(source).getDescription());
        }
    }

    // But a String input fails because of the byte order mark problem
    @Test
    public void test02() throws MalformedURLException, IOException,
            IllegalArgumentException, FeedException {
        String html = IOUtils.toString(new URL(url).openConnection()
                .getInputStream());
        Reader reader = new StringReader(html);
        System.out.println("description: "
                + new SyndFeedInput().build(reader).getDescription());
    }

    // We can use Apache Commons IO to fix the byte order mark
    @Test
    public void test03() throws MalformedURLException, IOException,
            IllegalArgumentException, FeedException {
        String html = IOUtils.toString(new URL(url).openConnection()
                .getInputStream());
        try (BOMInputStream bomIn = new BOMInputStream(
                IOUtils.toInputStream(html))) {
            String f = IOUtils.toString(bomIn);
            Reader reader = new StringReader(f);
            System.out.println("description: "
                    + new SyndFeedInput().build(reader).getDescription());
        }
    }
}
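If pulling in Commons IO just for `BOMInputStream` is not an option, the same idea can be sketched with the JDK alone. This is only an illustration: `BomStripper`, `stripBom` and `stripUtf8Bom` are hypothetical names, and the sketch handles only the UTF-8 byte order mark (bytes EF BB BF, or the character U+FEFF once decoded), not the UTF-16 or UTF-32 variants.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Rome or Commons IO.
class BomStripper {

    // Removes a leading U+FEFF character from already-decoded text.
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    // Removes a leading UTF-8 BOM (EF BB BF) from raw bytes.
    static byte[] stripUtf8Bom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) {
        // Decoded text that still carries the BOM as its first character
        String decoded = "\uFEFF<rss version=\"2.0\"/>";
        System.out.println(stripBom(decoded));

        // The same content as raw UTF-8 bytes, where the BOM is three bytes long
        byte[] raw = decoded.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(stripUtf8Bom(raw), StandardCharsets.UTF_8));
    }
}
```

The cleaned string can then be wrapped in a `StringReader` and passed to `SyndFeedInput.build`, in the same way `test03` does after going through `BOMInputStream`.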
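I tried implementing my feature with the Jakarta Feed Parser, and it can parse this URL. I think the Jakarta Feed Parser can handle more types of feeds than just RSS. – vaibhav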