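Trying to parse feeds for an RSS reader written in Python

Problem description:

I am still a Python beginner. As a practice project, I want to write my own RSS reader. I found a useful tutorial here: learning python. I used the code provided in that tutorial: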

#! /usr/bin/env python  
import urllib2 
from xml.dom import minidom, Node 

""" Get the XML """ 
url_info = urllib2.urlopen('http://rss.slashdot.org/Slashdot/slashdot') 

if (url_info): 
    """ We have the RSS XML lets try to parse it up """ 
    xmldoc = minidom.parse(url_info) 
    if (xmldoc): 
     """We have the Doc, get the root node""" 
     rootNode = xmldoc.documentElement 
     """ Iterate the child nodes """ 
     for node in rootNode.childNodes: 
      """ We only care about "item" entries""" 
      if (node.nodeName == "item"): 
       """ Now iterate through all of the <item>'s children """ 
       for item_node in node.childNodes: 
        if (item_node.nodeName == "title"): 
         """ Loop through the title Text nodes to get 
         the actual title""" 
         title = "" 
         for text_node in item_node.childNodes: 
          if (text_node.nodeType == node.TEXT_NODE): 
           title += text_node.nodeValue 
         """ Now print the title if we have one """ 
         if (len(title)>0): 
          print title 

        if (item_node.nodeName == "description"): 
         """ Loop through the description Text nodes to get 
         the actual description""" 
         description = "" 
         for text_node in item_node.childNodes: 
          if (text_node.nodeType == node.TEXT_NODE): 
           description += text_node.nodeValue 
         """ Now print the title if we have one. 
         Add a blank with \n so that it looks better """ 
         if (len(description)>0): 
          print description + "\n" 
    else: 
     print "Error getting XML document!" 
else: 
    print "Error! Getting URL"

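Everything worked as expected, and at first I thought I understood all of it. But as soon as I use another RSS feed (e.g. "http://www.spiegel.de/schlagzeilen/tops/index.rss"), I get a "Terminated" error for my application from the Eclipse IDE. The message does not help, because I cannot tell exactly where and why the application terminates. The debugger is not much help either, since it ignores my breakpoints. Well, that is another problem.

Does anybody know what I am doing wrong?

Could you try a binary search (by commenting out code) to isolate the problem? –

I tried that. I have just learned that it was not a compiler error message, only my own lack of knowledge. – jacib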

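Answer

Well, the "Terminated" message is not an error; it is just information that Python exited without errors.

You are not doing anything wrong. This RSS reader is simply not very flexible, because it only knows one variant of RSS.

If you compare the XML documents of Slashdot and Spiegel Online, you can see the difference in document structure:

Slashdot: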

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" ...> 
    <channel rdf:about="http://slashdot.org/"> 
    <title>Slashdot</title> 
    <!-- more stuff (but no <item>-tags) --> 
    </channel> 
    <item rdf:about="blabla"> 
    <title>The Condescending UI</title> 
    <!-- item data --> 
    </item> 
    <!-- more <item>-tags --> 
</rdf:RDF>

Spiegel Online:

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?> 
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0"> 
    <channel> 
    <title>SPIEGEL ONLINE - Schlagzeilen</title> 
    <link>http://www.spiegel.de</link> 
    <item> 
     <title>Streit über EU-Veto: Vize Clegg meutert gegen britischen Premier Cameron</title> 
    </item> 
    <!-- more <item>-tags --> 
    </channel> 
</rss>

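In the Spiegel Online feed, all <item> elements are inside the <channel> tag, but in the Slashdot feed they sit directly in the root tag (<rdf:RDF>). Your Python code only looks for <item> tags that are direct children of the root tag, so it finds nothing in the Spiegel Online feed and exits without printing anything.

If you want your RSS reader to work for both feeds, you could, for example, change the following line: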
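The structural difference can be checked directly. The following is a minimal sketch in Python 3 syntax (the question's code is Python 2); the two XML strings are trimmed-down, hypothetical stand-ins for the real feeds, and `direct_child_names` is a helper name invented for this illustration.

```python
from xml.dom import minidom

# Hypothetical, shortened stand-ins for the two feeds (not the real feed contents).
RDF_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel rdf:about="http://slashdot.org/"><title>Slashdot</title></channel>
  <item rdf:about="blabla"><title>The Condescending UI</title></item>
</rdf:RDF>"""

RSS2_FEED = """<rss version="2.0">
  <channel>
    <title>SPIEGEL ONLINE - Schlagzeilen</title>
    <item><title>Example headline</title></item>
  </channel>
</rss>"""


def direct_child_names(xml_text):
    """Return the tag names of the element nodes directly below the root."""
    root = minidom.parseString(xml_text).documentElement
    return [child.nodeName for child in root.childNodes
            if child.nodeType == child.ELEMENT_NODE]


print(direct_child_names(RDF_FEED))   # ['channel', 'item']
print(direct_child_names(RSS2_FEED))  # ['channel']
```

In the second feed the root has no direct <item> child, which is why a loop over `rootNode.childNodes` never reaches the titles.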

for node in rootNode.childNodes:

to this:

for node in rootNode.getElementsByTagName('item'):

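With that change, all <item> tags are enumerated, regardless of their position in the XML document.

Thanks for the hint, now it works. I must admit my XML knowledge is below par ;) – jacib

Answer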
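As a rough illustration of that change, here is a minimal sketch in Python 3 syntax (the question's code is Python 2). The sample XML strings are shortened, hypothetical stand-ins for the real feeds, and `item_titles` is a helper name made up for this example, not part of the tutorial.

```python
from xml.dom import minidom

# Hypothetical, shortened stand-ins for the two feed layouts.
RDF_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel rdf:about="http://slashdot.org/"><title>Slashdot</title></channel>
  <item rdf:about="blabla"><title>The Condescending UI</title></item>
</rdf:RDF>"""

RSS2_FEED = """<rss version="2.0">
  <channel>
    <title>SPIEGEL ONLINE - Schlagzeilen</title>
    <item><title>Example headline</title></item>
  </channel>
</rss>"""


def item_titles(xml_text):
    """Collect the title text of every <item>, wherever it sits in the tree."""
    root = minidom.parseString(xml_text).documentElement
    titles = []
    # getElementsByTagName searches all descendants, not just direct children.
    for item in root.getElementsByTagName("item"):
        title_nodes = item.getElementsByTagName("title")
        if not title_nodes:
            continue
        # Join the text nodes of the first <title> inside this item.
        text = "".join(node.nodeValue for node in title_nodes[0].childNodes
                       if node.nodeType == node.TEXT_NODE)
        if text:
            titles.append(text)
    return titles


print(item_titles(RDF_FEED))   # ['The Condescending UI']
print(item_titles(RSS2_FEED))  # ['Example headline']
```

Because the search starts at each <item>, the channel's own <title> is not picked up in either layout.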

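If nothing happens, maybe everything in your code is correct and you are just not matching the right elements :)

If you do get an exception, try launching the script from the command line: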

python <yourfilename.py>

Or use try/except to catch the exception and print the error:

try: 
    # your code 
except Exception as e: 
    # print it 
    print 'My exception is', e

You are right, the code was correct, but my logic was not... – jacib
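Applied to the feed parser, this idea might look like the sketch below. It uses Python 3 syntax (the question's code is Python 2), and `parse_or_report` is a hypothetical helper name. It assumes the failure of interest is malformed XML, which `minidom.parseString` reports as `xml.parsers.expat.ExpatError`.

```python
import traceback
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_or_report(xml_text):
    """Parse XML text; on a parse error print the traceback and return None."""
    try:
        return minidom.parseString(xml_text)
    except ExpatError:
        # format_exc() returns the full traceback, including the error message.
        print(traceback.format_exc())
        return None


# A well-formed document parses; a mismatched closing tag is reported.
good = parse_or_report("<rss><channel></channel></rss>")
bad = parse_or_report("<rss><channel></rss>")
print(good is not None, bad is None)
```

Printing the traceback instead of only the message shows where the error was raised, which helps when the IDE reports nothing more than that the program ended.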
