Parsing with Nokogiri

Question:

I have an XML file exported from BlogSpot, and it looks like this:

<feed> 
<entry> 
<title> title </title> 
<content type="html"> Content </content> 
</entry> 
<entry> 
<title> title </title> 
<content type="html"> Content </content> 
</entry> 
</feed>

How can I parse it with Nokogiri and XPath?

Here is what I have:

#!/usr/bin/env ruby 

require 'rubygems' 
require 'nokogiri' 


doc = Nokogiri::XML(File.open("blogspot.xml")) 

doc.xpath('//content[@type="html"]').each do |node| 
    puts node.text 
end

but it doesn't give me anything :/

Any suggestions? :/

Answer

Your code works for me. Some versions of Nokogiri have had issues.

I get:

Content 
Content

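I'm using nokogiri (1.4.1 x86-mswin32).

Thanks nigel - it turns out I needed to be very specific with my XPath expression, or strip out the unneeded attributes :D – meilas 2010-07-20 04:02:37

Answer

It turned out that I had to remove the attributes from the feed element: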

<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'>

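Answer

I just stumbled across this question. The problem appears to be XML namespaces:

"It turned out that I had to remove the attributes from the feed element"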

<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'>

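XML namespaces complicate accessing nodes because they provide a way to tell apart tags that share a name. With a default namespace declared on feed, an unprefixed XPath such as //content looks for elements in no namespace, so it matches nothing. Read the "Namespaces" section of Searching an HTML/XML Document.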

Nokogiri also has the remove_namespaces! method, which is sometimes a useful way of dealing with the problem, but it also has some drawbacks.
