Extracting the contents of a content attribute?

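Question:

This is my first question here, and it would be awesome to find an answer. I'm new to Nokogiri.

Here is my problem. I have something like this in the HTML head of the target site (here, a TechCrunch post):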

<meta content="During my time at TechCrunch I've seen thousands of startups and written about hundreds of them. I sure as hell don't know all ..." name="description"/>

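I would now like a script to run through the meta tags, find the one whose name attribute is "description", and get what is in its content attribute.

I've tried something like this: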

require 'rubygems' 
require 'nokogiri' 
require 'open-uri' 

url = "http://www.techcrunch.com/2009/10/11/the-underutilized-power-of-the-video-demo-to-explain-what-the-hell-you-actually-do/" 
doc = Nokogiri::HTML(open(url)) 
posts = doc.xpath("//meta") 
posts.each do |link| 
    a = link.attributes['name'] 
    b = link.attributes['content'] 
end

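Afterwards I could select the tag whose name attribute equals "description", but this code returns nil for both a and b.

I've played around with posts = doc.xpath("//meta"), posts = doc.xpath("//meta/*"), etc., but still get nil.

The problem is not with the XPath; it seems the document does not get parsed. You can check with 'puts doc': it does not contain the full input. – akuhn 2010-01-05 01:43:35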

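Answer

The problem is not with the XPath; it seems the document does not get parsed. You can check with puts doc: it does not contain the full input. This seems to be caused by a problem when parsing comments (I suspect either invalid HTML or a bug in libxml2).

In your case I would use a regular expression as a workaround. Given that <meta> tags are simple enough, something like /<meta name="([^"]*)" content="([^"]*)"/ might work.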
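A minimal sketch of that regex workaround, assuming the raw HTML is already available as a string. Note that in the tag quoted in the question, content comes before name, so a pattern that expects name first would not match it; the sketch therefore collects the attributes of each <meta> tag into a hash regardless of order. The helper name meta_content and the inline sample string are illustrative assumptions, not part of the original answer.

```ruby
# Sample input: the meta tag from the question (content before name).
html = <<~HTML
  <head>
  <meta content="During my time at TechCrunch I've seen thousands of startups ..." name="description"/>
  </head>
HTML

# Hypothetical helper: return the content attribute of the first <meta>
# tag whose name attribute equals `name`, or nil if there is none.
# Only handles double-quoted attribute values, like the original regex.
def meta_content(html, name)
  html.scan(/<meta\b[^>]*>/i) do |tag|
    # Collect every key="value" pair of this tag into a hash.
    attrs = tag.scan(/([\w\-]+)\s*=\s*"([^"]*)"/).to_h
    return attrs["content"] if attrs["name"] == name
  end
  nil
end

puts meta_content(html, "description")
```

Like any regex applied to HTML, this is only a workaround: it would break on single-quoted or unquoted attribute values.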

Answer

You should change

doc = Nokogiri::HTML(open(url))

to

doc = Nokogiri::HTML(open(url).read)

Update: or maybe not :) Actually, your code works for me, using Ruby 1.8.7 / Nokogiri 1.4.0.
