nokogiri:xml to html

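Problem description:

I'm just trying to do a straight conversion (almost a search and replace), but I'm having trouble getting things to stay in place: I end up with incorrect links and duplicated content. I'm sure I'm doing something silly while traversing the XML. This is what I'm trying: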

builder = Nokogiri::HTML::Builder.new do |doc| 
doc.html { 
    doc.body { 
    doc.div.wrapper! { 
    doc.h1 "Short" 

     xm.css('paragraph').each do |para| 

     doc.h3.para(:id => para['number']) { doc.text para['number'] } 

     doc.p.narrativeparagraph { 

      xm.css('paragraph inner-section').each do |section| 
       doc.span.innersection { doc.text section.content 

      xm.css('inner-section xref').each do |xref| 
       doc.a(:href => "#" + xref['number']) { doc.text xref['number'] } 
      end 

      xm.css('paragraph inner-text').each do |innertext| 
       doc.span.innertext { doc.text innertext.content } 
      end 

       } end #inner-section     

       } 

      end#end paragraph 
     }#end wrapper 
     }#end body 
    }#end html 
    end#end builder 

on this XML:

<?xml version="1.0"?> 

<looseleaf> 

<paragraph number="1"> 
    <inner-section> blah one blah <xref number="link1location"></xref> 
    <inner-text> blah two blah blah </inner-text> 
    blah three 
    </inner-section> 
</paragraph> 

<paragraph number="2"> 
<inner-section> blah four blah <xref number="link2location"></xref> 
    <inner-text>blah five blah blah </inner-text> 
     blah six 
</inner-section> 
</paragraph> 

</looseleaf> 

I'm trying to create:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html> 
<body> 
<div id="wrapper"> 
<h1>Short</h1> 
<h3 class="para" id="1">1</h3> 
<p class="narrativeparagraph"> 
<span class="innersection"> blah one blah <a href="#link1location">link1location</a> 
<span class="innertext"> blah two blah blah </span> 
    blah three</span> 
</p> 

<h3 class="para" id="2">2</h3> 
<p class="narrativeparagraph"> 
<span class="innersection"> blah four blah <a href="#link2location">link2location</a> 
<span class="innertext">blah five blah blah </span> 
    blah six</span></p> 

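I've been trying all sorts of things to get this working. The basic HTML structure comes out fine, but the children of each paragraph are a mess. Any help would be much appreciated. Regards, Ritchie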

+1

For any noobs in the same boat as me: I've given up on trying to use Builder. I'm slowly getting there, but this definitely works: frag.xpath("//paragraph").each { |div| div.name = "p"; div.set_attribute("class", "narrativeparagraph") } frag.css('inner-section xref').each { |xref| xref.name = "a"; xref.set_attribute("href", "#" + xref['number']); xref.content = xref['number'] } – ritchielee 2009-11-28 00:56:38

+0

The HTML you gave as an example: is that what you are trying to build? Can you provide an example of the actual result? – 2011-03-31 18:07:27

There are many ways to do this, but if you want to stick with the Builder approach, I would write a function that converts <paragraph> into <p>.

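require 'nokogiri'

# Define the helper first so it exists when the builder block runs.
def translate_paragraph(p)
  # Change '<paragraph>' to '<p>'
  p.name = 'p'

  # Change '<inner-section>' to '<span class="innersection">'
  p.css('inner-section').each do |tag|
    tag.name = 'span'
    tag['class'] = 'innersection'
  end

  # ...

  p # return the modified node so the caller can serialize it
end

# xm is the source document, already parsed with Nokogiri::XML
builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html {
    doc.body {
      doc.div.wrapper! {
        doc.h1 "Short"
        xm.css('paragraph').each do |para|
          # Builder#<< appends raw markup, so serialize the translated copy
          doc << translate_paragraph(para.dup).to_html
        end # para
      } # end wrapper
    } # end body
  } # end html
end # end builder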

It's not perfect, but it works with Builder.

I would also consider XSLT, or recursively walking the tree and building the output from there.