Nokogiri: XML to HTML
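Question:
I'm just trying to do a fairly direct conversion (almost just search and replace), but I'm having trouble getting things to stay in place: I end up with links in the wrong position and duplicated content. I'm sure I'm doing something silly when traversing the XML. I'm trying: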
builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html {
    doc.body {
      doc.div.wrapper! {
        doc.h1 "Short"
        xm.css('paragraph').each do |para|
          doc.h3.para(:id => para['number']) { doc.text para['number'] }
          doc.p.narrativeparagraph {
            xm.css('paragraph inner-section').each do |section|
              doc.span.innersection { doc.text section.content
                xm.css('inner-section xref').each do |xref|
                  doc.a(:href => "#" + xref['number']) { doc.text xref['number'] }
                end
                xm.css('paragraph inner-text').each do |innertext|
                  doc.span.innertext { doc.text innertext.content }
                end
              } end #inner-section
          }
        end#end paragraph
      }#end wrapper
    }#end body
  }#end html
end#end builder
on:
<?xml version="1.0"?>
<looseleaf>
<paragraph number="1">
<inner-section> blah one blah <xref number="link1location"></xref>
<inner-text> blah two blah blah </inner-text>
blah three
</inner-section>
</paragraph>
<paragraph number="2">
<inner-section> blah four blah <xref number="link2location"></xref>
<inner-text>blah five blah blah </inner-text>
blah six
</inner-section>
</paragraph>
</looseleaf>
to create:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
<div id="wrapper">
<h1>Short</h1>
<h3 class="para" id="1">1</h3>
<p class="narrativeparagraph">
<span class="innersection"> blah one blah <a href="#link1location">link1location</a>
<span class="innertext"> blah two blah blah </span>
blah three</span>
</p>
<h3 class="para" id="2">2</h3>
<p class="narrativeparagraph">
<span class="innersection"> blah four blah <a href="#link2location">link2location</a>
<span class="innertext">blah five blah blah </span>
blah six</span></p>
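I've been trying all sorts of things to get this working. The basic HTML structure comes out fine, but the children of the paragraphs are a mess. Any help would be greatly appreciated. Regards, Ritchie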
Answer
There are many ways to do this, but if you want to stick with the Builder approach, I would turn the <paragraph> to <p> conversion into a function.
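# Defined first, because the Builder block below runs as soon as it is created.
def translate_paragraph(p)
  # Change '<paragraph>' to '<p>'
  p.name = 'p'
  # Change '<inner-section>' to '<span class="innersection">'
  p.css('inner-section').each { |tag|
    tag.name = 'span'
    tag['class'] = 'innersection'
  }
  # ...
  # Return markup as a string, since Builder#<< appends a raw string
  p.to_xml
end

# xm is the parsed XML document from the question
builder = Nokogiri::HTML::Builder.new do |doc|
  doc.html {
    doc.body {
      doc.div.wrapper! {
        doc.h1 "Short"
        xm.css('paragraph').each do |para|
          # dup so the source document is left untouched
          doc << translate_paragraph(para.dup)
        end #para
      }#end wrapper
    }#end body
  }#end html
end#end builder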
It's not perfect, but it works with Builder.
I would also consider XSLT, or recursively walking the HTML tree and building from there.
For any noobs in the same boat as me: I've given up trying to use Builder. I'm getting there slowly, but this definitely works: frag.xpath("//paragraph").each { |div| div.name = "p"; div.set_attribute("class", "narrativeparagraph") } frag.css('inner-section xref').each { |xref| xref.name = "a"; xref.set_attribute("href", "#" + xref['number']); xref.content = xref['number'] } – ritchielee 2009-11-28 00:56:38
The HTML you gave as an example: is that what you want to build? Can you provide an example of the actual result? – 2011-03-31 18:07:27