XSLT: select and transform a node (matched by a regular expression) and its following siblings up to the next similar node
Question:
Somewhat simplified, my XML looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<dict>
  <entry>
    <form>word</form>
    <gram>noun</gram>
    <span style="bold">1.</span>
    <def>this is a definition in the first sense.</def> – <cit type="example">
      <quote>This is a <span style="bold">quote</span> for the first sense. </quote>
    </cit>
    <span style="bold">2.</span>
    <def>This is a definition for the second sense</def> – <cit type="example">
      <quote>This is a quote for the second sense.</quote>
    </cit>
  </entry>
</dict>
I need to transform this, using XSLT 2.0 or 3.0, into the following:
<?xml version="1.0" encoding="UTF-8"?>
<dict>
  <entry>
    <form>word</form>
    <gram>noun</gram>
    <sense n="1">
      <def>this is a definition in the first sense.</def> – <cit type="example">
        <quote>This is a <span style="bold">quote</span> for the first sense. </quote>
      </cit>
    </sense>
    <sense n="2">
      <def>This is a definition for the second sense</def> – <cit type="example">
        <quote>This is a quote for the second sense.</quote>
      </cit>
    </sense>
  </entry>
</dict>
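There can be more than two senses, and bold-styled spans can occur elsewhere, so the sense markers need to be identified specifically, with something like tei:span[@style='bold'][matches(text(), '^\d\.')].
I am having trouble putting this together in a stylesheet that also extracts the number from the span's text node and uses it as the attribute value of the new <sense> element.
I would be very grateful for any tips.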
Answer
Here is an XSLT 3.0 sample:
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Copy every node that no other template matches -->
  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:output indent="yes"/>

  <xsl:template match="entry">
    <xsl:copy>
      <!-- Start a new group at each bold span whose whole text is digits plus a period -->
      <xsl:for-each-group select="node()"
          group-starting-with="span[@style = 'bold'][matches(., '^[0-9]+\.$')]">
        <xsl:choose>
          <!-- The group begins with a sense marker: wrap the remaining nodes in <sense> -->
          <xsl:when test="self::span[@style = 'bold'][matches(., '^[0-9]+\.$')]">
            <sense nr="{replace(., '[^0-9]+', '')}">
              <xsl:apply-templates select="current-group() except ."/>
            </sense>
          </xsl:when>
          <!-- Nodes before the first marker (form, gram, ...) are copied unwrapped -->
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
which produces the output
<?xml version="1.0" encoding="UTF-8"?>
<dict>
  <entry>
    <form>word</form>
    <gram>noun</gram>
    <sense nr="1">
      <def>this is a definition in the first sense.</def> – <cit type="example">
        <quote>This is a <span style="bold">quote</span> for the first sense. </quote>
      </cit>
    </sense>
    <sense nr="2">
      <def>This is a definition for the second sense</def> – <cit type="example">
        <quote>This is a quote for the second sense.</quote>
      </cit>
    </sense>
  </entry>
</dict>
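Since the question also mentions XSLT 2.0, here is a rough sketch of how the same grouping might look for a 2.0 processor. It assumes the only 3.0-specific part of the stylesheet above is xsl:mode, which is replaced here by an explicit identity template. Two further choices are assumptions of this sketch and not part of the answer above: the attribute is named n to match the requested output, and any attributes on entry are copied (the sample has none).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- Identity template: stands in for xsl:mode with on-no-match="shallow-copy" -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="entry">
    <xsl:copy>
      <!-- Assumption: keep attributes on entry, if there are any -->
      <xsl:apply-templates select="@*"/>
      <!-- Same grouping as above: a new group starts at each numbered bold span -->
      <xsl:for-each-group select="node()"
          group-starting-with="span[@style = 'bold'][matches(., '^[0-9]+\.$')]">
        <xsl:choose>
          <xsl:when test="self::span[@style = 'bold'][matches(., '^[0-9]+\.$')]">
            <!-- Assumption: attribute named n, as in the requested output -->
            <sense n="{replace(., '[^0-9]+', '')}">
              <xsl:apply-templates select="current-group() except ."/>
            </sense>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the pattern is anchored with ^ and $, a bold span that is a direct child of entry but contains anything other than digits followed by a period should not start a new group; it simply stays inside whichever group it falls in and is copied by the identity template.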
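Thanks, Martin! This is great. – Tench
Could you extend your example to include the case where "span style bold can occur elsewhere", to show how that should be handled (assuming you don't want a 'sense' element there)? Thanks! –
I just did: span style bold is used to make certain words bold, but when such spans serve as sense delimiters they always contain only a number and a period. – Tench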