XSLT: select and transform nodes (matched by a regular expression) together with their following siblings, up to the next similar node

Problem description:

Somewhat simplified, my XML looks like this:

<?xml version="1.0" encoding="UTF-8"?> 
<dict> 
    <entry> 
     <form>word</form> 
     <gram>noun</gram> 
     <span style="bold">1.</span> 
     <def>this is a definition in the first sense.</def> – <cit type="example"> 
      <quote>This is a <span style="bold">quote</span> for the first sense. </quote> 
     </cit> 
     <span style="bold">2.</span> 
     <def>This is a definition for the second sense</def> – <cit type="example"> 
      <quote>This is a quote for the second sense.</quote> 
     </cit> 
    </entry>  
</dict> 

I need to transform this, using XSLT 2.0 or 3.0, into the following:

<?xml version="1.0" encoding="UTF-8"?> 
<dict> 
    <entry> 
     <form>word</form> 
     <gram>noun</gram> 
     <sense n="1"> 
      <def>this is a definition in the first sense.</def> – <cit type="example"> 
       <quote>This is a <span style="bold">quote</span> for the first sense. </quote> 
      </cit> 
     </sense> 
     <sense n="2"> 
      <def>This is a definition for the second sense</def> – <cit type="example"> 
       <quote>This is a quote for the second sense.</quote> 
      </cit> 
     </sense> 
    </entry> 
</dict> 

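There can be more than two senses, and a span with style bold can also occur elsewhere, so we need to identify the separators specifically, with something like tei:span[@style='bold'][matches(text(), '^\d\.')].

I am having trouble putting this together in a stylesheet that also extracts the number from the span's text node and uses it as the attribute value of the new <sense> element.

I would be very grateful for your tips.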

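Can you extend your example to include a case where "span style bold can occur elsewhere", to show how it should be handled (presumably you don't want a 'sense' element there)? Thanks! –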

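I just did. span style bold is used to make certain words bold, but when such spans serve as sense separators they always contain only a number and a period. – Tench

Here is an XSLT 3.0 sample: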

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs"> 

    <xsl:mode on-no-match="shallow-copy"/> 

    <xsl:output indent="yes"/> 

    <xsl:template match="entry"> 
     <xsl:copy> 
      <xsl:for-each-group select="node()" group-starting-with="span[@style = 'bold'][matches(., '^[0-9]+\.$')]"> 
       <xsl:choose> 
        <xsl:when test="self::span[@style = 'bold'][matches(., '^[0-9]+\.$')]"> 
         <sense nr="{replace(., '[^0-9]+', '')}"> 
          <xsl:apply-templates select="current-group() except ."/> 
         </sense> 
        </xsl:when> 
        <xsl:otherwise> 
         <xsl:apply-templates select="current-group()"/> 
        </xsl:otherwise> 
       </xsl:choose> 
      </xsl:for-each-group> 
     </xsl:copy> 
    </xsl:template> 

</xsl:stylesheet> 
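
Since the question also allows XSLT 2.0, here is a sketch of the same grouping for a 2.0 processor. The only intended difference is that `xsl:mode on-no-match="shallow-copy"` (an XSLT 3.0 declaration) is replaced by an explicit identity template; this variant is my adaptation rather than part of the original sample, and it should give the same result on the sample input.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output indent="yes"/>

    <!-- Identity template: copies every node and attribute unchanged unless
         a more specific template matches. Stands in for the XSLT 3.0
         declaration xsl:mode on-no-match="shallow-copy". -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="entry">
        <xsl:copy>
            <!-- Start a new group at every bold span whose whole content
                 is digits followed by a period, e.g. "1." or "12." -->
            <xsl:for-each-group select="node()"
                group-starting-with="span[@style = 'bold'][matches(., '^[0-9]+\.$')]">
                <xsl:choose>
                    <!-- Inside for-each-group, "." is the first node of the group -->
                    <xsl:when test="self::span[@style = 'bold'][matches(., '^[0-9]+\.$')]">
                        <!-- Strip everything that is not a digit to get the number -->
                        <sense nr="{replace(., '[^0-9]+', '')}">
                            <!-- Process the group without the separator span itself -->
                            <xsl:apply-templates select="current-group() except ."/>
                        </sense>
                    </xsl:when>
                    <!-- The leading group (form, gram, ...) has no separator -->
                    <xsl:otherwise>
                        <xsl:apply-templates select="current-group()"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

Because the regular expression is anchored with both `^` and `$`, a bold span such as `<span style="bold">quote</span>` does not start a group and is simply copied through.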

Applied to the sample input, the XSLT 3.0 sample produces this output:

<?xml version="1.0" encoding="UTF-8"?> 
<dict> 
    <entry> 
     <form>word</form> 
     <gram>noun</gram> 
     <sense nr="1"> 
     <def>this is a definition in the first sense.</def> – <cit type="example"> 
      <quote>This is a <span style="bold">quote</span> for the first sense. </quote> 
     </cit> 
     </sense> 
     <sense nr="2"> 
     <def>This is a definition for the second sense</def> – <cit type="example"> 
      <quote>This is a quote for the second sense.</quote> 
     </cit> 
    </sense> 
    </entry>  
</dict> 
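
Two details of the question are not covered by the sample: the desired output names the attribute `n` rather than `nr`, and the XPath in the question uses a `tei:` prefix, which suggests the real documents are in the TEI namespace. The following is a sketch under those assumptions, namely that the input elements are in `http://www.tei-c.org/ns/1.0` and that `sense` should be created in the same namespace. It will not match the sample as posted, because that sample is in no namespace.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.tei-c.org/ns/1.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

    <xsl:mode on-no-match="shallow-copy"/>

    <xsl:output indent="yes"/>

    <!-- Assumption: input is in the TEI namespace. With xpath-default-namespace,
         unprefixed element names in patterns and XPath (entry, span) refer to
         TEI elements. -->
    <xsl:template match="entry">
        <xsl:copy>
            <xsl:for-each-group select="node()"
                group-starting-with="span[@style = 'bold'][matches(., '^[0-9]+\.$')]">
                <xsl:choose>
                    <xsl:when test="self::span[@style = 'bold'][matches(., '^[0-9]+\.$')]">
                        <!-- The default xmlns declaration puts this literal result
                             element in the TEI namespace; the attribute is named n,
                             as in the desired output of the question. -->
                        <sense n="{substring-before(., '.')}">
                            <xsl:apply-templates select="current-group() except ."/>
                        </sense>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:apply-templates select="current-group()"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

`substring-before(., '.')` is used here in place of `replace()`; for content that has already matched `^[0-9]+\.$`, both yield just the digits.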

Thanks, Martin! This is great. – Tench