Matching a limited number of repetitions non-greedily in ANTLR

Question:

line : startWord (matchPhrase| 
        anyWord matchPhrase| 
        anyWord anyWord matchPhrase| 
        anyWord anyWord anyWord matchPhrase| 
        anyWord anyWord anyWord anyWord matchPhrase) 
     -> ^(TreeParent startWord anyWord* matchPhrase);

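So I want to match the first occurrence of matchPhrase, but allow up to a certain number of anyWord tokens before it. The tokens that make up matchPhrase are also matched by anyWord.

Is there a better way to do this?

I think it might be possible by combining a semantic predicate, as in this answer, with the non-greedy option: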

(options {greedy=false;} : anyWord)*

but I can't figure out exactly how to do it.

Edit: here is an example. I want to extract information from the following sentences:

Picture of a red flower. 

Picture of the following: A red flower.

My input is actually tagged English sentences, and the lexer rules match the tags rather than the words. So the input to ANTLR is:

NN-PICTURE Picture IN-OF of DT a JJ-COLOR red NN-FLOWER flower 

NN-PICTURE Picture IN-OF of DT the VBG following COLON : DT a JJ-COLOR red NN-FLOWER flower

I have lexer rules, plus one rule like these for each tag:

WS : (' ')+ {skip();}; 
TOKEN : (~' ')+; 

nnpicture:'NN-PICTURE' TOKEN -> ^('NN-PICTURE' TOKEN); 
vbg:'VBG' TOKEN -> ^('VBG' TOKEN);

My parser rules look like this:

sentence : nnpicture inof matchFlower; 

matchFlower : (dtTHE|dt)? jjcolor? nnflower;

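This will of course fail on the second sentence. So I want to allow a bit of flexibility by permitting up to N tokens before the flower match. I have an anyWord rule that matches any single token, and the following works: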

sentence : nnpicture inof (matchFlower | 
          anyWord matchFlower | 
          anyWord anyWord matchFlower | etc.

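but it isn't very elegant, and it doesn't work well for large N.

@BartKiers: Sorry, I didn't explain it that well - 'matchPhrase' is a subset of 'anyWord', so there may be some words before the 'matchPhrase' that are not part of 'matchPhrase', and they will be matched by 'anyWord'. But because it is a subset, the 'anyWord' match needs to be non-greedy, otherwise the 'matchPhrase' words will be matched by 'anyWord' too. That is why I can't do 'anyWord? anyWord? anyWord? matchPhrase'. – 2012-03-14 10:13:23

@Matt, I see what you mean. If no one beats me to it, I'll post an answer this evening (I'm at work ATM). – 2012-03-14 10:47:48

Answer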

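You can do this by first checking, with a syntactic predicate inside the matchFlower rule, whether dt? jjcolor? nnflower really is next in the token stream. If those tokens can be seen, simply match them; if not, match any single token and then match matchFlower recursively. That would look like: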

matchFlower 
: (dt? jjcolor? nnflower)=> dt? jjcolor? nnflower -> ^(FLOWER dt? jjcolor? nnflower) 
|       . matchFlower   -> matchFlower 
;

Note that inside parser rules, the . (dot) does not match any character, but any token.

Here's a quick demo:
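To make the control flow of that rule concrete, here is a minimal sketch in plain Python (not ANTLR). It assumes a simplified token model in which each tag and its word form one `(tag, word)` pair; the names `try_flower` and `match_flower` are hypothetical and only illustrate the "look ahead, otherwise skip one token and recurse" idea.

```python
from typing import List, Optional, Tuple

# Simplifying assumption: each tag and its word are one (tag, word) pair.
Token = Tuple[str, str]


def try_flower(tokens: List[Token], pos: int) -> Optional[int]:
    """Play the role of the predicate (dt? jjcolor? nnflower)=>.

    Returns the index just past the phrase if it starts at pos, else None.
    """
    i = pos
    if i < len(tokens) and tokens[i][0] == "DT":
        i += 1  # optional determiner
    if i < len(tokens) and tokens[i][0] == "JJ-COLOR":
        i += 1  # optional colour adjective
    if i < len(tokens) and tokens[i][0] == "NN-FLOWER":
        return i + 1  # mandatory flower noun
    return None


def match_flower(tokens: List[Token], pos: int = 0) -> Tuple[int, int]:
    """Return (start, end) of the first flower phrase at or after pos."""
    end = try_flower(tokens, pos)
    if end is not None:
        return pos, end  # first alternative: the predicate succeeded
    if pos >= len(tokens):
        raise ValueError("no flower phrase found")
    # second alternative: consume any one token, then recurse
    return match_flower(tokens, pos + 1)


second = [
    ("NN-PICTURE", "Picture"), ("IN-OF", "of"), ("DT", "the"),
    ("VBG", "following"), ("COLON", ":"), ("DT", "a"),
    ("JJ-COLOR", "red"), ("NN-FLOWER", "flower"),
]
print(match_flower(second, 2))  # (5, 8)
```

The leading `DT the` is skipped because the lookahead fails once `VBG` follows it, which is the same reason the predicate rejects that position in the grammar.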

grammar T; 

options { 
    output=AST; 
} 

tokens { 
    TEXT; 
    SENTENCE; 
    FLOWER; 
} 

parse 
: sentence+ EOF -> ^(TEXT sentence+) 
; 

sentence 
: nnpicture inof matchFlower -> ^(SENTENCE nnpicture inof matchFlower) 
; 

nnpicture 
: NN_PICTURE TOKEN -> ^(NN_PICTURE TOKEN) 
; 

matchFlower 
: (dt? jjcolor? nnflower)=> dt? jjcolor? nnflower -> ^(FLOWER dt? jjcolor? nnflower) 
|       . matchFlower   -> matchFlower 
; 

inof 
: IN_OF (t=IN | t=OF) -> ^(IN_OF $t) 
; 

dt 
: DT (t=THE | t=A) -> ^(DT $t) 
; 

jjcolor 
: JJ_COLOR TOKEN -> ^(JJ_COLOR TOKEN) 
; 

nnflower 
: NN_FLOWER TOKEN -> ^(NN_FLOWER TOKEN) 
; 

IN_OF  : 'IN-OF'; 
NN_FLOWER : 'NN-FLOWER'; 
DT   : 'DT'; 
A   : 'a'; 
THE  : 'the'; 
IN   : 'in'; 
OF   : 'of'; 
VBG  : 'VBG'; 
NN_PICTURE : 'NN-PICTURE'; 
JJ_COLOR : 'JJ-COLOR'; 
TOKEN  : ~' '+; 
WS   : ' '+ {skip();};

A parser generated from the grammar above would parse your input:

NN-PICTURE Picture IN-OF of DT the VBG following COLON : DT a JJ-COLOR red NN-FLOWER flower

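as follows:

[image: the resulting AST]

As you can see, everything before the flower is omitted from the tree. If you want to keep those tokens in the tree, do something like this: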

grammar T; 

// ... 

tokens { 
    // ... 
    NOISE; 
} 

// ... 

matchFlower 
: (dt? jjcolor? nnflower)=> dt? jjcolor? nnflower -> ^(FLOWER dt? jjcolor? nnflower) 
|       t=. matchFlower  -> ^(NOISE $t) matchFlower 
; 

// ...

which results in the following AST:

[image: the resulting AST]

Thanks for the detailed answer. It's perfect, exactly what I needed. – 2012-03-14 21:39:27

You're welcome, @Matt. – 2012-03-14 21:57:24
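The grammar above skips any number of tokens, whereas the question asked for at most N. As a rough sketch of how such a limit interacts with collecting the skipped tokens, here is the same logic in plain Python (not ANTLR). It works on a flat list of tag names for brevity; `phrase_end`, `match_flower_bounded`, and `max_skips` are hypothetical names, and this is not how the bound would be written in the grammar itself.

```python
from typing import List, Optional, Tuple


def phrase_end(tags: List[str], pos: int) -> Optional[int]:
    """Return the index just past DT? JJ-COLOR? NN-FLOWER at pos, else None."""
    i = pos
    for optional_tag in ("DT", "JJ-COLOR"):
        if i < len(tags) and tags[i] == optional_tag:
            i += 1
    if i < len(tags) and tags[i] == "NN-FLOWER":
        return i + 1
    return None


def match_flower_bounded(
    tags: List[str], pos: int, max_skips: int
) -> Tuple[List[str], List[str]]:
    """Return (noise, phrase), skipping at most max_skips tags before the phrase."""
    noise: List[str] = []
    while True:
        end = phrase_end(tags, pos)
        if end is not None:
            return noise, tags[pos:end]
        if len(noise) == max_skips or pos >= len(tags):
            raise ValueError("no flower phrase within the allowed distance")
        noise.append(tags[pos])  # corresponds to one NOISE node
        pos += 1


tags = ["DT", "VBG", "COLON", "DT", "JJ-COLOR", "NN-FLOWER"]
noise, phrase = match_flower_bounded(tags, 0, max_skips=3)
print(noise)   # ['DT', 'VBG', 'COLON']
print(phrase)  # ['DT', 'JJ-COLOR', 'NN-FLOWER']
```

With `max_skips=2` the same input raises `ValueError`, because three tags precede the flower phrase.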
