Regular expressions in Lex (lexical analyzer)

Question:

author = "Marjan Mernik and Viljem Zumer", 
    title = "Implementation of multiple attribute grammar inheritance in the tool LISA", 
    year = 1999 

    author = "Manfred Broy and Martin Wirsing", 
    title = "Generalized 
      Heterogeneous Algebras and 
      Partial Interpretations", 
    year = 1983 

    author = "Ikuo Nakata and Masataka Sassa", 
    title = "L-Attributed LL(1)-Grammars are 
      LR-Attributed", 
    journal = "Information Processing Letters"

Given input like the above, I need to capture the title between the double quotes. My first attempt was this:

^(" "|\t)+"title"" "*=" "*"\"".+"\","

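It catches the first example, but not the other two. The others span multiple lines, and that is the problem. I thought of adding \n somewhere to allow multiple lines, like this: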

^(" "|\t)+"title"" "*=" "*"\""(.|\n)+"\","

But that did not help; instead, it catches everything.

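Then I thought: what I want is between the double quotes, so what if I catch everything until I find another " followed by a ,? That way I can tell whether I have reached the end of the title or not, regardless of the number of lines, like this: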

^(" "|\t)+"title"" "*=" "*"\""[^"\""]+","

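But there is a problem here... The examples above don't show it, but the double quote character (") can appear inside the title declaration. For example: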

title = "aaaaaaa \"X bbbbbb",

And yes, it will always be preceded by a backslash (\).

Any suggestions for fixing this regex?

Why do you need lex to do this? Will you have a parser? – LB40 2010-03-27 00:52:22

Answer

The classic regex for matching a string in double quotes is:

\"([^\"]|\\.)*\"

In your case, you would want something like this:

"title"\ *=\ *\"([^\"]|\\.)*\"

PS: IMHO, you put too many quotes in your regex, which makes it hard to read.
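
As a cross-check on the quoted-string pattern above, here is a minimal hand-written C sketch of what it is meant to accept. The function name quoted_len is made up for this illustration and is not part of lex. One assumption to flag: the sketch treats a backslash only as the start of an escape, which corresponds to the slightly stricter class [^\"\\] rather than [^\"]. As in lex, the "." in \\. does not match a newline, while the negated class does, so titles may span several lines.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length of the double-quoted string starting at s[0], both quotes included,
 * or 0 if s does not start with a complete quoted string.
 * Hand-written counterpart of  \"([^\"\\]|\\.)*\"  (illustrative helper).
 */
size_t quoted_len(const char *s)
{
    assert(s != NULL);
    if (s[0] != '"')
        return 0;                 /* must begin with the opening quote */

    size_t i = 1;
    while (s[i] != '\0') {
        if (s[i] == '\\') {
            /* \\.  : a backslash plus any one character except newline */
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;
            i += 2;
        } else if (s[i] == '"') {
            return i + 1;         /* an unescaped quote closes the string */
        } else {
            i++;                  /* [^\"\\] : ordinary character, newline included */
        }
    }
    return 0;                     /* input ended before the closing quote */
}
```

Called on the text starting at the opening quote of title = "aaaaaaa \"X bbbbbb", it skips the escaped quote and stops at the quote just before the comma.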

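Lex can't take a bare space in a pattern; it needs " " to match one. That is only because of Lex; I wouldn't normally do this in other languages such as PHP (where I am most used to working with regexes). – 2010-03-27 00:20:33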

You can also use '\ ' to match a space in most versions of lex – 2010-03-27 00:42:18

I believe '\ ' is POSIX-compliant. See http://www.opengroup.org/onlinepubs/009695399/utilities/lex.html, table: Escape Sequences in lex. – rz0 2010-03-27 02:21:09

Answer

You can use start conditions to simplify each individual pattern, for example:

%x title 
%% 
"title"\ *=\ *\" { /* mark title start */ 
    BEGIN(title); 
    fputs("found title = <|", yyout); 
} 

<title>[^"\\]* { /* process title part, use ([^\"]|\\.)* to grab all at once */ 
    ECHO; 
} 

<title>\\. { /* process escapes inside title */ 
    char c = *(yytext + 1); 
    fputc(c, yyout); /* double escaped characters */ 
    fputc(c, yyout); 
} 

<title>\" { /* mark end of title */ 
    fputs("|>", yyout); 
    BEGIN(0); /* continue as usual */ 
}

To build an executable:

$ flex parse_ini.y 
$ gcc -o parse_ini lex.yy.c -lfl

Run:

$ ./parse_ini < input.txt

where input.txt is:

author = "Marjan\" Mernik and Viljem Zumer", 
title = "Imp\"lementation of multiple...", 
year = 1999

Output:

author = "Marjan\" Mernik and Viljem Zumer", 
found title = <|Imp""lementation of multiple...|>, 
year = 1999

It replaces the '"' around the title with '<|' and '|>'. Also, '\"' is replaced by '""' inside the title.
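
The rewriting done by the <title> rules can also be sketched in plain C, which may help when checking the expected output without running flex. The function name rewrite_title and its buffer-based signature are assumptions made for this illustration; flex generates nothing like it. A backslash followed by a newline is copied as an ordinary character here, since \\. does not match that pair in the lexer either.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Rewrite the body of a quoted title the way the <title> rules do:
 * wrap it in <| and |>, and write the character after each backslash twice.
 * `in` points just past the opening quote. Returns the number of input
 * characters consumed (closing quote included), or 0 if there is no
 * closing quote or `out` is too small. Hypothetical helper.
 */
size_t rewrite_title(const char *in, char *out, size_t cap)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    if (cap < 5)                       /* room for "<|", "|>" and '\0' */
        return 0;
    memcpy(out, "<|", 2);
    o = 2;

    while (in[i] != '\0' && in[i] != '"') {
        if (o + 5 > cap)               /* keep space for 2 chars + "|>" + '\0' */
            return 0;
        if (in[i] == '\\' && in[i + 1] != '\0' && in[i + 1] != '\n') {
            out[o++] = in[i + 1];      /* escaped character, written twice */
            out[o++] = in[i + 1];
            i += 2;
        } else {
            out[o++] = in[i++];        /* ordinary character, copied as-is */
        }
    }
    if (in[i] != '"')
        return 0;                      /* input ended before the closing quote */

    memcpy(out + o, "|>", 3);          /* also copies the terminating '\0' */
    return i + 1;
}
```

For the title line of input.txt, the buffer ends up holding the same <|...|> text that appears in the output above.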

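I already use too many start conditions, and that complicates things a bit. Also, capturing everything in one regex is easier, because I need to pass the match to a C function. – 2010-03-27 04:12:47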
