pyrouge: tuple index out of range

Problem description:

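I am trying to use pyrouge to compute the similarity between an automated summary and the gold standards. Rouge works fine while it processes the two sets of summaries, but when it writes out the results it complains "tuple index out of range". Does anyone know what causes this problem and how I can fix it?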

2017-09-13 23:54:57,524 [MainThread ] [INFO ] Set ROUGE home directory to D:\ComputerScience\Research\ROUGE-1.5.5\ROUGE-1.5.5. 
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Writing summaries. 
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Processing summaries. Saving system files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\system and model files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\model. 
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Processing files in D:\ComputerScience\Research\summary\Grendel\automated. 
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Processing automated.txt. 
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Saved processed files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\system. 
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing files in D:\ComputerScience\Research\summary\Grendel\manual. 
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing BookRags.txt. 
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing GradeSaver.txt. 
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing GradeSummary.txt. 
2017-09-13 23:54:57,557 [MainThread ] [INFO ] Processing Wikipedia.txt. 
2017-09-13 23:54:57,562 [MainThread ] [INFO ] Saved processed files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\model. 
Traceback (most recent call last): 

    File "<ipython-input-8-bc227b272111>", line 1, in <module> 
    runfile('D:/ComputerScience/Research/automate_summary.py', wdir='D:/ComputerScience/Research') 

    File "C:\Users\zhuan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 707, in runfile 
    execfile(filename, namespace) 

    File "C:\Users\zhuan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile 
    exec(compile(f.read(), filename, 'exec'), namespace) 

    File "D:/ComputerScience/Research/automate_summary.py", line 53, in <module> 
    output = r.convert_and_evaluate() 

    File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 361, in convert_and_evaluate 
    rouge_output = self.evaluate(system_id, rouge_args) 

    File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 331, in evaluate 
    self.write_config(system_id=system_id) 

    File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 315, in write_config 
    self._config_file, system_id) 

    File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 264, in write_config_static 
    system_filename_pattern = re.compile(system_filename_pattern) 

    File "C:\Users\zhuan\Anaconda3\lib\re.py", line 233, in compile 
    return _compile(pattern, flags) 

    File "C:\Users\zhuan\Anaconda3\lib\re.py", line 301, in _compile 
    p = sre_compile.compile(pattern, flags) 

    File "C:\Users\zhuan\Anaconda3\lib\sre_compile.py", line 562, in compile 
    p = sre_parse.parse(p, flags) 

    File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 855, in parse 
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0) 

    File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 416, in _parse_sub 
    not nested and not items)) 

    File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 616, in _parse 
    source.tell() - here + len(this)) 

error: nothing to repeat 

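The gold standards are BookRags.txt, GradeSaver.txt, GradeSummary.txt, and Wikipedia.txt, which need to be compared against automated.txt.
Shouldn't either *.txt or summary[a-z0-9A-Z]+ work? The former gives me a "nothing to repeat" error, and the latter gives me a "tuple index out of range" error.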

from pyrouge import Rouge155

# Raw strings keep the Windows backslashes literal.
r = Rouge155(r"D:\ComputerScience\Research\ROUGE-1.5.5\ROUGE-1.5.5")
r.system_dir = r'D:\ComputerScience\Research\summary\Grendel\automated'
r.model_dir = r'D:\ComputerScience\Research\summary\Grendel\manual'
r.system_filename_pattern = '[a-z0-9A-Z]+.txt'
r.model_filename_pattern = '[a-z0-9A-Z]+.txt'
output = r.convert_and_evaluate()
print(output)

I set both directories manually. It looks like the Rouge package can process the txt files in them.

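The problem is that the rouge library never considered the case where your regular expression captures nothing. The line id = match.groups(0)[0] in the rouge source code is the problematic one. If you look at the documentation, it says that the groups function will "Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern...". Because your pattern contains no parenthesized group (a failed match would have produced None rather than a tuple), groups returns an empty tuple, and the code tries to take the first item from that empty tuple, which causes the error.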
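The behaviour described above can be reproduced with the standard re module alone. This is a minimal sketch, not pyrouge's actual code: the file name automated.txt comes from the question, while the variant pattern with parentheses is a hypothetical illustration of what a capturing group changes.

```python
import re

# Pattern from the question: it matches the file name but has no capturing group.
no_group = re.compile(r'[a-z0-9A-Z]+.txt')
match = no_group.match('automated.txt')
empty_groups = match.groups(0)   # empty tuple, because nothing is captured

try:
    match.groups(0)[0]           # same expression as the line quoted from the rouge source
    error_message = None
except IndexError as exc:
    error_message = str(exc)     # "tuple index out of range"

# Hypothetical variant: parentheses capture the part before the extension.
with_group = re.compile(r'([a-z0-9A-Z]+)\.txt')
file_id = with_group.match('automated.txt').groups(0)[0]

# For comparison, a name that does not match yields None, not a tuple.
failed = no_group.match('!!!')

print(empty_groups, error_message, file_id, failed)
```

Whether the captured text is a usable ID for pyrouge depends on how the system and model file names relate to each other, which this sketch does not address.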

I see. So I changed my regular expression to *.txt, which should match any summary in the folder. But now it gives me a new error: nothing to repeat. – Nat

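Normally * is treated as a wildcard that matches any number of any characters, but in a regular expression * behaves differently: it repeats the preceding element, so a pattern cannot start with it. For more information, see https://*.com/questions/31386552/nothing-to-repeat-from-python-regex. As you mentioned, '[a-z0-9A-Z]+' should pick up anything. You could print the system_dir variable used by the write_config_static function and make sure your .txt files are in that folder itself rather than in a subdirectory of it. –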
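To illustrate the difference between the shell-style wildcard and the regular-expression quantifier, here is a small sketch using only the standard library. The file name Wikipedia.txt is taken from the question; using fnmatch.translate to convert a wildcard into a regular expression is an addition for illustration, not something pyrouge does.

```python
import fnmatch
import re

# As a regular expression, a leading * has nothing before it to repeat.
try:
    re.compile('*.txt')
    compile_error = None
except re.error as exc:
    compile_error = str(exc)     # message starts with "nothing to repeat"

# Regular-expression spelling of the wildcard, with a group capturing the stem.
regex_equiv = re.compile(r'(.*)\.txt')
stem = regex_equiv.match('Wikipedia.txt').group(1)

# fnmatch.translate converts a shell-style wildcard into a regular expression.
translated = re.compile(fnmatch.translate('*.txt'))
wildcard_matches = translated.match('Wikipedia.txt') is not None

print(compile_error)
print(stem, wildcard_matches)
```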

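It looks like Rouge can find the summaries in both the system directory and the model directory, because its output shows that it has already processed the txt files in both. The problem still occurs in the write_config_static function. My system_dir and model_dir are set manually to absolute paths. – Nat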
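One way to check both points raised in the comments (that the files sit directly in the directory, and what the pattern actually captures) is a small helper like the one below. It is only a sketch: report_matches is a hypothetical name, it is independent of pyrouge, and it runs against a temporary directory populated with two of the file names from the question rather than the real folders.

```python
import os
import re
import tempfile

def report_matches(directory, pattern):
    """Map each file directly inside directory to its captured groups, or None if unmatched."""
    compiled = re.compile(pattern)
    report = {}
    for name in sorted(os.listdir(directory)):
        if not os.path.isfile(os.path.join(directory, name)):
            continue  # subdirectories are skipped, not searched
        match = compiled.match(name)
        report[name] = match.groups() if match else None
    return report

with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('BookRags.txt', 'Wikipedia.txt'):
        open(os.path.join(demo_dir, name), 'w').close()
    os.mkdir(os.path.join(demo_dir, 'nested'))
    without_group = report_matches(demo_dir, r'[a-z0-9A-Z]+.txt')
    with_group = report_matches(demo_dir, r'([a-z0-9A-Z]+)\.txt')

print(without_group)  # {'BookRags.txt': (), 'Wikipedia.txt': ()}
print(with_group)     # {'BookRags.txt': ('BookRags',), 'Wikipedia.txt': ('Wikipedia',)}
```

An empty tuple in the report means the name matched but nothing was captured, which is the situation where indexing the first group fails.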