Regex: find words repeated on the same line of a dictionary

Problem description:

house house$casa | casa, vivienda, hogar | edificio, casa | vivienda

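The $ sign separates the term from its translations.

I would like to find the words that occur more than once on the same line of a dictionary, using a regular expression in a text editor (e.g. Sublime Text, Notepad++). I do not want a PHP function, because I have to check by hand whether those repeated words should be deleted. In the example above, the regular expression should find house, casa and vivienda. My goal is to end up with the following result: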

house$casa | vivienda, hogar | edificio

I tried the following expression, but it does not work properly:

(\b\w+\b)\W+\1

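You will not be able to do this with regex alone. Get used to the idea of implementing it in a programming language. – Tomalak 2013-05-07 09:09:29

Answer

FWIW, here is a rough example of how to do this in Python: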

import re 

def distinct_words(block, seen, delim): 
    """ makes a list of words distinct, given a set of words seen earlier """ 

    unique_words = [] 

    for word in re.split(delim, block):
        # skip empty strings left over by leading/trailing delimiters
        if word and word not in seen:
            seen[word] = True
            unique_words.append(word)

    return unique_words 

def process_line(line): 
    """ removes all duplicate words from a dictionary line """ 

    # safeguard 
    if '$' not in line: return line 

    # split line at the first '$' only, in case a translation contains one
    original, translated = line.split('$', 1)

    # make original words distinct 
    distinct_original = distinct_words(original, {}, r' +') 

    # make translated words distinct, but keep block structure 

    # split the translated part at '|' into blocks 
    # split each block at ', ' into words 
    seen = {} 
    distinct_translated = [
        distinct_list
        for distinct_list in (
            distinct_words(block, seen, r', +')
            for block in re.split(r'\s*\|\s*', translated)
        )
        if len(distinct_list) > 0
    ]

    # put everything back together again 
    part_original = ' '.join(distinct_original) 
    part_translated = [', '.join(block) for block in distinct_translated] 
    part_translated = ' | '.join(part_translated) 
    result = part_original + '$' + part_translated 

    return result 

def process_dictionary(filename): 
    """ processes a dictionary text file, modifies the file in place """ 

    with open(filename, 'r') as f:
        # strip the trailing newline so it does not stick to the last word
        lines = [line.rstrip('\n') for line in f]
    lines_out = [process_line(line) for line in lines]
    with open(filename, 'w') as f:
        f.write('\n'.join(lines_out) + '\n')

Obviously, you would call process_dictionary() like this:

process_dictionary('dict_en_es.txt')

But for the sake of the example, suppose you have a single line:

line = "house house$casa | casa, vivienda, hogar | edificio, casa | vivienda" 
line_out = process_line(line) 
print(line_out)

This prints the desired result:

house$casa | vivienda, hogar | edificio
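
If the goal is only to find the repeated words so they can be reviewed by hand, a lookahead-based pattern may be enough. The sketch below is not part of the code above. The pattern REPEATED_LATER and the helper repeated_words() are hypothetical names introduced here for illustration. It also shows why the expression from the question falls short: that pattern only matches a word that is repeated immediately, so it never sees the second vivienda.

```python
import re

line = "house house$casa | casa, vivienda, hogar | edificio, casa | vivienda"

# The pattern from the question only catches a word that is repeated
# immediately, with nothing but non-word characters in between.
adjacent_only = re.findall(r'(\b\w+\b)\W+\1', line)

# Hypothetical alternative: a lookahead matches any word that occurs
# again anywhere later on the same line ('.' does not cross newlines).
REPEATED_LATER = re.compile(r'\b(\w+)\b(?=.*\b\1\b)')

def repeated_words(text):
    """Return the words that occur more than once, in order of first appearance."""
    # dict.fromkeys drops duplicate hits while keeping insertion order
    return list(dict.fromkeys(REPEATED_LATER.findall(text)))

print(adjacent_only)         # ['house', 'casa']
print(repeated_words(line))  # ['house', 'casa', 'vivienda']
```

The same expression, \b(\w+)\b(?=.*\b\1\b), should in principle work in the search box of an editor with a Perl-style regex engine, provided that "." is not set to match newlines. That is an assumption and has not been checked in every editor. It only highlights the earlier occurrences of each repeated word. Deciding which ones to delete is still a manual step, or a job for process_line().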
