Python - re.sub无需替换正则表达式的一部分

问题描述：

因此，例如，我有一个字符串“完美的熊寻宝”，我想用“the”之前的单词替换“bear”之前的单词。Python - re.sub无需替换正则表达式的一部分

所以生成的字符串将是“熊狩猎”

我想我会用

re.sub("\w+ bear","the","perfect bear hunts")

，但它取代“熊”了。我如何排除熊被替换，同时也用于匹配？

@Rawing非常好，编辑它 – Gillian

答

像其他答案一样，我会使用积极的lookahead断言。

然后，为了解决拉夫在几个评论中提出的问题（关于“胡子”这样的词怎么样？），我会添加(\b|$)。这匹配一个字边界或字符串的结尾，所以你只匹配单词bear，而不再是。

所以你会得到如下：

import re 

def bear_replace(string): 
    return re.sub(r"\w+ (?=bear(\b|$))", "the ", string)

和测试用例（使用pytest）：

import pytest 

@pytest.mark.parametrize('string, expected', [ 
    ("perfect bear swims", "the bear swims"), 

    # We only capture the first word before 'bear 
    ("before perfect bear swims", "before the bear swims"), 

    # 'beard' isn't captured 
    ("a perfect beard", "a perfect beard"), 

    # We handle the case where 'bear' is the end of the string 
    ("perfect bear", "the bear"), 

    # 'bear' is followed by a non-space punctuation character 
    ("perfect bear-string", "the bear-string"), 
]) 
def test_bear_replace(string, expected): 
    assert bear_replace(string) == expected

对不起，我很挑剔，但我想指出，如果“熊”一词后面跟着任何标点符号 - “熊”，熊（\ s | $）'不匹配。或者“熊，谁”等。我建议使用单词边界'\ b'来代替（尽管承认这不是一个完美的解决方案;例如它会匹配“熊大小”）。 –

@Rawing Nitpicky很好！固定。 – alexwlchan

答

Look Behind and Look Ahead正则表达式就是你要找的。

re.sub(".+(?=bear)", "the ", "prefect bear swims")

这将替换所有的一切人物 “熊” 之前。试试这个“我的长胡子”。 –

这将产生'thebear swims' – Igle

答

使用正先行熊之前更换的一切：

re.sub(".+(?=bear)","the ","perfect bear swims")

.+将捕捉任何字符（除行终止）。

这将逐字地替换字符“熊”之前的所有内容，而不仅仅是前面的单词。试试这个“我的长胡子”看到问题... –

用空格更新。感谢提示;） – Igle

它仍然将“大熊”变成“熊”而不是“熊”。 OP表示他们希望在“熊”之前替换_字，而不是整个字符串。你去完全改变了OP的'\ w +'，绝对没有任何理由。 –

答

替代使用向前看符号：

捕捉你想用一组()，以保持和更换使用\1重新插入的部分。

re.sub("\w+ (bear)",r"the \1","perfect bear swims")

请注意，这也会匹配“胡子”等字样。你应该考虑添加一个字边界'\ b'。 –

Python - re.sub无需替换正则表达式的一部分

相关推荐