Python regex to split a string and get all words is not working
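Problem description:
I want to split a string using a regular expression in Python and get all the matched words.
RE: \w+(\.?\w+)*
This should capture only things like [a-zA-Z0-9_].
But when I try to match it against a string and get all of the contents, it does not return the correct result.
Code snippet: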
>>> import re
>>> from pprint import pprint
>>> pattern = r"\w+(\.?\w+)*"
>>> string = """this is some test string and there are some digits as well that need to be captured as well like 1234567890 and 321 etc. But it should also select _ as well. I'm pretty sure that that RE does exactly the same.
... Oh wait, it also need to filter out the symbols like [email protected]#$%^&*()-+=[]{}.,;:'"`| \(`.`)/
...
... I guess that's it."""
>>> pprint(re.findall(r"\w+(.?\w+)*", string))
[' etc', ' well', ' same', ' wait', ' like', ' it']
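It returns only some of the words, but it should return all the words, digits and underscore(s), as in the attached example.
Python version: Python 3.6.2 (default, Jul 17 2017, 16:44:45)
Thanks.
Answer
You need to use a non-capturing group (see here for why) and escape the dot (see here for which characters should be escaped in a regex):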
>>> import re
>>> from pprint import pprint
>>> pattern = r"\w+(?:\.?\w+)*"
>>> string = """this is some test string and there are some digits as well that need to be captured as well like 1234567890 and 321 etc. But it should also select _ as well. I'm pretty sure that that RE does exactly the same.
... Oh wait, it also need to filter out the symbols like [email protected]#$%^&*()-+=[]{}.,;:'"`| \(`.`)/
...
... I guess that's it."""
>>> pprint(re.findall(pattern, string, re.A))
['this', 'is', 'some', 'test', 'string', 'and', 'there', 'are', 'some', 'digits', 'as', 'well', 'that', 'need', 'to', 'be', 'captured', 'as', 'well', 'like', '1234567890', 'and', '321', 'etc', 'But', 'it', 'should', 'also', 'select', '_', 'as', 'well', 'I', 'm', 'pretty', 'sure', 'that', 'that', 'RE', 'does', 'exactly', 'the', 'same', 'Oh', 'wait', 'it', 'also', 'need', 'to', 'filter', 'out', 'the', 'symbols', 'like', 'I', 'guess', 'that', 's', 'it']
Also, to match only ASCII letters, digits and _, you must pass the re.A flag.
See the Python demo.
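To illustrate why the group type matters: when a pattern contains a capturing group, re.findall returns the group's contents rather than the whole match. A minimal sketch, using a short made-up sample string (not the one from the question):

```python
import re

# Hypothetical sample text, chosen only to keep the output short.
text = "foo.bar baz_1 42"

# Capturing group: findall returns only what the group captured last
# (an empty string when the group did not take part in the match).
capturing = re.findall(r"\w+(\.?\w+)*", text)

# Non-capturing group: findall returns the whole match.
non_capturing = re.findall(r"\w+(?:\.?\w+)*", text)

print(capturing)      # ['.bar', '', '']
print(non_capturing)  # ['foo.bar', 'baz_1', '42']
```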
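Thanks, you are a real hero. – Mubin
Use `re.findall(r"\w+(?:\.?\w+)*", string)`. If you only need ASCII, pass the `re.A` flag so that `\w` matches only ASCII letters, digits and the underscore. See the [demo](https://ideone.com/2sLrjV). If you only need to match letters, replace `\w` with `[^\W\d_]`. Note that what you wrote at the beginning differs from what you used in the code. –
Great, thanks. I used the same regex (`\w+(.?\w+)*`) with `java` and it worked fine. It would be great if you could point out the difference. – Mubin
Well, you have to escape the dot and use a non-capturing group. You don't need the outer capturing parentheses. –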
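A small sketch of the two variations mentioned in this comment, on a made-up string with non-ASCII letters (the sample text is an assumption, written with escapes to avoid encoding issues):

```python
import re

# Hypothetical sample: "naïve café_1 x2"
text = "na\u00efve caf\u00e9_1 x2"

# Default (Unicode) mode: \w also matches non-ASCII letters.
unicode_words = re.findall(r"\w+", text)

# re.A restricts \w to [a-zA-Z0-9_], so accented letters split the words.
ascii_words = re.findall(r"\w+", text, re.A)

# [^\W\d_] is "a word character that is neither a digit nor an underscore",
# i.e. letters only.
letters_only = re.findall(r"[^\W\d_]+", text)

print(unicode_words)  # ['naïve', 'café_1', 'x2']
print(ascii_words)    # ['na', 've', 'caf', '_1', 'x2']
print(letters_only)   # ['naïve', 'café', 'x']
```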
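On the Java question: the Java code is not shown, so this is only a guess, but Java's Matcher.group() returns the whole match, whereas Python's re.findall returns the group contents when a capturing group is present. If you want to keep the capturing group in Python, a rough equivalent is to iterate with re.finditer and take group(0). A sketch on a made-up sample string:

```python
import re

# Hypothetical sample text, not the string from the question.
text = "foo.bar baz_1 42"

# The capturing group is kept here on purpose.
pattern = re.compile(r"\w+(\.?\w+)*")

# group(0) is always the whole match, regardless of capturing groups.
whole_matches = [m.group(0) for m in pattern.finditer(text)]

print(whole_matches)  # ['foo.bar', 'baz_1', '42']
```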