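Regular expressions (the re module)

Constants
Multiple options
Methods

Compiling
Single match

match
search
fullmatch

match examples
search examples
fullmatch examples

Searching the whole text

findall
finditer

findall examples
finditer examples

Match and replace

sub
subn

sub examples
subn examples

Splitting strings

Groups

group()

Groups
Named groups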

Python provides regular expression processing through the re module.

Constants


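Multiple options

Use the bitwise OR operator | to enable several options at once.
If only one option could ever be active, the constants could simply be numbered 1-9; because several can be active together, they are set to 1, 2, 4, 8, 16…

import re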

re.M | re.S
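
As a small sketch of how such a combination behaves (the sample string and variable names are made up for illustration), the combined value carries both options, and each one changes matching in its own way:

```python
import re

text = "apple\nbig"  # illustrative sample string

# Each flag occupies its own bit, so | merges them into one value.
flags = re.M | re.S
has_multiline = bool(flags & re.M)
has_dotall = bool(flags & re.S)

# re.M lets ^ match after every newline; re.S lets . match the newline itself.
line_starts = re.findall(r"^\w", text, re.M)
whole = re.search(r"a.+g", text, re.S)

print(has_multiline, has_dotall)  # True True
print(line_starts)                # ['a', 'b']
print(repr(whole.group()))        # 'apple\nbig'
```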

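Methods

Compiling

re.compile(pattern, flags=0)

Compiles the pattern with the given flags and returns a regular expression object (regex).
pattern is the regular expression string; flags are the options that select the working mode. A regular expression has to be compiled before it can be used. For efficiency the compiled result is saved, so the same pattern does not need to be compiled again the next time it is used.
The other functions in re also go through the compile step internally, so they benefit from the saved results as well.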
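
A minimal sketch of reusing one compiled object for several calls; the input lines and the name `word` are assumptions chosen for illustration:

```python
import re

# Compile once, then reuse the same pattern object for every call.
word = re.compile(r"\w+")

lines = ["apple big", "python hello who"]  # illustrative data
counts = [len(word.findall(line)) for line in lines]

print(counts)        # [2, 3]
print(word.pattern)  # the source string the object was compiled from
```

Keeping the compiled object in a variable also makes the pos and endpos arguments of its methods available, which the module-level functions do not accept.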

# Compile first, then operate

import re

s = 'apple\nbig'
regex = re.compile('^a', re.M)
r = regex.match(s)
print(r)  # the match object for the 'a' at the start of the string

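Single match

match

re.match(pattern, string, flags=0)
regex.match(string[, pos[, endpos]])

match only matches at the beginning of the string.
The match method of a regex object can also set the start and end positions. It returns a match object (or None when nothing matches).

search

re.search(pattern, string, flags=0)
regex.search(string[, pos[, endpos]])

Searches from the beginning until the first match is found.
The search method of a regex object can also set the start and end positions. It returns a match object (or None when nothing matches).

fullmatch

re.fullmatch(pattern, string, flags=0)
regex.fullmatch(string[, pos[, endpos]])

Matches the full length: the whole string must match the regular expression.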
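
To put the three side by side, here is a short sketch that applies the same pattern with each function; the pattern and variable names are chosen only for illustration:

```python
import re

s = "python\nhello\nwho"
pattern = r"hello"

at_start = re.match(pattern, s)         # 'hello' is not at index 0
anywhere = re.search(pattern, s)        # first occurrence anywhere in s
entire = re.fullmatch(pattern, s)       # s is longer than 'hello'
exact = re.fullmatch(pattern, "hello")  # the whole string is 'hello'

print(at_start)         # None
print(anywhere.span())  # (7, 12)
print(entire)           # None
print(exact.group())    # hello
```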

match examples

import re

s = """python\nhello\nwho"""
r = re.match('p', s)
print(type(r), r) # a match object, one result

# Output
<class '_sre.SRE_Match'> <_sre.SRE_Match object; span=(0, 1), match='p'>

# Setting a mode

import re

s = """python\nhello\nwho"""
r = re.match('^h', s, re.M)
print(type(r), r) # nothing matches here, so None

# Output
<class 'NoneType'> None
# Why None: re.M enables multiline mode, but match still only looks at the start of the string

# Compile first, then match from a given start position
import re

s = """python\nhello\nwho"""
regex = re.compile('t', re.M)
r = regex.match(s, 2) # start matching at index 2
print(type(r), r)

# Output
<class '_sre.SRE_Match'> <_sre.SRE_Match object; span=(2, 3), match='t'>
# Only the match method of a compiled regex can change the start position

Summary: in single-line or multiline mode, match only looks at the start of the string or at the given start index.

search examples

import re

s = """python\nhello\nwho"""
#regex = re.compile('h', re.M)
r = re.search('h',s) # stops at the h in python
print(type(r), r)

# Output
<class '_sre.SRE_Match'> <_sre.SRE_Match object; span=(3, 4), match='h'>

# Setting a mode

import re

s = """python\nhello\nwho"""
#regex = re.compile('h', re.M)
r = re.search('e',s, re.M)
print(type(r), r)

# Output
<class '_sre.SRE_Match'> <_sre.SRE_Match object; span=(8, 9), match='e'>

# Compile first, then search from a given start position

import re

s = """python\nhello\nwho"""
regex = re.compile('h', re.M) # multiline mode
r = regex.search(s,4,9) # search from index 4 up to index 8, i.e. [4, 9)
print(type(r), r)

# Output
<class '_sre.SRE_Match'> <_sre.SRE_Match object; span=(7, 8), match='h'>

Summary: in single-line or multiline mode, search returns as soon as it finds a match.

fullmatch examples

import re

s = """python\nhello\nwho"""
regex = re.compile('.+', re.S) # in single-line mode (re.S), . also matches the newline
r = regex.fullmatch(s)
print(type(r), r) # match='python\nhello\nwho'

# Output
<class '_sre.SRE_Match'> <_sre.SRE_Match object; span=(0, 16), match='python\nhello\nwho'>


import re

s = """python\nhello\nwho"""
regex = re.compile(r'\w+') # compile first
r = regex.fullmatch(s, 1, 3) # the range [1, 3) must match in full
print(type(r), r) # match='yt'

# Output
<class '_sre.SRE_Match'> <_sre.SRE_Match object; span=(1, 3), match='yt'>

Summary: with fullmatch, in single-line or multiline mode, the whole string (or the specified range) must match the regular expression.

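Searching the whole text

findall

re.findall(pattern, string, flags=0)
regex.findall(string[, pos[, endpos]])

Matches across the whole string from left to right and returns a list of all matches. The items are str (tuples of str when the pattern has more than one group).

finditer

re.finditer(pattern, string, flags=0)
regex.finditer(string[, pos[, endpos]])

Matches across the whole string from left to right and returns all matches as an iterator.
Note that each iteration yields a match object.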

findall examples

import re

s = """python\nhello\nwho"""
r = re.findall('h', s)
print(r)

# Output
['h', 'h', 'h']

# Matching within a range requires compiling first

import re

s = """python\nhello\nwho"""
regex = re.compile('h')
r = regex.findall(s, 3, 10) # match within [3, 10)
print(r)

# Output
['h', 'h']

finditer examples


import re

s = """python\nhello\nwho"""
r = re.finditer('h', s)
print(r) # <callable_iterator object at 0x00000000021E09B0>
for i in r:
    print(type(i), i, s[i.start():i.end()]) # slicing s[i.start():i.end()] gives the matched text

# Output
<class '_sre.SRE_Match'> <_sre.SRE_Match object; span=(3, 4), match='h'> h
<class '_sre.SRE_Match'> <_sre.SRE_Match object; span=(7, 8), match='h'> h
<class '_sre.SRE_Match'> <_sre.SRE_Match object; span=(14, 15), match='h'> h

Match and replace

sub

re.sub(pattern, replacement, string, count=0, flags=0)
regex.sub(replacement, string, count=0)

Matches pattern against string and replaces each match with replacement; the result is a str.
replacement can be a string, bytes, or a function.
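
The function form may be the least obvious of the three, so here is a minimal sketch of it. The helper name `shout` and the pattern are assumptions for illustration; the function receives each match object and returns the replacement text:

```python
import re

s = "python\nhello\nwho"

def shout(m):
    # m is the match object for one occurrence; return its replacement.
    return m.group().upper()

# Upper-case every 'h' together with the word character that follows it.
result = re.sub(r"h\w", shout, s)
print(repr(result))  # 'pytHOn\nHEllo\nwHO'
```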

subn

re.subn(pattern, replacement, string, count=0, flags=0)
regex.subn(replacement, string, count=0)

Same as sub, but returns a tuple (new_string, number_of_subs_made).

sub examples


import re

s = """python\nhello\nwho"""
r = re.sub('h', 'ab', s)
print(type(r), r)

# Output
<class 'str'> pytabon
abello
wabo

# Limiting the number of replacements
import re

s = """python\nhello\nwho"""
r = re.sub('h', 'ab', s, count=1) # replace once
print(type(r), r)

# Output: only python is changed, to pytabon
<class 'str'> pytabon
hello
who

# Compile first, then replace a limited number of times
import re

s = """python\nhello\nwho"""
regex = re.compile('h')
r = regex.sub('ab', s, 2) # replace twice
print(type(r), r)

# Output: only python and hello are changed
<class 'str'> pytabon
abello
who

Referencing a group to add a prefix or a suffix

# Adding a prefix
import re

s = """honey\nhello\nhi"""
r = re.sub(r'(h\w+)', r'python-----\1', s)
print(r)

# Output
python-----honey
python-----hello
python-----hi

# Adding a suffix
r = re.sub(r'(h\w+)', r'\1------python', s)
print(r)

# Output
honey------python
hello------python
hi------python

subn examples

import re

s = """honey\nhello\nhi"""
r = re.subn('h', 'p', s)
print(type(r), r)

# Output
<class 'tuple'> ('poney\npello\npi', 3)

for i in r:
    print(i)

# Output
poney
pello
pi
3

Splitting strings

re.split(pattern, string, maxsplit=0, flags=0)

re.split splits a string at every match of pattern.

import re

s = """
os.path.abspath(path)
normpath(join(os.getcwd(), path))
"""
# Extract the words on each line
print(s.split())  # str.split cannot do this
# Output
['os.path.abspath(path)', 'normpath(join(os.getcwd(),', 'path))']

print(re.split(r'[\.()\s,]+', s))
# Output
['', 'os', 'path', 'abspath', 'path', 'normpath', 'join', 'os', 'getcwd', 'path', '']
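
The maxsplit parameter in the signature, and the empty strings at both ends of the result, can be handled along these lines. This is only a sketch; the variable names are illustrative and the input is one line taken from the sample above:

```python
import re

line = "os.path.abspath(path)"

# maxsplit=1 stops after the first split; the remainder is returned unsplit.
first, rest = re.split(r"[.()\s,]+", line, maxsplit=1)
print(first)  # os
print(rest)   # path.abspath(path)

# Filter out the empty strings left by separators at the edges.
words = [w for w in re.split(r"[.()\s,]+", line) if w]
print(words)  # ['os', 'path', 'abspath', 'path']
```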

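Groups

Data captured by a parenthesized part of the pattern is placed in a group.
The match and search functions return a match object.
findall returns a list of strings; finditer yields match objects one at a time.

group()

If the pattern uses groups and there is a match, the captured groups are available on the match object.

Use group(N) to get the corresponding group: 1 to N are the groups, 0 returns the whole matched string, and N defaults to 0 when omitted.
If named groups are used, group('name') returns the group by name.
groups() returns all groups.
groupdict() returns all named groups.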
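
The four accessors can be compared on one small match. The pattern, the group name `key` and the input string below are assumptions picked for illustration:

```python
import re

# One named group followed by one unnamed group.
m = re.search(r"(?P<key>\w+)=(\d+)", "size=42")

print(m.group())       # size=42
print(m.group(1))      # size
print(m.group("key"))  # size
print(m.group(2))      # 42
print(m.groups())      # ('size', '42')
print(m.groupdict())   # {'key': 'size'}
```

Note that groupdict() only contains the named group, while groups() contains both.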

Groups

import re

s = '''bottle\nbag\nbig\napple'''

regex = re.compile(r'(b\w+)') # compile first
result = regex.match(s) # match once from the start
print(type(result)) # <class '_sre.SRE_Match'>
print(result.group()) # bottle

Named groups

Groups are numbered from 1; 0 stands for the whole match.

import re

s = '''bottle\nbag\nbig\napple'''

regex = re.compile(r'(b\w+)\n(?P<name2>b\w+)\n(?P<name3>b\w+)')
result = regex.match(s)

print(result) # type(result) is <class '_sre.SRE_Match'>
# Output: <_sre.SRE_Match object; span=(0, 14), match='bottle\nbag\nbig'>

print(result.group(1),result.group(2),result.group(3))  # get group values by index
# Output: bottle bag big

print(result.group('name2'),result.group('name3')) # get group values by group name
# Output: bag big

print(result.groupdict()) # named groups as key-value pairs in a dict
# Output: {'name2': 'bag', 'name3': 'big'}

print(result.group(0)) # same as result.group()
# Output
bottle
bag
big

findall usage

import re

s = '''bottle\nbag\nbig\napple'''

regex = re.compile(r'(b\w+)\n(?P<name2>b\w+)\n(?P<name3>b\w+)')
result = regex.findall(s)
for x in result:
    print(type(x),x)

# Output
<class 'tuple'> ('bottle', 'bag', 'big')

finditer usage

import re

s = '''bottle\nbag\nbig\napple'''

regex = re.compile(r'(?P<head>b\w+)')
result = regex.finditer(s)
for x in result:
    print(type(x), x, x.group(), x.group('head'))

# Output
<class '_sre.SRE_Match'> <_sre.SRE_Match object; span=(0, 6), match='bottle'> bottle bottle
<class '_sre.SRE_Match'> <_sre.SRE_Match object; span=(7, 10), match='bag'> bag bag
<class '_sre.SRE_Match'> <_sre.SRE_Match object; span=(11, 14), match='big'> big big
