Chapter 1: Text - re: Regular Expressions - Pattern Syntax (4)
1.3.4.4 Anchoring
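In addition to describing the content of the pattern to match, you can use anchoring instructions to specify where in the input text the pattern should appear. Table 1-2 lists the valid anchoring codes.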
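Table 1-2 Regular Expression Anchoring Codes
Code | Meaning |
---|---|
^ | Start of string, or line |
$ | End of string, or line |
\A | Start of string |
\Z | End of string |
\b | Empty string at the beginning or end of a word |
\B | Empty string not at the beginning or end of a word |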
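To make the distinctions in the table concrete, here is a minimal sketch. The two-line sample string and the variable names are assumptions chosen for illustration, not part of the original example. It uses the standard `re.MULTILINE` flag to show when `^` and `$` match at line boundaries, while `\A` and `\Z` stay tied to the whole string.

```python
import re

# Hypothetical two-line sample text, used only for this illustration.
text = "first line\nsecond line"

# Without re.MULTILINE, ^ matches only at the start of the string.
caret_default = re.findall(r'^\w+', text)
# With re.MULTILINE, ^ also matches right after each newline,
# and $ also matches right before each newline.
caret_multiline = re.findall(r'^\w+', text, re.MULTILINE)
dollar_multiline = re.findall(r'\w+$', text, re.MULTILINE)

# \A and \Z ignore re.MULTILINE and anchor to the whole string.
start_only = re.findall(r'\A\w+', text, re.MULTILINE)
end_only = re.findall(r'\w+\Z', text, re.MULTILINE)

# \b matches at a word edge; \B matches everywhere else.
edge_l = re.findall(r'\bl\w*', text)   # words that start with 'l'
inner_n = re.findall(r'\Bn\B', text)   # 'n' strictly inside a word

print(caret_default)     # ['first']
print(caret_multiline)   # ['first', 'second']
print(dollar_multiline)  # ['line', 'line']
print(start_only)        # ['first']
print(end_only)          # ['line']
print(edge_l)            # ['line', 'line']
print(inner_n)           # ['n', 'n', 'n']
```

The flag only changes the behavior of `^` and `$`, which is why the table describes them as matching a "string, or line" boundary.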
# re_test_patterns.py
import re


def test_patterns(text, patterns):
    """Given source text and a list of patterns, look for
    matches for each pattern within the text and print
    them to stdout.
    """
    # Look for each pattern in the text and print the results.
    for pattern, desc in patterns:
        print("'{}' ({})\n".format(pattern, desc))
        print(" '{}'".format(text))
        for match in re.finditer(pattern, text):
            s = match.start()
            e = match.end()
            substr = text[s:e]
            # Pad with dots so the match lines up under the source text.
            n_backslashes = text[:s].count('\\')
            prefix = '.' * (s + n_backslashes)
            print(" {}'{}'".format(prefix, substr))
        print()
    return


if __name__ == '__main__':
    test_patterns('abbaaabbbbaaaaa', [('ab', "'a' followed by 'b'")])
# Second script: uses test_patterns() from re_test_patterns.py
# to try each anchoring code against the same sentence.
from re_test_patterns import test_patterns

test_patterns(
    'This is some text -- with punctuation.',
    [(r'^\w+', 'word at start of string'),
     (r'\A\w+', 'word at start of string'),
     (r'\w+\S*$', 'word near end of string'),
     (r'\w+\S*\Z', 'word near end of string'),
     (r'\w*t\w*', 'word containing t'),
     (r'\bt\w+', 't at start of word'),
     (r'\w+t\b', 't at end of word'),
     (r'\Bt\B', 't, not start or end of word')],
)
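In this example, the patterns that match the words at the beginning and at the end of the string differ, because the word at the end of the string is followed by the punctuation that terminates the sentence. The pattern \w+$ would not match, since . is not considered an alphanumeric character.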
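The point about the trailing period can be checked directly. The following is a minimal sketch that reuses the sentence from the example; the variable names are assumptions introduced for illustration.

```python
import re

text = 'This is some text -- with punctuation.'

# \w+$ requires the string to end with a word character,
# but the last character here is '.', so there is no match.
no_match = re.search(r'\w+$', text)

# Allowing trailing non-whitespace characters with \S* lets the
# pattern absorb the period before the end-of-string anchor.
near_end = re.search(r'\w+\S*$', text)
near_end_z = re.search(r'\w+\S*\Z', text)

print(no_match)              # None
print(near_end.group(0))     # punctuation.
print(near_end_z.group(0))   # punctuation.
```

For this single-line string with no trailing newline, `$` and `\Z` give the same result.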
Output: