我如何提取使用正则表达式从字符串特定的词在python
问题描述:
有两个字符串包含文字与它们的类型的:我如何提取使用正则表达式从字符串特定的词在python
text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP'
text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN'
我想提取与/NN
标签词任何词形词与/NNP
和/CDP
标签。这是到目前为止我的代码(仍然只与/NNP
标签工作):
import re
def entityExtractPreposition(text):
text = re.findall(r'([^\s/]*/IN\b[^/]*(?:/(?!IN\b)[^/]*)*/NNP\b)', text)
return text
text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP'
prepo1 = entityExtractPreposition(text1)
text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN'
prepo2 = entityExtractPreposition(text2)
print text1
print prepo1
print ''
print text2
print prepo2
代码的结果至今:
Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP
['at/IN Yasmin/NNP']
Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN
['at/IN Jl/NNP Halimun/NNP Raya/NNP']
正如我们所看到的第一个字符串(text1
)的entityExtractPreposition
仍未能获得33/CDP
。如何使entityExtractPreposition
工作正常与文本1中的/CDP
标记或文本2中的/NNP
?
预期的结果是:
Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP
['at/IN Yasmin/NNP 33/CDP']
Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN
['at/IN Jl/NNP Halimun/NNP Raya/NNP']
感谢
答
\b[^\s/]+/IN\b(?:(?!/IN\b).)*/(?:NNP|CDP)\b