How do I extract specific words from a string using a regular expression in Python?

Problem description:

I have two strings containing words together with their tags:

text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP' 
text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN' 

I want to extract any word tagged /IN followed by the words tagged /NNP or /CDP. Here is my code so far (it still only works with the /NNP tag):

import re 

def entityExtractPreposition(text): 
    text = re.findall(r'([^\s/]*/IN\b[^/]*(?:/(?!IN\b)[^/]*)*/NNP\b)', text) 
    return text 

text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP' 
prepo1 = entityExtractPreposition(text1) 

text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN' 
prepo2 = entityExtractPreposition(text2) 

print(text1)
print(prepo1)
print('')
print(text2)
print(prepo2)

Output of the code so far:

Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP 
['at/IN Yasmin/NNP'] 

Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN 
['at/IN Jl/NNP Halimun/NNP Raya/NNP'] 

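As we can see, for the first string (text1) entityExtractPreposition still fails to pick up 33/CDP. How can I make entityExtractPreposition work correctly with both the /CDP tag in text1 and the /NNP tag in text2?

The expected result is: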

Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP 
['at/IN Yasmin/NNP 33/CDP'] 

Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN 
['at/IN Jl/NNP Halimun/NNP Raya/NNP'] 

Thanks

One pattern that accepts either /NNP or /CDP as the closing tag:

\b[^\s/]+/IN\b(?:(?!/IN\b).)*/(?:NNP|CDP)\b
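
To show how the pattern above might be dropped into the function from the question, here is a minimal sketch. It assumes Python 3 and reuses the asker's function name and sample strings; the module-level name PATTERN is only illustrative. The pattern has no capturing groups, so re.findall returns the whole match.

```python
import re

# The pattern above: a word tagged /IN, then any run of characters that does
# not cross another /IN tag, ending at the last /NNP or /CDP tag in that run.
# PATTERN is an illustrative name, not something from the original post.
PATTERN = re.compile(r'\b[^\s/]+/IN\b(?:(?!/IN\b).)*/(?:NNP|CDP)\b')

def entityExtractPreposition(text):
    # No capturing groups in the pattern, so findall yields full matches.
    return PATTERN.findall(text)

text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP'
text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN'

print(entityExtractPreposition(text1))  # ['at/IN Yasmin/NNP 33/CDP']
print(entityExtractPreposition(text2))  # ['at/IN Jl/NNP Halimun/NNP Raya/NNP']
```

Note that the middle part of the pattern is greedy, so a match runs to the last /NNP or /CDP tag before the next /IN tag. If tokens with other tags sit between two such tags, they would be included in the match, which may or may not be what you want.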