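Unable to extract an exact phrase from a sentence in R
Problem description:
I am trying to extract exact phrases from sentences in R, but partial matches are picked up as well. For example: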
phrase <- c("r is not working","roster is not working")
sentence <- c("ABC is not working and roster is not working","CDE is working but printer is not working")
extract <- sapply(phrase, grepl, x = sentence)
extract
This gives the output:
     r is not working roster is not working
[1,]             TRUE                  TRUE
[2,]             TRUE                 FALSE
The output I need is:
     r is not working roster is not working
[1,]            FALSE                  TRUE
[2,]            FALSE                 FALSE
The phrase "r is not working" should not match either of the two sentences. Is there any way to solve this? Any ideas? Thanks!
Answer
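grepl evaluates regular expressions, so a pattern matches wherever it occurs inside a string, including in the middle of a longer word.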
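To see where the unwanted matches come from, here is a minimal sketch using regexpr, which reports the start position of the first match in each string. The variable name pos and the two characters of leading context are choices made for this illustration only.

```r
phrase <- "r is not working"
sentence <- c("ABC is not working and roster is not working",
              "CDE is working but printer is not working")

# regexpr() gives the start position of the first match per string (-1 if none)
pos <- as.integer(regexpr(phrase, sentence))

# Show each match together with the two characters in front of it;
# the phrase is found at the tail of "roster" and "printer"
context <- substr(sentence, pos - 2, pos + nchar(phrase) - 1)
print(context)
```

In both sentences the match begins at the final "r" of a longer word, which is why grepl returns TRUE for the first phrase.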
If you want to stick with grepl, anchor your search pattern to the start and end of the string:
phrase <- c("^r is not working$", "^roster is not working$")
If you only want to check whether a whole string is exactly equal to one of the phrases, simply use
extract <- sapply(sentence, `%in%`, phrase)
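The two suggestions above can be compared side by side. The sketch below is only an illustration: the third sentence is hypothetical and was added so that at least one string equals a phrase, and the names anchored, by_regex, by_equal and by_in are made up for this example.

```r
phrase <- c("r is not working", "roster is not working")
sentence <- c("ABC is not working and roster is not working",
              "CDE is working but printer is not working",
              "roster is not working")  # hypothetical extra sentence

# Anchored patterns only match when the whole string is the phrase
anchored <- paste0("^", phrase, "$")
by_regex <- sapply(anchored, grepl, x = sentence)

# Plain equality gives the same matrix without any regex
by_equal <- sapply(phrase, function(p) sentence == p)

# %in% answers a coarser question: is the sentence equal to any phrase?
by_in <- sapply(sentence, `%in%`, phrase)

print(by_regex)
print(by_in)
```

On the two sentences from the question, both approaches return FALSE everywhere, because neither sentence consists of a phrase alone. They suit whole-string comparison rather than finding a phrase inside a longer sentence.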
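You could add word boundaries, e.g. 'sapply(paste0("\\b", phrase, "\\b"), grepl, x = sentence)' –
"r is not working" matches both strings, but adding a space before the r, as in " r is not working", prevents the match. – Dave2e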
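The word-boundary idea from the comments can be sketched as follows, using the data from the question. The names bounded and extract_space are introduced here for illustration, and the leading-space variant is one assumed reading of the second comment, done as a literal search with fixed = TRUE.

```r
phrase <- c("r is not working", "roster is not working")
sentence <- c("ABC is not working and roster is not working",
              "CDE is working but printer is not working")

# \\b requires a word boundary, so "r" can no longer match inside "roster"
bounded <- paste0("\\b", phrase, "\\b")
extract <- sapply(bounded, grepl, x = sentence)
colnames(extract) <- phrase  # label columns with the original phrases
print(extract)

# Leading-space variant: a literal search for " " followed by the phrase
extract_space <- sapply(paste0(" ", phrase), grepl, x = sentence, fixed = TRUE)
colnames(extract_space) <- phrase
print(extract_space)
```

Both variants give the same result on this data, but the leading-space version would miss a phrase that sits at the very start of a sentence, so the word-boundary pattern is probably the safer choice.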