Python - Check if a string contains any element of a list
Question:
I need to check whether a string contains any of the elements of a list. I am currently using this approach:
engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
engSentence = "the dogs fur is black and white"
print("the english sentence is: " + engSentence)
engWords2 = []
isEnglish = 0
for w in engWords:
    if w in engSentence:
        isEnglish = 1
        engWords2.append(w)
if isEnglish == 1:
    print("The sentence is english and contains the words: ")
    print(engWords2)
The problem is that it gives this output:
the english sentence is: the dogs fur is black and white
The sentence is english and contains the words:
['the', 'a', 'and', 'it']
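As you can see, "a" and "it" should not be there. How can I search so that it only lists whole words, not parts of words? I'm open to using plain Python code or regular expressions (although I'm quite new to both Python and regex, so please don't make it too complicated). Thanks.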
Answer
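It finds those two words because they are substrings of "black" and "white" respectively. When you apply `in` to a string, it just looks for a matching substring of characters, regardless of word boundaries.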
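Try:
engSentenceWords = engSentence.split()
and then, later,
if w in engSentenceWords:
This turns the original sentence into a list of individual words, so the check compares whole words instead of substrings.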
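Putting those two changes together, a minimal sketch of the full loop might look like the following. It reuses the question's variable names; dropping the `isEnglish` flag in favour of testing whether the result list is non-empty is an extra stylistic choice, not part of the answer above.

```python
engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
engSentence = "the dogs fur is black and white"

# Split once, outside the loop, so each check compares whole words.
engSentenceWords = engSentence.split()

engWords2 = []
for w in engWords:
    # List membership compares whole strings, so "a" no longer matches "black".
    if w in engSentenceWords:
        engWords2.append(w)

# A non-empty list is truthy, so it can stand in for the isEnglish flag.
if engWords2:
    print("The sentence is english and contains the words: ")
    print(engWords2)
```

With the question's sentence this should print `['the', 'and']`.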
Answer
words = set(engSentence.split()).intersection(set(engWords))
if words:
    print("The sentence is english and contains the words: ")
    print(words)
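This splits engSentence into a list of tokens, converts that list to a set, converts engWords to a set, and finds the intersection (the common overlap). It then checks whether the result is non-empty and, if so, prints the words found.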
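One side effect of the set approach is that sets are unordered, so the printed words may not appear in the order of engWords. If the order matters, a variant (an alternative, not what the answer above does) is to keep a set only for the fast lookups and iterate over the original list:

```python
engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
engSentence = "the dogs fur is black and white"

# A set of the sentence's words gives fast whole-word membership tests.
sentence_words = set(engSentence.split())

# Iterating over the engWords list keeps the found words in engWords order,
# while containing the same words as the set intersection.
found = [w for w in engWords if w in sentence_words]

if found:
    print("The sentence is english and contains the words: ")
    print(found)
```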
Answer
Or, even simpler, add a space to the end of your sentence and after each search word:
engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
engSentence = "the dogs fur is black and white"
print("the english sentence is: " + engSentence)
engWords2 = []
isEnglish = 0
engSentence += " "
for w in engWords:
    if "%s " % w in engSentence:
        isEnglish = 1
        engWords2.append(w)
if isEnglish == 1:
    print("The sentence is english and contains the words: ")
    print(engWords2)
The output is:
the english sentence is: the dogs fur is black and white
The sentence is english and contains the words:
['the', 'and']
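Appending a space only guards the end of each search word: nothing checks what comes before it, so "the " would still be found inside "bathe ". A minimal sketch of a variant that pads both the sentence and each search word with a space on both sides; the sentence "we bathe our dog" is a hypothetical example chosen to show the difference:

```python
engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
sentence = "we bathe our dog"

# Trailing space only: "the " also matches the end of "bathe ".
padded_end = sentence + " "
end_only = [w for w in engWords if "%s " % w in padded_end]

# Spaces on both sides: " the " can only match a whole word.
padded_both = " %s " % sentence
both_sides = [w for w in engWords if " %s " % w in padded_both]

print(end_only)
print(both_sides)
```

This should print `['the']` for the trailing-space version and `[]` for the padded version. Like the answer above, it assumes words are separated by single spaces and there is no punctuation.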
Answer
You may want to use regular-expression matching. Try something like the following:
import re
match_list = ['foo', 'bar', 'eggs', 'lamp', 'owls']
match_str = 'owls are not what they seem'
# {0} is the first (and only) argument passed to format(); {1} would raise IndexError.
match_regex = re.compile('^.*({0}).*$'.format('|'.join(match_list)))
if match_regex.match(match_str):
    print('We have a match.')
See the re documentation on python.org for details.
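The pattern above still matches substrings, so with the question's list it would report a match for "a" inside "black". A minimal sketch of a whole-word variant using `\b` word boundaries; wrapping each entry in `re.escape` is an added precaution in case a list entry contains regex metacharacters:

```python
import re

engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
engSentence = "the dogs fur is black and white"

# \b marks a word boundary, so "a" cannot match inside "black";
# (?:...) groups the alternatives without creating a capture group.
pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, engWords)))

# findall returns every whole-word match, in order of appearance in the sentence.
matches = pattern.findall(engSentence)

if matches:
    print("The sentence is english and contains the words: ")
    print(matches)
```

Because of the trailing `\b`, the engine backtracks from "a" to "and" when "a" is followed by more letters, so the order of the alternatives does not cause partial-word matches here. This should print `['the', 'and']`.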
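There isn't any regex here; it's just string operations. Regular expressions are a very specific way of describing match patterns against strings, and if you were using them you'd be using the 're' module. – geoelectric 2015-04-03 20:38:34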
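By the way, it's worth noting that all of these solutions (including mine) only work when there's no punctuation. Any punctuation will look like part of the word next to it and make your comparison fail. If you start including punctuation, you'll need some strategy to remove or ignore it. One strategy is to run a regex against the whole sentence string, searching for each word with '\b' on either side. – geoelectric 2015-04-03 20:51:10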
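The other strategy mentioned, removing punctuation before comparing, can be sketched with `str.maketrans` and `str.translate`. The sample sentence is hypothetical, lower-casing is an extra assumption (so a capitalised "The" still matches), and `string.punctuation` only covers ASCII punctuation:

```python
import string

engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
sentence = "The dog's fur is black, and white!"

# Translation table that deletes every ASCII punctuation character.
strip_punct = str.maketrans("", "", string.punctuation)

# Strip punctuation, lower-case, then split into a set of whole words.
words = set(sentence.translate(strip_punct).lower().split())

found = [w for w in engWords if w in words]
print(found)
```

This should print `['the', 'and']`. One trade-off: deleting apostrophes turns "dog's" into "dogs", which is harmless here but may matter for other word lists.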