Python: check whether an item exists in a list while iterating

Question:

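I am trying to loop over two lists and only print an item if it exists in the second list. I will be doing this with very large files, so I don't want to store them in memory as lists or dictionaries. Is there a way to do this without storing them in a list or dictionary?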

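I was able to do the following to confirm that items are not in the list, but I'm not sure why it doesn't work when I try to confirm that they are in the list by removing the 'not'.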

Code to verify that the items do not exist in list_2:

list_1 = ['apple', 
      'pear', 
      'orange', 
      'kiwi', 
      'strawberry', 
      'banana'] 

list_2 = ['kiwi', 
      'melon', 
      'grape', 
      'pear'] 

for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

Code to verify that the items exist in list_2:

list_1 = ['apple', 
      'pear', 
      'orange', 
      'kiwi', 
      'strawberry', 
      'banana'] 

list_2 = ['kiwi', 
      'melon', 
      'grape', 
      'pear'] 

for fruit_1 in list_1:
    if all(fruit_1 in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

Can't you just use a list comprehension? [x for x in list_1 if x in list_2] returns a list of the items in list_1 that are also in list_2, and [x for x in list_1 if x not in list_2] does the reverse. – Wright

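@MBasith If you accept keeping _one_ list in memory, you can build a set from that list and iterate over the other list (reading the file). That will be fast. Otherwise it will take forever. –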

Well, by using the all() function I'm avoiding that. I'm just confused about why the inversion doesn't work. – MBasith
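The set-based approach from the comment above (keep one list in memory as a set, stream the other file) could look roughly like this. It is a minimal sketch, assuming each file holds one item per line; the helper name items_in_both and the file paths are hypothetical. Only the smaller file is loaded; the larger file is read one line at a time.

```python
def items_in_both(small_path, large_path):
    """Yield items of large_path (one per line) that also appear in small_path.

    Only small_path is held in memory, as a set; large_path is streamed.
    """
    with open(small_path, encoding="utf-8") as small_file:
        # Build the set once; set membership tests are O(1) on average.
        wanted = {line.strip() for line in small_file if line.strip()}
    with open(large_path, encoding="utf-8") as large_file:
        # Iterating over a file object reads it lazily, line by line.
        for line in large_file:
            item = line.strip()
            if item in wanted:
                yield item
```

For example, for fruit in items_in_both('list2.txt', 'list1.txt'): print(fruit) would print the lines of list1.txt that also appear in list2.txt, in list1.txt's order. Loading the smaller file into the set keeps memory use as low as this approach allows.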

Answer

So this is how you get them:

exists = [item for item in list_1 if item in list_2] 
does_not_exist = [item for item in list_1 if item not in list_2]

And to print them:

for item in exists:
    print(item)
for item in does_not_exist:
    print(item)

But if you only want to print them:

for item in list_1:
    if item in list_2:
        print(item)

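Thanks for the reply. But this saves the output into the exists and does_not_exist variables. The files I'm working with are large, and I want to avoid keeping them in memory. – MBasith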
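On that concern: a generator expression (round brackets instead of square brackets) yields matches lazily, so no result list is built. A minimal sketch with the question's lists; note that list_2 itself is still in memory and item in list_2 still scans list_2 for every item.

```python
list_1 = ['apple', 'pear', 'orange', 'kiwi', 'strawberry', 'banana']
list_2 = ['kiwi', 'melon', 'grape', 'pear']

# A generator expression computes each match only when the loop requests it,
# so the matching items are never collected into a list.
exists = (item for item in list_1 if item in list_2)

for item in exists:
    print(item)  # prints pear, then kiwi
```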

Answer

You can use Python sets to work out which items are in both lists:

set(list_1).intersection(set(list_2))

See https://docs.python.org/2/library/sets.html

"I will be doing this with very large files, so I don't want to store them in memory as lists or dictionaries"... –
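For reference, the same idea with the built-in set type (the sets module linked above is a deprecated Python 2 module) looks roughly like this. As the comment points out, this keeps the lists in memory; sets are also unordered and drop duplicates.

```python
list_1 = ['apple', 'pear', 'orange', 'kiwi', 'strawberry', 'banana']
list_2 = ['kiwi', 'melon', 'grape', 'pear']

# intersection() and difference() accept any iterable,
# so only list_1 has to be converted to a set explicitly.
in_both = set(list_1).intersection(list_2)
only_in_1 = set(list_1).difference(list_2)

print(sorted(in_both))    # ['kiwi', 'pear']
print(sorted(only_in_1))  # ['apple', 'banana', 'orange', 'strawberry']
```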

Answer

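One problem is that all() returns False if any single check returns False. Another is that the fruit_1 in fruit_2 part checks whether fruit_1 is a substring of fruit_2. If we modified the lists so that your logic worked, they would look like: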

list_1 = ['apple', 
      'pear', 
      'orange', 
      'kiwi', 
      'berry', 
      'banana', 
      'grape'] 

list_2 = ['grape', 
      'grape', 
      'grape', 
      'grape', 
      'grape']

or possibly:

list_1 = ['apple', 
      'pear', 
      'orange', 
      'kiwi', 
      'berry', 
      'banana', 
      'grape'] 

list_2 = ['strawberry', 
      'strawberry', 
      'strawberry', 
      'strawberry', 
      'strawberry', 
      'strawberry']

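because berry is a substring of strawberry. If we keep doing this check with iteration, rather than with a set intersection as @wrdeman suggested, then with the data you provided it should look like this: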

for fruit_1 in list_1:
    if fruit_1 in list_2:
        print(fruit_1)
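The difference between the substring test in the question and the list-membership test in the loop above can be checked directly. A minimal sketch using the values from this answer:

```python
# `in` between two strings tests for a substring.
substring_hit = 'berry' in 'strawberry'
# `in` between a string and a list tests for an equal element.
list_hit = 'berry' in ['strawberry']
# `==` compares the whole strings.
equal_hit = 'berry' == 'strawberry'

print(substring_hit, list_hit, equal_hit)  # True False False
```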

Another possible modification is to change all to any, which returns True if any of the iterable's items is true. Your code would then look like this:

for fruit_1 in list_1:
    if any(fruit_1 == fruit_2 for fruit_2 in list_2):
        print(fruit_1)

Answer

I was able to do the inversion by doing a True/False evaluation:

list_1 = ['apple', 
      'pear', 
      'orange', 
      'kiwi', 
      'strawberry', 
      'banana'] 

list_2 = ['kiwi', 
      'melon', 
      'grape', 
      'pear'] 

# DOES exist 
for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2) is False:
        print(fruit_1)

print('\n')

# DOES NOT exist
for fruit_1 in list_1:
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2) is True:
        print(fruit_1)
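This inversion works because of De Morgan's law for all() and any(): not all(not p for ...) is the same as any(p for ...). A small sketch checking this on the question's data; it keeps the question's substring test fruit_1 in fruit_2.

```python
list_1 = ['apple', 'pear', 'orange', 'kiwi', 'strawberry', 'banana']
list_2 = ['kiwi', 'melon', 'grape', 'pear']

found = []
for fruit_1 in list_1:
    via_all = not all(fruit_1 not in fruit_2 for fruit_2 in list_2)
    via_any = any(fruit_1 in fruit_2 for fruit_2 in list_2)
    # Both forms agree for every fruit, so either can be used.
    assert via_all == via_any
    if via_any:
        found.append(fruit_1)
        print(fruit_1)  # prints pear, then kiwi
```

Writing the test as if not all(...) or if any(...) avoids comparing with "is False" / "is True".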

Answer

I recommend pandas; it works well with large-scale data.

Install it with pip:

pip install pandas

Then you can do something like this:

import pandas as pd 

s1 = pd.Index(list_1) 
s2 = pd.Index(list_2) 

exists = s1.intersection(s2) 
does_not_exist = s1.difference(s2)

Now, if you run print(exists), you will see the magic: the items that appear in both lists.

See the Pandas docs.

Answer

The problem with the code is how the all() function is evaluated. Let's break it down into something simpler:

## DOES EXIST
print(all('kiwi' in fruit_2 for fruit_2 in ['pear', 'kiwi']))
print(all('pear' in fruit_2 for fruit_2 in ['pear', 'kiwi']))

This evaluates to:

False 
False

Conversely, if you do something like this:

#DOES NOT EXIST
print(all('apple' not in fruit_2 for fruit_2 in ['pear', 'kiwi']))
print(all('pear' not in fruit_2 for fruit_2 in ['pear', 'kiwi']))

This evaluates to:

True 
False

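I couldn't figure out the reason at first, but it comes down to how all() works: it returns True only if every element of the iterable is true, and False otherwise.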
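Printing the individual checks makes this visible. A minimal sketch for 'pear' against the question's list_2: only one element matches, so all() is False while any() is True. That is why removing the 'not' from the question's all() call prints nothing: no fruit is a substring of every element of list_2.

```python
list_2 = ['kiwi', 'melon', 'grape', 'pear']

# One True/False result per element of list_2.
checks = ['pear' in fruit_2 for fruit_2 in list_2]

print(checks)       # [False, False, False, True]
print(all(checks))  # False: all() needs every check to be True
print(any(checks))  # True: any() needs at least one check to be True
```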

In any case, I think using any() instead of all() for the DOES EXIST part will work:

print("DOES NOT EXIST")
for fruit_1 in list_1:
    # print(all(fruit_1 not in fruit_2 for fruit_2 in list_2))
    if all(fruit_1 not in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

print("\nDOES EXIST")
for fruit_1 in list_1:
    if any(fruit_1 in fruit_2 for fruit_2 in list_2):
        print(fruit_1)

DOES NOT EXIST 
apple 
orange 
strawberry 
banana 

DOES EXIST 
pear 
kiwi

Answer

Here is a solution that uses pandas.read_csv with memory-mapped files:

import pandas as pd 

list1 = pd.read_csv('list1.txt', dtype=str, header=None, memory_map=True) 
list2 = pd.read_csv('list2.txt', dtype=str, header=None, memory_map=True) 

exists = pd.merge(list1, list2, how='inner', on=0) 
for fruit in exists[0].tolist(): 
    print(fruit)

The list1.txt and list2.txt files contain the strings from the question, one string per line.

Output:

pear 
kiwi

I don't have any really large files to experiment with, so I don't have any performance measurements.
