Python numpy list filtering

Question:

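Can the code below be optimized or vectorized? It doesn't seem like the right way to do this, and it isn't very "Pythonic". The code is meant to process large amounts of data, so performance matters a lot.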

The idea is to remove every name that does not appear in both lists, together with its value.

E.g. the code below should leave two entries in each list, named "name2" and "name4", with values [2, 4] in pos1 and [5, 6] in pos2.

import numpy as np

names1 = np.array(["name1", "name2", "name3", "name4"])
names2 = np.array(["name2", "name4", "name5", "name6"])

pos1 = np.array([1, 2, 3, 4])
pos2 = np.array([5, 6, 7, 8])

# Drop every name in names2 (and its value in pos2) that is missing from names1
for entry in names2:
    if not np.any(names1 == entry):
        pointer = np.where(names2 == entry)
        pos2 = np.delete(pos2, pointer)
        names2 = np.delete(names2, pointer)

# Then do the same for names1, checked against the already-filtered names2
for entry in names1:
    if not np.any(names2 == entry):
        pointer = np.where(names1 == entry)
        pos1 = np.delete(pos1, pointer)
        names1 = np.delete(names1, pointer)

Are you set on using 'numpy' for this? It feels more like a 'pandas' problem. – DSM 2015-03-02 18:10:13


I have no experience with pandas. Any tips appreciated. – 2015-03-02 18:17:24

Here is a vectorized answer:

import numpy as np

names1 = np.array(["name1", "name2", "name3", "name4"])
names2 = np.array(["name2", "name4", "name5", "name6"])

pos1 = np.array([1, 2, 3, 4])
pos2 = np.array([5, 6, 7, 8])

# Names present in both arrays
intersection = np.intersect1d(names1, names2)
# Indices of the entries whose name is NOT in the intersection
pointer1 = np.flatnonzero(~np.isin(names1, intersection))
pointer2 = np.flatnonzero(~np.isin(names2, intersection))

# Delete those entries from each name array and its value array
pos2 = np.delete(pos2, pointer2)
names2 = np.delete(names2, pointer2)

pos1 = np.delete(pos1, pointer1)
names1 = np.delete(names1, pointer1)
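The same filtering can likely be written more compactly with boolean masks instead of index arrays plus np.delete. The sketch below is an alternative, not the answerer's code: it assumes NumPy >= 1.13 (where np.isin exists) and tests each array directly against the other, so the intersect1d step is not needed. Boolean indexing also keeps the original order of each array.

```python
import numpy as np

names1 = np.array(["name1", "name2", "name3", "name4"])
names2 = np.array(["name2", "name4", "name5", "name6"])
pos1 = np.array([1, 2, 3, 4])
pos2 = np.array([5, 6, 7, 8])

# True where the name also appears in the other array
mask1 = np.isin(names1, names2)
mask2 = np.isin(names2, names1)

# Boolean indexing keeps only the matching entries, in their original order
names1, pos1 = names1[mask1], pos1[mask1]
names2, pos2 = names2[mask2], pos2[mask2]

print(names1, pos1)  # ['name2' 'name4'] [2 4]
print(names2, pos2)  # ['name2' 'name4'] [5 6]
```

Both masks are computed from the unfiltered arrays before either is reassigned, so the order of the two filtering lines does not matter here.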

FWIW, starting from the original arrays in the question, this is a simple merge operation in pandas:

>>> import pandas as pd
>>> df1 = pd.DataFrame({"name": names1, "pos": pos1})
>>> df2 = pd.DataFrame({"name": names2, "pos": pos2})
>>> df1
    name  pos
0  name1    1
1  name2    2
2  name3    3
3  name4    4
>>> df2
    name  pos
0  name2    5
1  name4    6
2  name5    7
3  name6    8
>>> df1.merge(df2, on="name", suffixes=("1", "2"))
    name  pos1  pos2
0  name2     2     5
1  name4     4     6
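If pandas isn't an option, a rough NumPy analogue of that merge (a sketch, not part of either answer) is np.intersect1d with return_indices=True, available in NumPy >= 1.15. It returns the common names plus their indices in each input, so both value arrays can be gathered into aligned pairs like the pos1/pos2 columns above. Two assumptions to note: names are unique within each array (with duplicates only the first occurrence is used, whereas merge would emit every combination), and the common names come back sorted rather than in names1's order. In this example the two orders happen to coincide.

```python
import numpy as np

names1 = np.array(["name1", "name2", "name3", "name4"])
names2 = np.array(["name2", "name4", "name5", "name6"])
pos1 = np.array([1, 2, 3, 4])
pos2 = np.array([5, 6, 7, 8])

# common: sorted names present in both arrays
# idx1/idx2: index of each common name in names1 and names2
common, idx1, idx2 = np.intersect1d(names1, names2, return_indices=True)

# Gather values so that row k pairs common[k] with its value from each array
paired_pos1 = pos1[idx1]
paired_pos2 = pos2[idx2]

for name, p1, p2 in zip(common, paired_pos1, paired_pos2):
    print(name, p1, p2)
# name2 2 5
# name4 4 6
```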