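Python numpy list filtering
Problem description:
Can the code below be optimized/vectorized? As it stands, it doesn't seem like the right way to do things, nor is it very "Pythonic". The code is meant to handle large amounts of data, so performance is very important.
The idea is to remove all values (and their names) that are not present in both lists.
E.g. the result of the following code would be two lists whose names are "name2" and "name4", with the values [2, 4] and [5, 6] respectively.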
import numpy as np

names1 = np.array(["name1", "name2", "name3", "name4"])
names2 = np.array(["name2", "name4", "name5", "name6"])
pos1 = np.array([1, 2, 3, 4])
pos2 = np.array([5, 6, 7, 8])

# Drop every entry of names2/pos2 whose name is missing from names1
for entry in names2:
    if not np.any(names1 == entry):
        pointer = np.where(names2 == entry)
        pos2 = np.delete(pos2, pointer)
        names2 = np.delete(names2, pointer)

# Drop every entry of names1/pos1 whose name is missing from names2
for entry in names1:
    if not np.any(names2 == entry):
        pointer = np.where(names1 == entry)
        pos1 = np.delete(pos1, pointer)
        names1 = np.delete(names1, pointer)
Answer
Here is a vectorized answer:
import numpy as np

names1 = np.array(["name1", "name2", "name3", "name4"])
names2 = np.array(["name2", "name4", "name5", "name6"])
pos1 = np.array([1, 2, 3, 4])
pos2 = np.array([5, 6, 7, 8])

# Names common to both arrays
intersection = np.intersect1d(names1, names2)

# Indices of the entries that are NOT in the intersection
pointer1 = np.argwhere(np.in1d(names1, intersection) == False)
pointer2 = np.argwhere(np.in1d(names2, intersection) == False)

pos2 = np.delete(pos2, pointer2)
names2 = np.delete(names2, pointer2)
pos1 = np.delete(pos1, pointer1)
names1 = np.delete(names1, pointer1)
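The same filtering can probably be written more compactly with boolean masks instead of np.argwhere plus np.delete. The following is only a sketch, assuming NumPy 1.13 or later (where np.isin is available; np.in1d is deprecated in NumPy 2.0). The names keep1 and keep2 are introduced here for illustration and are not part of the answer above.

```python
import numpy as np

names1 = np.array(["name1", "name2", "name3", "name4"])
names2 = np.array(["name2", "name4", "name5", "name6"])
pos1 = np.array([1, 2, 3, 4])
pos2 = np.array([5, 6, 7, 8])

# Boolean masks: True where the name also occurs in the other array
# (keep1 / keep2 are illustrative names, not from the original answer)
keep1 = np.isin(names1, names2)
keep2 = np.isin(names2, names1)

# Index both the names and the positions with the same mask
names1, pos1 = names1[keep1], pos1[keep1]
names2, pos2 = names2[keep2], pos2[keep2]
```

Note that each array keeps its own original order, so pos1 and pos2 line up by name only if the common names appear in the same relative order in both inputs, as they do in this example.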
Answer
FWIW, this is a simple merge operation in pandas:
>>> import pandas as pd
>>> df1 = pd.DataFrame({"name": names1, "pos": pos1})
>>> df2 = pd.DataFrame({"name": names2, "pos": pos2})
>>> df1
    name  pos
0  name1    1
1  name2    2
2  name3    3
3  name4    4
>>> df2
    name  pos
0  name2    5
1  name4    6
2  name5    7
3  name6    8
>>> df1.merge(df2, on="name", suffixes=[1,2])
    name  pos1  pos2
0  name2     2     5
1  name4     4     6
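For readers without pandas experience, the merge above can be run as a standalone script and the result turned back into NumPy arrays. This is a minimal sketch, assuming pandas 0.24 or later (for to_numpy). The variable names merged, common_names, common_pos1 and common_pos2 are made up for this example, and the suffixes are passed as strings here rather than integers.

```python
import numpy as np
import pandas as pd

names1 = np.array(["name1", "name2", "name3", "name4"])
names2 = np.array(["name2", "name4", "name5", "name6"])
pos1 = np.array([1, 2, 3, 4])
pos2 = np.array([5, 6, 7, 8])

df1 = pd.DataFrame({"name": names1, "pos": pos1})
df2 = pd.DataFrame({"name": names2, "pos": pos2})

# merge defaults to an inner join, so only names present in both frames survive
merged = df1.merge(df2, on="name", suffixes=("1", "2"))

# Pull the aligned columns back out as NumPy arrays
common_names = merged["name"].to_numpy()
common_pos1 = merged["pos1"].to_numpy()
common_pos2 = merged["pos2"].to_numpy()
```

Because the join matches rows on name, common_pos1 and common_pos2 are aligned row by row even if the two inputs list the names in different orders.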
Are you set on using 'numpy' for this? This feels more like a 'pandas' problem. – DSM 2015-03-02 18:10:13
I have no experience with pandas. Any tips appreciated – 2015-03-02 18:17:24