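How to search for the current column's value in another column, then show its ID in a third column in pandas?
Problem description:
I have a DataFrame with 4 columns: col_1, col1_id, col_2, col2_id. I want to look up each col_2 value in col_1, and if there is a match, the corresponding col1_id should be written to col2_id.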
col_1  col1_id  col_2  col2_id
A      1        NaN    NaN
B      2        K      NaN
D      3        A      NaN
J      4        NaN    NaN
E      5        H      NaN
Z      6        NaN    NaN
H      7        H      NaN
K      8        Z      NaN
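Any help would be appreciated, thanks.
Answer
There are two possible solutions; at first glance, the output of the first one seems better.
I think you need map with a dictionary d created from the columns col_1 and col1_id: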
d = df[['col_1','col1_id']].set_index('col_1').to_dict()
print(d)
{'col1_id': {'A': 1, 'B': 2, 'E': 5, 'D': 3, 'H': 7, 'K': 8, 'J': 4, 'Z': 6}}
df['col2_id'] = df.col_2.map(d['col1_id'])
print(df)
  col_1  col1_id col_2  col2_id
0     A        1   NaN      NaN
1     B        2     K      8.0
2     D        3     A      1.0
3     J        4   NaN      NaN
4     E        5     H      7.0
5     Z        6   NaN      NaN
6     H        7     H      7.0
7     K        8     Z      6.0
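For anyone who wants to reproduce the first solution end to end, here is a minimal sketch. It rebuilds the sample data from the question and, as a small variation on the answer, passes a Series indexed by col_1 to map instead of a dictionary. The name `lookup` is only illustrative, and the sketch assumes the values in col_1 are unique.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "col_1": list("ABDJEZHK"),
    "col1_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "col_2": [np.nan, "K", "A", np.nan, "H", np.nan, "H", "Z"],
})

# A Series indexed by col_1 acts as the lookup table (assumes unique col_1 values).
lookup = df.set_index("col_1")["col1_id"]

# col_2 values with no match in the lookup, including NaN, become NaN.
df["col2_id"] = df["col_2"].map(lookup)
print(df)
```

Because unmatched rows are NaN, the new column ends up as float, which is why the IDs print as 8.0, 1.0 and so on.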
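The second solution uses isin with where. It keeps col1_id only in rows whose col_1 value occurs anywhere in col_2:
print(df.col_1.isin(df.col_2))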
0 True
1 False
2 False
3 False
4 False
5 True
6 True
7 True
Name: col_1, dtype: bool
df['col2_id'] = df.col1_id.where(df.col_1.isin(df.col_2))
print(df)
  col_1  col1_id col_2  col2_id
0     A        1   NaN      1.0
1     B        2     K      NaN
2     D        3     A      NaN
3     J        4   NaN      NaN
4     E        5     H      NaN
5     Z        6   NaN      6.0
6     H        7     H      7.0
7     K        8     Z      8.0
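The isin/where variant answers a slightly different question from the map variant: it keeps a row's own col1_id when that row's col_1 value shows up somewhere in col_2. A self-contained sketch on the question's sample data follows (the variable name `mask` is illustrative):

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "col_1": list("ABDJEZHK"),
    "col1_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "col_2": [np.nan, "K", "A", np.nan, "H", np.nan, "H", "Z"],
})

# True where the row's col_1 value appears anywhere in col_2.
mask = df["col_1"].isin(df["col_2"])

# Keep col1_id where the mask is True; everything else becomes NaN.
df["col2_id"] = df["col1_id"].where(mask)
print(df)
```

If the goal is the ID of the value found in col_2 on the same row, the map-based solution is the one that matches the question's wording.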
Timings:
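def pil(df):
    # Look up each non-null col_2 value in a frame indexed by col_1
    df = df.set_index('col_1')
    df['col2_id'] = df.col_2.apply(lambda x: x if pd.isnull(x) else df.loc[x, 'col1_id'])
    return df.reset_index()

def jez(df):
    # Map col_2 through a {col_1: col1_id} dictionary
    df['col2_id'] = df.col_2.map(df.set_index('col_1').to_dict()['col1_id'])
    return df

# df1 is not defined in the original post; presumably a copy of df
print(pil(df1))
print(jez(df))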
In [34]: %timeit jez(df)
1000 loops, best of 3: 1.48 ms per loop
In [35]: %timeit pil(df1)
The slowest run took 4.23 times longer than the fastest. This could mean that an intermediate result is being cached
100 loops, best of 3: 2.56 ms per loop
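The %timeit lines above need IPython. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The repeat count `n` and the use of `df.copy()` to keep each run independent are choices made for this sketch, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "col_1": list("ABDJEZHK"),
    "col1_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "col_2": [np.nan, "K", "A", np.nan, "H", np.nan, "H", "Z"],
})

def pil(frame):
    # Row-wise lookup with apply and .loc on a frame indexed by col_1.
    frame = frame.set_index("col_1")
    frame["col2_id"] = frame["col_2"].apply(
        lambda x: x if pd.isnull(x) else frame.loc[x, "col1_id"]
    )
    return frame.reset_index()

def jez(frame):
    # Vectorised lookup: map col_2 through a {col_1: col1_id} dictionary.
    frame["col2_id"] = frame["col_2"].map(
        frame.set_index("col_1").to_dict()["col1_id"]
    )
    return frame

n = 200  # number of repetitions, chosen arbitrarily for this sketch
t_jez = timeit.timeit(lambda: jez(df.copy()), number=n)
t_pil = timeit.timeit(lambda: pil(df.copy()), number=n)
print(f"jez: {t_jez / n * 1e3:.3f} ms per call")
print(f"pil: {t_pil / n * 1e3:.3f} ms per call")
```

On a frame this small the difference is minor; the gap between map and a row-wise apply would be expected to grow with the number of rows.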
Answer
Try:
df = df.set_index('col_1')
df['col2_id'] = df.col_2.apply(lambda x: x if pd.isnull(x) else df.loc[x, 'col1_id'])
df = df.reset_index()
df
  col_1  col1_id col_2  col2_id
0     A        1   NaN      NaN
1     B        2     K      8.0
2     D        3     A      1.0
3     J        4   NaN      NaN
4     E        5     H      7.0
5     Z        6   NaN      NaN
6     H        7     H      7.0
7     K        8     Z      6.0
Answer
To me, this looks like a standard RDBMS task, so you can use merge():
df['col2_id'] = pd.merge(df, df[['col_1', 'col1_id']], left_on='col_2', right_on='col_1', how='left')['col1_id_y']
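The merge idea can also be written without relying on the automatic _x/_y suffixes. The sketch below renames the lookup columns first so that a plain left join on col_2 produces col2_id directly. The names `lookup` and `out` are illustrative, and `validate="many_to_one"` is an optional guard added here that raises an error if col_1 contains duplicates.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (without the empty col2_id column).
df = pd.DataFrame({
    "col_1": list("ABDJEZHK"),
    "col1_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "col_2": [np.nan, "K", "A", np.nan, "H", np.nan, "H", "Z"],
})

# Lookup table: rename so the join key is col_2 and the payload is col2_id.
lookup = df[["col_1", "col1_id"]].rename(
    columns={"col_1": "col_2", "col1_id": "col2_id"}
)

# Left join keeps every row of df in its original order; unmatched keys give NaN.
out = df.merge(lookup, on="col_2", how="left", validate="many_to_one")
print(out)
```

Like a SQL LEFT JOIN, this would duplicate rows if the lookup keys were not unique, which is what the validate argument protects against.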
Please check the solutions. If the output should be different, you can add the desired output to the question. Thanks. – jezrael