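Merging 3 different dataframes based on conditions
Question:
How can I combine the three dataframes shown below?
The first two must be joined on ID1, since that is the column that matches between the two dataframes.
For the third dataframe, the Hash must be added by matching on Address2.
DF1: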
Name1 Name2 Name3 Address ID1 ID2 Own
Matt John1 Jill 878 home 1 0 Deal
Matt John2 Jack 879 home 2 1 Dael
DF2:
Name1 ID1 Address Name4 Address2
Matt 1 878 home face1 face\123
Matt 1 878 home face2 face\345
Matt 1 878 home face3 face\678
Matt 2 879 home head1 head\123
Matt 2 879 home head2 head\345
Matt 2 879 home head3 head\678
DF3:
Address2 Hash
face\123 abc123
face\345 cde321
face\678 efg123
head\123 123efg
head\345 efg321
head\678 acd321
I am trying to combine the 3 dataframes into one like the following:
Name1 Name2 ID1 Address Own Name3 ID2 Name4 Address2 Hash
Matt John1 1 878 home Deal Jill 0 face1 face\123 abc123
Matt John1 1 878 home Deal Jill 0 face2 face\345 cde321
Matt John1 1 878 home Deal Jill 0 face3 face\678 efg123
Matt John2 2 879 home Dael Jack 1 head1 head\123 123efg
Matt John2 2 879 home Dael Jack 1 head2 head\345 efg321
Matt John2 2 879 home Dael Jack 1 head3 head\678 acd321
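The key between DF1 and DF2 is ID1; the key between DF2 and DF3 is Address2.
Thank you very much for your help.
Answer
I think this will work. The merge function does pretty much what you need: you pass it the column you want to join on.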
import numpy as np
import pandas as pd

# The first row of each array holds the column names; the remaining rows are data.
# '.' stands in for the backslash in the question's Address2 values
# (in a plain string literal, '\123' would be read as an octal escape).
data = np.array([['Name1', 'Name2', 'Name3', 'Address', 'ID1', 'ID2', 'Own'],
                 ['Matt', 'John1', 'Jill', '878 home', '1', '0', 'Deal'],
                 ['Matt', 'John2', 'Jack', '879 home', '2', '1', 'Dael']])
data2 = np.array([['Name1', 'ID1', 'Address', 'Name4', 'Address2'],
                  ['Matt', '1', '878 home', 'face1', 'face.123'],
                  ['Matt', '1', '878 home', 'face2', 'face.345'],
                  ['Matt', '1', '878 home', 'face3', 'face.678'],
                  ['Matt', '2', '879 home', 'head1', 'head.123'],
                  ['Matt', '2', '879 home', 'head2', 'head.345'],
                  ['Matt', '2', '879 home', 'head3', 'head.678']])
data3 = np.array([['Address2', 'Hash'],
                  ['face.123', 'abc123'],
                  ['face.345', 'cde321'],
                  ['face.678', 'efg123'],
                  ['head.123', '123efg'],
                  ['head.345', 'efg321'],
                  ['head.678', 'acd321']])

df1 = pd.DataFrame(data=data[1:, :], columns=data[0, :])
df2 = pd.DataFrame(data=data2[1:, :], columns=data2[0, :])
df3 = pd.DataFrame(data=data3[1:, :], columns=data3[0, :])

# Join df1 to df2 on ID1, then attach the hashes from df3 via Address2.
Cdf = pd.merge(df1, df2, on='ID1', how='inner')
Ddf = pd.merge(Cdf, df3, on='Address2', how='inner')
print(Ddf)
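The expected output in the question has a specific column order and no repeated columns. The following is only a sketch of one way to get that layout, not part of the original answer: it assumes a shortened copy of the question's data, lists every column that df1 and df2 share as a join key, and uses the `validate` argument of `merge` to check the expected one-to-many and many-to-one relationships. The names `merged`, `wanted_order` and `result` are illustrative.

```python
import pandas as pd

# Shortened copy of the question's data (two Address2 rows per ID1).
df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Name2": ["John1", "John2"],
    "Name3": ["Jill", "Jack"],
    "Address": ["878 home", "879 home"],
    "ID1": ["1", "2"],
    "ID2": ["0", "1"],
    "Own": ["Deal", "Dael"],
})
df2 = pd.DataFrame({
    "Name1": ["Matt"] * 4,
    "ID1": ["1", "1", "2", "2"],
    "Address": ["878 home", "878 home", "879 home", "879 home"],
    "Name4": ["face1", "face2", "head1", "head2"],
    # Raw strings keep the backslash literal instead of an octal escape.
    "Address2": [r"face\123", r"face\345", r"head\123", r"head\345"],
})
df3 = pd.DataFrame({
    "Address2": [r"face\123", r"face\345", r"head\123", r"head\345"],
    "Hash": ["abc123", "cde321", "123efg", "efg321"],
})

# Join on all shared columns so no _x/_y suffixes are produced;
# validate raises MergeError if the key relationship is not as expected.
merged = (
    df1.merge(df2, on=["Name1", "ID1", "Address"], how="inner",
              validate="one_to_many")
       .merge(df3, on="Address2", how="inner", validate="many_to_one")
)

# Reorder the columns to match the layout asked for in the question.
wanted_order = ["Name1", "Name2", "ID1", "Address", "Own",
                "Name3", "ID2", "Name4", "Address2", "Hash"]
result = merged[wanted_order]
print(result)
```

If the data ever contains an ID1 that appears twice in df1, the first merge would raise an error instead of silently multiplying rows, which may or may not be what you want.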
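Answer
Judging from your expected output, you don't need to specify anything: merging on the intersection of the column names happens automatically.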
>>> df1.merge(df2).merge(df3)
Name1 Name2 Name3 Address ID1 ID2 Own Name4 Address2 Hash
0 Matt John1 Jill 878 home 1 0 Deal face1 face\123 abc123
1 Matt John1 Jill 878 home 1 0 Deal face2 face\345 cde321
2 Matt John1 Jill 878 home 1 0 Deal face3 face\678 efg123
3 Matt John2 Jack 879 home 2 1 Dael head1 head\123 123efg
4 Matt John2 Jack 879 home 2 1 Dael head2 head\345 efg321
5 Matt John2 Jack 879 home 2 1 Dael head3 head\678 acd321
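To see which columns a bare `merge` call joins on, you can compute the column intersection yourself. This is a small sketch assuming a shortened copy of the question's data; `shared_12` and `shared_123` are illustrative names, not anything from pandas.

```python
import pandas as pd

# Shortened copy of the question's data.
df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Name2": ["John1", "John2"],
    "Name3": ["Jill", "Jack"],
    "Address": ["878 home", "879 home"],
    "ID1": ["1", "2"],
    "ID2": ["0", "1"],
    "Own": ["Deal", "Dael"],
})
df2 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "ID1": ["1", "2"],
    "Address": ["878 home", "879 home"],
    "Name4": ["face1", "head1"],
    "Address2": [r"face\123", r"head\123"],
})
df3 = pd.DataFrame({
    "Address2": [r"face\123", r"head\123"],
    "Hash": ["abc123", "123efg"],
})

# With no `on` argument, merge joins on the columns both frames share.
shared_12 = sorted(df1.columns.intersection(df2.columns))
merged_12 = df1.merge(df2)

shared_123 = sorted(merged_12.columns.intersection(df3.columns))
result = merged_12.merge(df3)

print(shared_12)   # columns used for the first join
print(shared_123)  # columns used for the second join
print(result)
```

Relying on the implicit intersection is convenient here, but it also means that a column which happens to share a name in both frames becomes a join key, so checking the intersection first can be worthwhile.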
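Merging on a single specified column, as the accepted answer does, actually causes a problem, because you end up with suffixed columns.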
>>> df1.merge(df2, on="ID1", how="inner").merge(df3, on="Address2", how="inner")
Name1_x Name2 Name3 Address_x ID1 ID2 Own Name1_y Address_y Name4 \
0 Matt John1 Jill 878home 1 0 Deal Matt 878home face1
1 Matt John1 Jill 878home 1 0 Deal Matt 878home face2
2 Matt John1 Jill 878home 1 0 Deal Matt 878home face3
3 Matt John2 Jack 879home 2 1 Dael Matt 879home head1
4 Matt John2 Jack 879home 2 1 Dael Matt 879home head2
5 Matt John2 Jack 879home 2 1 Dael Matt 879home head3
Address2 Hash
0 face\123 abc123
1 face\345 cde321
2 face\678 efg123
3 head\123 123efg
4 head\345 efg321
5 head\678 acd321
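If you do want to keep `on="ID1"`, one possible workaround (a sketch under assumptions, not something either answer proposes) is to drop the columns df2 repeats from df1 before merging. The code below uses a shortened copy of the question's data and illustrative names (`suffixed`, `dup_cols`, `clean`).

```python
import pandas as pd

# Shortened copy of the question's data.
df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Name2": ["John1", "John2"],
    "Name3": ["Jill", "Jack"],
    "Address": ["878 home", "879 home"],
    "ID1": ["1", "2"],
    "ID2": ["0", "1"],
    "Own": ["Deal", "Dael"],
})
df2 = pd.DataFrame({
    "Name1": ["Matt", "Matt", "Matt"],
    "ID1": ["1", "1", "2"],
    "Address": ["878 home", "878 home", "879 home"],
    "Name4": ["face1", "face2", "head1"],
    "Address2": [r"face\123", r"face\345", r"head\123"],
})

# Joining on ID1 alone: Name1 and Address exist in both frames,
# so pandas keeps both copies and appends _x / _y.
suffixed = df1.merge(df2, on="ID1", how="inner")
dup_cols = [c for c in suffixed.columns if c.endswith(("_x", "_y"))]
print(dup_cols)

# Drop the repeated columns from df2 first, then join on ID1.
clean = df1.merge(df2.drop(columns=["Name1", "Address"]), on="ID1", how="inner")
print(clean.columns.tolist())
```

This assumes the dropped columns really do carry the same values in both frames; if they could differ, keeping the suffixed copies and comparing them is the safer route.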
Aren't you just merging on the column intersection here? 'df1.merge(df2).merge(df3)'? – miradulo