Using groupby and apply in pandas (Python)

Problem description:

I have a DataFrame like this:

df: 

       cell COMBINATION_ID  PREDICTION  SYNERGY_SCORE
0    BT-549     ADAM17.AKT    2.188390       7.398240
1   CAL-148     ADAM17.AKT   10.030628      12.686340
2     HCC38     ADAM17.AKT    9.217011      -4.351590
3   DU-4475    ADAM17.FGFR   -2.130943     -14.398730
4   HCC1187    ADAM17.FGFR   -1.103040      -6.400371
5     HCC70    ADAM17.FGFR   -2.076458     -14.909000
6  Hs-578-T    ADAM17.FGFR    3.831822      -7.859544

I want to group by COMBINATION_ID and, for each group, get the correlation between PREDICTION and SYNERGY_SCORE.

The result would look like this:

ADAM17.AKT   cor([2.188390, 10.030628, 9.217011], [7.398240, 12.686340, -4.351590])
ADAM17.FGFR  cor([-2.130943, -1.103040, -2.076458, 3.831822], [-14.398730, -6.400371, -14.909000, -7.859544])

I can use:

df2 = df.groupby('COMBINATION_ID').apply(f) 

But I don't know how to define f().

Thanks.

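Consider using pandas' corr() inside a user-defined function, assuming you have scipy installed alongside pandas. You can specify the method: pearson (the default), kendall, or spearman.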

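def f(group):
    # `group` is the sub-DataFrame for one COMBINATION_ID, not a single row
    group = group.copy()  # work on a copy so the original frame is not modified
    group['CORRELATION'] = group['PREDICTION'].corr(group['SYNERGY_SCORE'], method='spearman')
    return group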

df2 = df.groupby('COMBINATION_ID').apply(f) 
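If you want one correlation value per COMBINATION_ID, as in the desired output in the question, rather than a new column repeated on every row, something like the following sketch may be closer. It rebuilds the sample frame from the question so that it runs on its own. The names `group_corr` and `result` are only illustrative, and it uses the default Pearson method; pass `method="spearman"` or `method="kendall"` to change it.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "cell": ["BT-549", "CAL-148", "HCC38", "DU-4475",
             "HCC1187", "HCC70", "Hs-578-T"],
    "COMBINATION_ID": ["ADAM17.AKT"] * 3 + ["ADAM17.FGFR"] * 4,
    "PREDICTION": [2.188390, 10.030628, 9.217011, -2.130943,
                   -1.103040, -2.076458, 3.831822],
    "SYNERGY_SCORE": [7.398240, 12.686340, -4.351590, -14.398730,
                      -6.400371, -14.909000, -7.859544],
})

def group_corr(group, method="pearson"):
    # `group` holds the PREDICTION and SYNERGY_SCORE columns of one COMBINATION_ID
    return group["PREDICTION"].corr(group["SYNERGY_SCORE"], method=method)

# Select the two numeric columns first, then reduce each group to a single number
result = df.groupby("COMBINATION_ID")[["PREDICTION", "SYNERGY_SCORE"]].apply(group_corr)
print(result)
```

Because `group_corr` returns a scalar, `result` is a Series indexed by COMBINATION_ID with one entry per group.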

You can check the new column against the actual numbers above:

from scipy.stats import spearmanr

# ADAM17.AKT 
print(spearmanr([2.188390,10.030628,9.217011], 
       [7.398240,12.686340,-4.351590])) 
# ADAM17.FGFR 
print(spearmanr([-2.130943,-1.103040, -2.076458 ,3.831822], 
       [-14.398730,-6.400371,-14.909000,-7.859544]))
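As a further cross-check that does not need scipy, the Spearman coefficient can be computed as the Pearson correlation of the ranks. The sketch below does this with pandas alone; `spearman_by_ranks` is a hypothetical helper name, not a pandas or scipy function.

```python
import pandas as pd

def spearman_by_ranks(x, y):
    # Spearman's rho equals the Pearson correlation of the rank-transformed values
    rx = pd.Series(x).rank()
    ry = pd.Series(y).rank()
    return rx.corr(ry)  # Series.corr defaults to Pearson

# ADAM17.AKT
akt = spearman_by_ranks([2.188390, 10.030628, 9.217011],
                        [7.398240, 12.686340, -4.351590])
# ADAM17.FGFR
fgfr = spearman_by_ranks([-2.130943, -1.103040, -2.076458, 3.831822],
                         [-14.398730, -6.400371, -14.909000, -7.859544])

print(round(akt, 6))   # 0.5
print(round(fgfr, 6))  # 0.6
```

These values should match the correlation reported by spearmanr and the CORRELATION column produced with method='spearman'.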