Crosstable in pandas like in Qlik?

Question:

I have a DataFrame:

import pandas as pd

df1 = pd.DataFrame({
    'ID': [101, 102],
    'Name': ['Axel', 'Bob'],
    'US': ['GrA', 'GrC'],
    'Europe': ['GrB', 'GrD'],
    'AsiaPac': ['GrZ', 'GrF']
})

and I want to turn it into this:

df2=pd.DataFrame({ 
    'ID':[101,101,101,102,102,102], 
    'Name':['Axel','Axel','Axel','Bob','Bob','Bob'], 
    'Region':['US','Europe','AsiaPac','US','Europe','AsiaPac'], 
    'Group':['GrA','GrB','GrZ','GrC','GrD','GrF'] 
}) 

How can I do this? pandas has a crosstab function, but it does not do this. In Qlik I would simply do

Crosstable(Region,Group,2) 
    LOAD 
     ID, 
     Name, 
     US, 
     Europe, 
     AsiaPac 

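and that would take me from df1 to df2. How can I do this in Python (pandas or otherwise)?

This is essentially reshaping your data from wide format to long format, as it is known in R terminology. In pandas, you can do this with pd.melt: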

pd.melt(df1, id_vars=['ID', 'Name'], var_name='Region', value_name='Group') 
#     ID  Name   Region Group
# 0  101  Axel  AsiaPac   GrZ
# 1  102   Bob  AsiaPac   GrF
# 2  101  Axel   Europe   GrB
# 3  102   Bob   Europe   GrD
# 4  101  Axel       US   GrA
# 5  102   Bob       US   GrC

If you need the rows sorted by ID and Group, as in your example output, you can append .sort_values() to the expression:

pd.melt(df1, id_vars=['ID', 'Name'], var_name='Region', value_name='Group').sort_values(['ID', 'Group']) 
#     ID  Name   Region Group
# 4  101  Axel       US   GrA
# 2  101  Axel   Europe   GrB
# 0  101  Axel  AsiaPac   GrZ
# 5  102   Bob       US   GrC
# 3  102   Bob   Europe   GrD
# 1  102   Bob  AsiaPac   GrF
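
To check the whole round trip, here is a minimal self-contained sketch that rebuilds df1 from the question, melts it, and compares the result with the expected df2. The name `result` is only for illustration. Two things to keep in mind: the row order that melt produces follows the column order of df1, so the index labels you see may differ from the output above depending on your pandas version, and sorting by Group only reproduces the US, Europe, AsiaPac order because the group labels in this sample happen to sort that way.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [101, 102],
    'Name': ['Axel', 'Bob'],
    'US': ['GrA', 'GrC'],
    'Europe': ['GrB', 'GrD'],
    'AsiaPac': ['GrZ', 'GrF'],
})

# Target layout from the question
df2 = pd.DataFrame({
    'ID': [101, 101, 101, 102, 102, 102],
    'Name': ['Axel', 'Axel', 'Axel', 'Bob', 'Bob', 'Bob'],
    'Region': ['US', 'Europe', 'AsiaPac', 'US', 'Europe', 'AsiaPac'],
    'Group': ['GrA', 'GrB', 'GrZ', 'GrC', 'GrD', 'GrF'],
})

result = (
    pd.melt(df1, id_vars=['ID', 'Name'], var_name='Region', value_name='Group')
      .sort_values(['ID', 'Group'])
      .reset_index(drop=True)  # discard the shuffled index labels left by the sort
)

# Compare column names and cell values with the expected frame
print(list(result.columns) == list(df2.columns))
print(result.values.tolist() == df2.values.tolist())
```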
Perfect, thanks! –

What is the difference between 'var_name' and 'value_name'? – hhh
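
On the var_name / value_name question, a small sketch may help; it assumes the df1 from the question, and the names `default` and `named` are only illustrative. melt turns each non-id column into rows: var_name is the name given to the new column that holds the old column headers (here the regions), and value_name is the name given to the new column that holds the cell values (here the groups). If you leave them out, pandas falls back to 'variable' and 'value'.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [101, 102],
    'Name': ['Axel', 'Bob'],
    'US': ['GrA', 'GrC'],
    'Europe': ['GrB', 'GrD'],
    'AsiaPac': ['GrZ', 'GrF'],
})

# Without var_name / value_name, melt uses its default column names
default = pd.melt(df1, id_vars=['ID', 'Name'])
print(list(default.columns))  # ['ID', 'Name', 'variable', 'value']

# var_name labels the column of former headers, value_name the column of cell values
named = pd.melt(df1, id_vars=['ID', 'Name'], var_name='Region', value_name='Group')
print(list(named.columns))    # ['ID', 'Name', 'Region', 'Group']
```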

You can try

stack()

df1.set_index(['ID','Name']).stack().reset_index().rename(columns={'level_2':'Region',0:'Group'}) 
Out[890]: 
    ID  Name   Region Group
0  101  Axel  AsiaPac   GrZ
1  101  Axel   Europe   GrB
2  101  Axel       US   GrA
3  102   Bob  AsiaPac   GrF
4  102   Bob   Europe   GrD
5  102   Bob       US   GrC
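
A variation on the same idea, as a sketch: naming the column axis before stacking, and naming the resulting Series, avoids relying on the auto-generated 'level_2' and 0 labels. It assumes the df1 from the question; `long_df` is just an illustrative name. One caveat: depending on the pandas version, stack() may drop entries whose value is missing, whereas melt keeps them.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [101, 102],
    'Name': ['Axel', 'Bob'],
    'US': ['GrA', 'GrC'],
    'Europe': ['GrB', 'GrD'],
    'AsiaPac': ['GrZ', 'GrF'],
})

long_df = (
    df1.set_index(['ID', 'Name'])
       .rename_axis(columns='Region')  # name the column axis; it becomes the new index level name
       .stack()                        # move the region columns into the row index
       .rename('Group')                # name the resulting Series, which becomes the value column
       .reset_index()
)
print(long_df)
```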

Second,

pd.wide_to_long, even if it is overkill. :)

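df1=df1.rename(columns={'AsiaPac':'Group_AsiaPac','Europe':'Group_Europe','US':'Group_US'}) 
# suffix is a regular expression; r'\w+' matches a whole region name, whereas '.' matches only one character
pd.wide_to_long(df1,['Group'], i=['ID','Name'], j='Region',sep='_',suffix=r'\w+').reset_index() 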

Out[918]: 
    ID  Name   Region Group
0  101  Axel  AsiaPac   GrZ
1  101  Axel   Europe   GrB
2  101  Axel       US   GrA
3  102   Bob  AsiaPac   GrF
4  102   Bob   Europe   GrD
5  102   Bob       US   GrC
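
wide_to_long expects columns named stub + sep + suffix, and its suffix argument is a regular expression (the default, r'\d+', only matches digits). As a sketch, assuming the df1 from the question, the prefix can also be added without overwriting df1; `prefixed` and `long_df` are illustrative names, not part of the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [101, 102],
    'Name': ['Axel', 'Bob'],
    'US': ['GrA', 'GrC'],
    'Europe': ['GrB', 'GrD'],
    'AsiaPac': ['GrZ', 'GrF'],
})

# Prefix only the region columns: ID and Name sit in the index while the prefix is applied
prefixed = df1.set_index(['ID', 'Name']).add_prefix('Group_').reset_index()

long_df = pd.wide_to_long(
    prefixed,
    stubnames=['Group'],
    i=['ID', 'Name'],
    j='Region',
    sep='_',
    suffix=r'\w+',  # region names are words, not digits, so the default would not match
).reset_index()

# Row order is not guaranteed here, so sort before comparing or displaying
print(long_df.sort_values(['ID', 'Group']).reset_index(drop=True))
```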
Lovely, thanks! –

@AlhpaDelta Yw~ – Wen