How to deal with list-like data in pandas DataFrame columns

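Question:

Consider the following example of dealing with list-like data in pandas DataFrame columns.

I have a table of emails. Each row has an email ID and two label columns, produced by different code paths, each holding the list of labels associated with that email.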

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'labels1': [np.array(['red']), np.array(['blue', 'green']), np.array(['blue']), np.nan],
    'labels2': [np.nan, np.nan, np.array(['yellow', 'purple']), np.array(['magenta'])]
})

df
   id        labels1           labels2
0   1          [red]               NaN
1   2  [blue, green]               NaN
2   3         [blue]  [yellow, purple]
3   4            NaN         [magenta]

So I need a way to produce the following DataFrame:

df_merge
   id                  labels
0   1                   [red]
1   2           [blue, green]
2   3  [blue, yellow, purple]
3   4               [magenta]

But the best I could do with a lambda function over the label columns throws a ValueError:

df.apply(lambda x: np.unique(np.append(x['labels1'], x['labels2'])), axis=1) 

ValueError: Shape of passed values is (4, 2), indices imply (4, 4)

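I have tried many variations on the above, all to no avail. I am wondering whether array-like column data like this is a pandas anti-pattern and, if so, what a better approach would be.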

Answer

Replace NaN with [] using applymap
Then sum across each row

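# The two label columns from the question
labels = df[['labels1', 'labels2']]

df[['id']].assign(
    # Arrays and lists become plain lists, NaN becomes [];
    # summing along axis=1 then concatenates the lists in each row
    labels=labels.applymap(
        lambda x: list(x) if isinstance(x, (list, np.ndarray)) else []
    ).sum(axis=1)
)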

   id                  labels
0   1                   [red]
1   2           [blue, green]
2   3  [blue, yellow, purple]
3   4               [magenta]
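
For readers who would rather not rely on summing object columns, the same two steps (treat NaN as an empty list, then concatenate per row) can be sketched in plain Python. This is only a sketch, not the method used in the answer above: `merge_labels` is a hypothetical helper name, and dropping duplicate labels while keeping first-seen order is an assumption suggested by the `np.unique` call in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'labels1': [np.array(['red']), np.array(['blue', 'green']), np.array(['blue']), np.nan],
    'labels2': [np.nan, np.nan, np.array(['yellow', 'purple']), np.array(['magenta'])],
})


def merge_labels(*cells):
    """Hypothetical helper: concatenate list-like cells, skipping NaN.

    Duplicates are dropped and first-seen order is kept (an assumption,
    not something the original answer does).
    """
    merged = []
    for cell in cells:
        # Anything that is not list-like (for example NaN) counts as "no labels".
        if isinstance(cell, (list, tuple, np.ndarray)):
            for label in cell:
                label = str(label)
                if label not in merged:
                    merged.append(label)
    return merged


# Build the merged column row by row and attach it next to the id column.
merged_column = pd.Series(
    [merge_labels(a, b) for a, b in zip(df['labels1'], df['labels2'])],
    index=df.index,
)
df_merge = df[['id']].assign(labels=merged_column)
print(df_merge)
```

Because the merge happens in ordinary Python, it does not depend on how pandas reduces object columns, at the cost of being slower on very large frames.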
