pandas - Setting values using partial slicing on a MultiIndex
Question:
I have this code, which produces the following empty DataFrame:
>>> import pandas as pd
>>> first = ['foo', 'bar']
>>> second = ['baz', 'can']
>>> third = ['ok', 'ko']
>>> colours = ['blue', 'yellow', 'green']
>>> idx = pd.IndexSlice
>>> # Cartesian product of the three label lists, one index level per list
>>> ix = pd.MultiIndex.from_product([first, second, third],
...                                 names=('first', 'second', 'third'))
>>> df1 = pd.DataFrame(index=ix, columns=colours).sort_index()
>>> print(df1)
                    blue yellow green
first second third
bar   baz    ko      NaN    NaN   NaN
             ok      NaN    NaN   NaN
      can    ko      NaN    NaN   NaN
             ok      NaN    NaN   NaN
foo   baz    ko      NaN    NaN   NaN
             ok      NaN    NaN   NaN
      can    ko      NaN    NaN   NaN
             ok      NaN    NaN   NaN
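What I want to do is fill this empty MultiIndex-based DataFrame from another DataFrame that I am given. That one is column-based, like the following (columns truncated for clarity):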
     baz_ok_blue  baz_ko_blue  can_ok_blue  can_ko_blue  baz_ok_yellow
foo    -1.385111    -1.014812    -1.419643     1.540341       0.663933
bar     0.445372    -0.226087     0.450982    -1.114169       0.896522
So far, I have been doing this:
idx = pd.IndexSlice
for s in second:
    for t in third:
        for c in colours:
            column_name = '{s}_{t}_{c}'.format(s=s, c=c, t=t)
            values = df2[column_name]
            df1.loc[idx[:, s, t], c] = values
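On each iteration, the values Series is determined correctly, but pandas does not match the first level of df1's MultiIndex against the index of values. As a result, all of the values in df1 stay NaN, because pandas tries to match a MultiIndex against a single-level index. Is there a way around this?
Basically, to give a higher-level view, I am just trying to rearrange df2 (string-based columns) into the shape of df1 (MultiIndex-based).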
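For reference, one way to keep the loop from the question is to sidestep index alignment altogether: reorder each column of df2 to the 'first' labels of the slice being assigned, then hand pandas a plain NumPy array. This is only a sketch under assumptions that are not in the question: df2 here is a hypothetical stand-in filled with random numbers, and df1 is created with dtype=float so that it holds numeric NaN.

```python
import numpy as np
import pandas as pd

first = ['foo', 'bar']
second = ['baz', 'can']
third = ['ok', 'ko']
colours = ['blue', 'yellow', 'green']

ix = pd.MultiIndex.from_product([first, second, third],
                                names=['first', 'second', 'third'])
# dtype=float is an assumption made for this sketch
df1 = pd.DataFrame(index=ix, columns=colours, dtype=float).sort_index()

# Hypothetical stand-in for df2: one column per second_third_colour combination
rng = np.random.default_rng(0)
names = [f'{s}_{t}_{c}' for s in second for t in third for c in colours]
df2 = pd.DataFrame(rng.normal(size=(len(first), len(names))),
                   index=first, columns=names)

idx = pd.IndexSlice
for s in second:
    for t in third:
        # 'first' labels of the rows selected by this slice, in df1's order
        rows = df1.loc[idx[:, s, t], :].index.get_level_values('first')
        for c in colours:
            values = df2[f'{s}_{t}_{c}']
            # Reorder by label, then assign positionally via a plain array
            df1.loc[idx[:, s, t], c] = values.reindex(rows).to_numpy()
```

Because the right-hand side is an array rather than a Series, pandas has no index to align on and writes the values in order, which is why the explicit reindex to the slice's 'first' labels is needed beforehand.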
Answer:
You can first create a MultiIndex in the columns with str.split, then reshape with stack, and finally reindex:
df.columns = df.columns.str.split('_', expand=True)
print(df)
          baz                 can                 baz
           ok        ko        ok        ko        ok
         blue      blue      blue      blue    yellow
foo -1.385111 -1.014812 -1.419643  1.540341  0.663933
bar  0.445372 -0.226087  0.450982 -1.114169  0.896522
df = df.stack([0,1]).reindex(index=df1.index, columns=df1.columns)
print(df)
                        blue    yellow green
first second third
bar   baz    ko    -0.226087       NaN   NaN
             ok     0.445372  0.896522   NaN
      can    ko    -1.114169       NaN   NaN
             ok     0.450982       NaN   NaN
foo   baz    ko    -1.014812       NaN   NaN
             ok    -1.385111  0.663933   NaN
      can    ko     1.540341       NaN   NaN
             ok    -1.419643       NaN   NaN
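The split, stack and reindex steps can be put together in one self-contained sketch. The frame named df2 below is a hypothetical stand-in with random values (the real data is not shown in full in the question), and the intermediate name wide is also just illustrative.

```python
import numpy as np
import pandas as pd

first = ['foo', 'bar']
second = ['baz', 'can']
third = ['ok', 'ko']
colours = ['blue', 'yellow', 'green']

# Empty target frame, built as in the question
ix = pd.MultiIndex.from_product([first, second, third],
                                names=['first', 'second', 'third'])
df1 = pd.DataFrame(index=ix, columns=colours).sort_index()

# Hypothetical stand-in for df2: one column per second_third_colour combination
rng = np.random.default_rng(0)
names = [f'{s}_{t}_{c}' for c in colours for s in second for t in third]
df2 = pd.DataFrame(rng.normal(size=(len(first), len(names))),
                   index=first, columns=names)

wide = df2.copy()
# 'baz_ok_blue' -> ('baz', 'ok', 'blue'): three column levels
wide.columns = wide.columns.str.split('_', expand=True)
# Move the first two column levels into the row index, leaving colours as columns,
# then put rows and columns in the same order as df1
out = wide.stack([0, 1]).reindex(index=df1.index, columns=df1.columns)
```

Depending on the pandas version, stack may emit a FutureWarning about its newer implementation; the reindex step fixes row and column order either way.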
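Thanks, brilliant. However, it seems that some values get lost in the final step (they are still there after stacking, but disappear after reindexing and end up as NaN) – Jivan
Is it possible to reproduce it? – jezrael
This may be due to the fact that 'second' and 'color' can have the same labels. I will try changing this – Jivan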
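The thread does not confirm the cause, but a generic way to see which rows reindex could not match is to compare the stacked index with the target index using Index.difference. The sketch below uses a made-up mismatch ('OK' versus 'ok') purely for illustration; the names stacked, target, dropped and unfilled are hypothetical.

```python
import pandas as pd

# Hypothetical stacked result whose third level uses 'OK' instead of 'ok'
stacked = pd.DataFrame(
    {'blue': [1.0, 2.0]},
    index=pd.MultiIndex.from_tuples([('foo', 'baz', 'OK'), ('foo', 'baz', 'ko')]),
)
target = pd.MultiIndex.from_tuples(
    [('foo', 'baz', 'ok'), ('foo', 'baz', 'ko')],
    names=['first', 'second', 'third'],
)

# Rows present after stacking but absent from the target are dropped by reindex
dropped = stacked.index.difference(target)
# Rows the target expects but the stacked frame lacks come back as NaN
unfilled = target.difference(stacked.index)

result = stacked.reindex(target)
```

If dropped and unfilled are both non-empty, the labels produced by the split do not line up with the labels in df1's index, which would explain values being present after stack and missing after reindex.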