Getting the mean from a Python DataFrame based on an index
Question:
I have a DataFrame that is the result of a groupby call:
test=uniqueStudents.groupby(['index1','index2']).count()
test.head(10)
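I want to average the counts across index1 to get one overall mean per index1 value. The current result and the desired output are shown below.
Current / desired output:
Can someone help me implement this in Python? Or is there another way to get this from the dataset?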
Answer
Use the level parameter of the groupby method, which accepts the name of an index level.
test.groupby(level='index1').mean()
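Alternatively, you can reset the index and do a regular groupby with the by parameter. Note that index2 then becomes an ordinary column and is included in the mean.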
test.reset_index().groupby('index1').mean()
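As a minimal sketch of the two calls above, here they are on a small made-up frame. The name `small`, the columns `a` and `b`, and all values are hypothetical and only stand in for the grouped counts from the question. Because `reset_index()` turns `index2` into a regular numeric column, the sketch drops it before averaging so that both approaches return the same columns.

```python
import pandas as pd

# Hypothetical data standing in for the grouped counts in the question.
idx = pd.MultiIndex.from_tuples(
    [("d1", 1), ("d1", 2), ("d2", 3)], names=["index1", "index2"]
)
small = pd.DataFrame({"a": [2, 4, 10], "b": [1.0, 3.0, 5.0]}, index=idx)

# Option 1: group directly on the index level.
by_level = small.groupby(level="index1").mean()

# Option 2: move the index into columns, then group by column name.
# index2 is now a regular column, so drop it to keep it out of the averages.
by_column = (
    small.reset_index()
    .drop(columns="index2")
    .groupby("index1")
    .mean()
)

print(by_level)
print(by_column)
```

Grouping on the level avoids the extra `drop` step, which is probably why it is the more convenient choice when the grouping key already lives in the index.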
Answer
You need to groupby on the index1 level and aggregate with GroupBy.mean, then take the mean across columns with DataFrame.mean:
import numpy as np
import pandas as pd
test = pd.DataFrame({'column4': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column10': {('01-06-15', 278658): 17.0, ('01-06-15', 206905): 60.0, ('02-06-15', 225800): 280.0, ('02-06-15', 225596): 15.0, ('01-06-15', 152551): 55.0, ('01-06-15', 124337): 21.0, ('02-06-15', 235369): 3.0, ('01-06-15', 31883): 62.0, ('03-06-15', 124337): np.nan}, 'column3': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column8': {('01-06-15', 278658): 17.0, ('01-06-15', 206905): 60.0, ('02-06-15', 225800): 280.0, ('02-06-15', 225596): 15.0, ('01-06-15', 152551): 55.0, ('01-06-15', 124337): 21.0, ('02-06-15', 235369): 3.0, ('01-06-15', 31883): 62.0, ('03-06-15', 124337): np.nan}, 'column11': {('01-06-15', 278658): 22.0, ('01-06-15', 206905): 101.0, ('02-06-15', 225800): 308.0, ('02-06-15', 225596): 19.0, ('01-06-15', 152551): 64.0, ('01-06-15', 124337): 54.0, ('02-06-15', 235369): 7.0, ('01-06-15', 31883): 124.0, ('03-06-15', 124337): np.nan}, 'column5': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column7': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 3, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column2': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 
235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column1': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column6': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column9': {('01-06-15', 278658): 17.0, ('01-06-15', 206905): 60.0, ('02-06-15', 225800): 280.0, ('02-06-15', 225596): 15.0, ('01-06-15', 152551): 55.0, ('01-06-15', 124337): 21.0, ('02-06-15', 235369): 3.0, ('01-06-15', 31883): 62.0, ('03-06-15', 124337): np.nan}})
test.index.names = ['index1','index2']
test = test[['column'+str(col) for col in range(1,12)]]
print (test)
                 column1  column2  column3  column4  column5  column6  \
index1   index2
01-06-15 31883       124      124      124      124      124      124
         124337       54       54       54       54       54       54
         152551       64       64       64       64       64       64
         206905      101      101      101      101      101      101
         278658       22       22       22       22       22       22
02-06-15 225596       19       19       19       19       19       19
         225800      308      308      308      308      308      308
         235369        7        7        7        7        7        7
03-06-15 124337       17       17       17       17       17       17

                 column7  column8  column9  column10  column11
index1   index2
01-06-15 31883       124     62.0     62.0      62.0     124.0
         124337       54     21.0     21.0      21.0      54.0
         152551       64     55.0     55.0      55.0      64.0
         206905      101     60.0     60.0      60.0     101.0
         278658       22     17.0     17.0      17.0      22.0
02-06-15 225596       19     15.0     15.0      15.0      19.0
         225800      308    280.0    280.0     280.0     308.0
         235369        3      3.0      3.0       3.0       7.0
03-06-15 124337       17      NaN      NaN       NaN       NaN
df = test.groupby(level='index1').mean().mean(axis=1).reset_index(name='val')
print (df)
     index1         val
0  01-06-15   64.818182
1  02-06-15  107.939394
2  03-06-15   17.000000
Another solution is to take the mean across columns first, and then groupby:
df = test.mean(axis=1).groupby(level='index1').mean().reset_index(name='val')
print (df)
     index1         val
0  01-06-15   64.818182
1  02-06-15  107.939394
2  03-06-15   17.000000
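Both solutions compute a mean of means, and they agree on the sample data above. As a hedged illustration of when they can diverge, the sketch below uses a tiny made-up frame (the name `demo`, the columns `a` and `b`, and the values are hypothetical) in which one cell is missing. Since `mean` skips NaN by default, the order of the two steps changes the weighting. A third variant, pooling every non-missing value per group with `stack`, is shown for comparison; it is not part of the answers above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column b.
idx = pd.MultiIndex.from_tuples(
    [("d1", 1), ("d1", 2)], names=["index1", "index2"]
)
demo = pd.DataFrame({"a": [1.0, 3.0], "b": [np.nan, 5.0]}, index=idx)

# Group first, then average across columns: mean(a)=2, mean(b)=5.
group_then_cols = demo.groupby(level="index1").mean().mean(axis=1)

# Average across columns first, then group: row means are 1 and 4.
cols_then_group = demo.mean(axis=1).groupby(level="index1").mean()

# Pool all non-missing values of the group: 1, 3 and 5.
pooled = demo.stack().groupby(level="index1").mean()

print(group_then_cols["d1"])  # 3.5
print(cols_then_group["d1"])  # 2.5
print(pooled["d1"])           # 3.0
```

Which of the three is the right "overall average" depends on whether each column, each row, or each individual value should carry equal weight.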