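Iterating through MultiIndex data in python pandas

Question:

I'd like to be able to iterate through a pandas DataFrame by grouping on a MultiIndex. Here, I'd like to be able to process the group of rows for each industry together. I load the data with a MultiIndex.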

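from io import StringIO  # Python 3; the Python 2 StringIO module no longer exists
import pandas as pd

data = """industry,location,number
retail,brazil,294
technology,china,100
retail,nyc,2913
retail,paris,382
technology,us,2182
"""

# Use the first two columns as a two-level MultiIndex
df = pd.read_csv(StringIO(data), sep=",", index_col=['industry', 'location'])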

So, I'd like something to this effect:

# Desired usage only: iter_multiindex() is not an actual pandas method
for industry, rows in df.iter_multiindex():
    for row in rows:
        process_row(row)

Is there a way to do this?

Answer

You can group by the first level of the MultiIndex (industry) and then iterate through the groups:

In [102]: for name, group in df.groupby(level='industry'):
   .....:     print(name)
   .....:     print(group)
   .....:     print()
   .....:
retail
                   number
industry location
retail   brazil       294
         nyc         2913
         paris        382

technology
                     number
industry   location
technology china        100
           us          2182

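group will be a DataFrame each time, and you can then iterate through that DataFrame (for example with for idx, row in group.iterrows(), which yields (index, row) pairs).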
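Putting the two loops together, a minimal sketch might look like the following. The process_row function here is a hypothetical stand-in for whatever the asker's function does; its signature is an assumption made for illustration.

```python
from io import StringIO
import pandas as pd

data = """industry,location,number
retail,brazil,294
technology,china,100
retail,nyc,2913
retail,paris,382
technology,us,2182
"""
df = pd.read_csv(StringIO(data), index_col=["industry", "location"])


def process_row(industry, location, number):
    # Hypothetical placeholder: just build a label for the row
    return f"{industry}/{location}: {number}"


results = []
# Outer loop: one DataFrame per value of the 'industry' index level
for industry, rows in df.groupby(level="industry"):
    # Inner loop: iterrows() yields (index, Series) pairs; with a
    # two-level MultiIndex the index is an (industry, location) tuple
    for (_, location), row in rows.iterrows():
        results.append(process_row(industry, location, int(row["number"])))

for line in results:
    print(line)
```

Groups come out sorted by key, while rows inside each group keep their original order.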

However, in most cases such iteration is not needed! What does process_row need to do? You can probably do it in a vectorized way directly on the groupby object.
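As an illustration of the vectorized route, here is a sketch that assumes the per-row processing boils down to arithmetic on the number column (the question does not say what process_row actually does). It computes a per-industry total and each row's share of that total without any explicit loop.

```python
from io import StringIO
import pandas as pd

data = """industry,location,number
retail,brazil,294
technology,china,100
retail,nyc,2913
retail,paris,382
technology,us,2182
"""
df = pd.read_csv(StringIO(data), index_col=["industry", "location"])

grouped = df.groupby(level="industry")["number"]

# One value per industry
totals = grouped.sum()

# transform("sum") broadcasts the group total back onto every row,
# so the division is aligned row by row
df["share"] = df["number"] / grouped.transform("sum")

print(totals)
print(df)
```

The transform call returns a result with the same index as the original frame, which is what makes the row-wise division possible.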

Answer

Not sure why you want to do this, but you can do it like this:

for x in df.index:  # x is an (industry, location) tuple
    print(x[0])  # industry
    process(df.loc[x])  # row

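But that is not how you would normally work with a DataFrame; you probably want to read about apply() (the Essential Basic Functionality section of the pandas documentation is also really useful).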
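For comparison, a row-wise apply() version could be sketched as follows. The describe function is hypothetical and only stands in for the real per-row processing.

```python
from io import StringIO
import pandas as pd

data = """industry,location,number
retail,brazil,294
technology,china,100
retail,nyc,2913
retail,paris,382
technology,us,2182
"""
df = pd.read_csv(StringIO(data), index_col=["industry", "location"])


def describe(row):
    # With axis=1 each row arrives as a Series whose .name is the
    # row's index label, here an (industry, location) tuple
    industry, location = row.name
    return f"{industry}/{location}: {row['number']}"


# Returns a Series with the same MultiIndex as df
labels = df.apply(describe, axis=1)
print(labels)
```

Note that apply with axis=1 still calls the function once per row, so it is tidier than an explicit loop but not necessarily faster.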

I'd like to see some explanation for the -1. – 2014-12-04 17:52:13
