Pandas: filter a DataFrame by the DatetimeIndex level of a MultiIndex
Question:
I have a DataFrame like this:
import numpy as np
import pandas as pd

dft2 = pd.DataFrame(np.random.randn(20, 1),
                    columns=['A'],
                    index=pd.MultiIndex.from_product([pd.date_range('20130101',
                                                                    periods=10,
                                                                    freq='4M'),  # spelled '4ME' in pandas >= 2.2
                                                      ['a', 'b']]))
which looks like this when I print it:
Output:
                     A
2013-01-31 a  0.275921
           b  1.336497
2013-05-31 a  1.040245
           b  0.716865
2013-09-30 a -2.697420
           b -1.570267
2014-01-31 a  1.326194
           b -0.209718
2014-05-31 a -1.030777
           b  0.401654
2014-09-30 a  1.138958
           b -1.162370
2015-01-31 a  1.770279
           b  0.606219
2015-05-31 a -0.819126
           b -0.967827
2015-09-30 a -1.423667
           b  0.894103
2016-01-31 a  1.765187
           b -0.334844
How can I filter the rows so that only those whose date is the earliest date in its year remain, e.g. 2013-01-31, 2014-01-31, ...?
Thanks.
Answer
# Create dataframe from the dates in the first level of the index.
df = pd.DataFrame(dft2.index.get_level_values(0), columns=['date'], index=dft2.index)
# Add a `year` column that gets the year of each date.
df = df.assign(year=[d.year for d in df['date']])
# Find the minimum date of each year by grouping.
min_annual_dates = df.groupby('year')['date'].min().tolist()
# Filter the original dataframe based on these minimum dates by year.
>>> dft2.loc[(min_annual_dates, slice(None)), :]
                     A
2013-01-31 a  1.087274
           b  1.488553
2014-01-31 a  0.119801
           b  0.922468
2015-01-31 a -0.262440
           b  0.642201
2016-01-31 a  1.144664
           b  0.410701
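A more compact sketch of the same idea, assuming you don't need the helper DataFrame: group the level-0 dates by calendar year directly and pass the per-year minimums to `.loc`. It rebuilds `dft2` as in the question but spells the frequency as `pd.offsets.MonthEnd(4)`, the offset object that the `'4M'` alias stands for, so it also runs on pandas versions where that alias is deprecated. That substitution is an assumption of this sketch, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Same layout as the question; MonthEnd(4) is the offset behind the '4M' alias.
dft2 = pd.DataFrame(
    np.random.randn(20, 1),
    columns=["A"],
    index=pd.MultiIndex.from_product(
        [pd.date_range("20130101", periods=10, freq=pd.offsets.MonthEnd(4)), ["a", "b"]]
    ),
)

# Level-0 dates as a plain Series (default RangeIndex) so they can be grouped by year.
dates = pd.Series(dft2.index.get_level_values(0))

# Earliest date per calendar year, indexed by year.
first_dates = dates.groupby(dates.dt.year).min()

# Select those dates on level 0 and every value on level 1.
result = dft2.loc[(list(first_dates), slice(None)), :]
print(result)
```

Because `from_product` builds an already sorted MultiIndex, the tuple-of-list-and-slice form of `.loc` works here without calling `sort_index()` first.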
Answer
Alternatively, you can try using isin:
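# reset_index turns the unnamed index levels into columns 'level_0' and 'level_1'
dft1 = dft2.reset_index()
dft1['Year'] = dft1.level_0.dt.year
# Earliest date in each year
dft1 = dft1.groupby('Year')['level_0'].min()
# Keep the rows whose level-0 date is one of those earliest dates
dft2[dft2.index.get_level_values(0).isin(dft1.values)]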
Out[2250]:
                     A
2013-01-31 a -1.072400
           b  0.660115
2014-01-31 a -0.134245
           b  1.344941
2015-01-31 a  0.176067
           b -1.792567
2016-01-31 a  0.033230
           b -0.960175
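A further variant, sketched under the same assumed `MonthEnd(4)` rebuild of `dft2`, that avoids `reset_index` and the auto-generated `level_0` column name: `groupby(...).transform('min')` broadcasts each year's earliest date back onto every row, so comparing it with the row's own date gives a boolean mask directly.

```python
import numpy as np
import pandas as pd

# Same layout as the question; MonthEnd(4) is the offset behind the '4M' alias.
dft2 = pd.DataFrame(
    np.random.randn(20, 1),
    columns=["A"],
    index=pd.MultiIndex.from_product(
        [pd.date_range("20130101", periods=10, freq=pd.offsets.MonthEnd(4)), ["a", "b"]]
    ),
)

# One date per row, taken from level 0 of the MultiIndex.
dates = pd.Series(dft2.index.get_level_values(0))

# For every row, the earliest date that occurs in the same calendar year.
year_min = dates.groupby(dates.dt.year).transform("min")

# True where the row's own date is its year's earliest date.
mask = dates.eq(year_min).to_numpy()
result = dft2[mask]
print(result)
```

The mask is positional (a NumPy array), so it lines up with the rows of `dft2` regardless of the index labels.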