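How to group by a date range

Question:

I want to group the data in one column by the data in another column, but I only want data from a specific time range, say 2015-11-01 to 2016-04-30. My data looks like this: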

account_id employer_key login_date 
1111111  google   2016-03-03 20:58:36.000000 
2222222  walmart   2015-11-18 11:52:56.000000 
2222222  walmart   2015-11-18 11:53:14.000000 
1111111  google   2016-04-06 23:29:04.000000 
3333333  dell_inc  2015-09-05 14:13:53.000000 
3333333  dell_inc  2016-01-28 03:20:58.000000 
2222222  walmart   2015-09-03 00:11:38.000000 
1111111  google   2015-09-03 00:12:25.000000 
1111111  google   2015-11-13 01:59:59.000000 
4444444  google   2015-11-13 01:59:59.000000 
5555555  dell_inc  2015-03-12 01:59:59.000000

I'm trying to get output that looks like this (it should show 1 or True if the person logged in during the time window, and 0 or False if they did not):

employer_key account_id login_date 
google  1111111  1 
       4444444  1 
walmart  2222222  1 
dell_inc  3333333  1 
dell_inc  5555555  0

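How can I go about doing this?

Can you provide a sample date range that matches your desired output? –

Sorry, this is a different question; I've reopened it. I didn't notice that you need a completely different kind of "filtering"... – MaxU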

Answer

You can do it this way, using boolean indexing:

In [252]: df.groupby(['employer_key','account_id']) \
     ...: .apply(lambda x: len(x.query("'2015-11-01' <= login_date <= '2016-04-30'")) > 0) \
     ...: .reset_index()
Out[252]: 
    employer_key account_id  0 
0  dell_inc  3333333 True 
1  dell_inc  5555555 False 
2  google  1111111 True 
3  google  4444444 True 
4  walmart  2222222 True

Or:

In [249]: df.groupby(['employer_key','account_id'])['login_date'] \
     ...: .apply(lambda x: len(x[x.ge('2015-11-01') & x.le('2016-04-30')]) > 0)
Out[249]: 
employer_key account_id 
dell_inc  3333333  True 
       5555555  False 
google  1111111  True 
       4444444  True 
walmart  2222222  True 
Name: login_date, dtype: bool

Or, additionally calling reset_index():

In [250]: df.groupby(['employer_key','account_id'])['login_date'] \
     ...: .apply(lambda x: len(x[x.ge('2015-11-01') & x.le('2016-04-30')]) > 0) \
     ...: .reset_index()
Out[250]: 
    employer_key account_id login_date 
0  dell_inc  3333333  True 
1  dell_inc  5555555  False 
2  google  1111111  True 
3  google  4444444  True 
4  walmart  2222222  True
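
One caveat worth checking with the variants above: if login_date is a datetime column, '2016-04-30' is interpreted as midnight at the start of that day, so a login later on April 30 falls outside a closed `<=` comparison. The following is a minimal sketch of that effect and of a half-open window as one possible workaround. It assumes a datetime64 column and uses two made-up rows (the acme account and the names `end_exclusive`, `closed`, and `half_open` are hypothetical and not part of the question's data).

```python
import pandas as pd

# Hypothetical rows: the second login happens at midday on the last day of the window
df = pd.DataFrame({
    "account_id": [1111111, 6666666],
    "employer_key": ["google", "acme"],
    "login_date": pd.to_datetime(["2016-04-06 23:29:04", "2016-04-30 12:00:00"]),
})

start = pd.Timestamp("2015-11-01")
end_inclusive = pd.Timestamp("2016-04-30")   # midnight at the START of April 30
end_exclusive = pd.Timestamp("2016-05-01")   # first instant after the window

# Closed comparison: misses anything after 00:00:00 on April 30
closed = df["login_date"].ge(start) & df["login_date"].le(end_inclusive)
# Half-open interval [start, end_exclusive): keeps the whole last day
half_open = df["login_date"].ge(start) & df["login_date"].lt(end_exclusive)

keys = [df["employer_key"], df["account_id"]]
in_window_closed = closed.groupby(keys).any()
in_window_half_open = half_open.groupby(keys).any()

print(in_window_closed)
print(in_window_half_open)
```

Whether the last day should count in full depends on what the window is meant to cover; the sample data in the question has no login on 2016-04-30, so both forms give the same result there.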

Answer

Use between to flag the rows, then groupby + max to get one row per group.

import numpy as np  # needed for np.uint8 below

s = df.set_index(['employer_key', 'account_id']).login_date
flag = s.between('2015-11-01', '2016-04-30').astype(np.uint8)
flag.groupby(level=[0, 1]).max().reset_index()

    employer_key account_id login_date 
0  dell_inc  3333333   1 
1  dell_inc  5555555   0 
2  google  1111111   1 
3  google  4444444   1 
4  walmart  2222222   1
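
For anyone who wants to try this end to end, here is a self-contained sketch that rebuilds the question's sample data and applies the same flag-then-reduce idea. It is a variation rather than the answer's exact code: it reduces with `any()` instead of `max()` and groups by the columns directly instead of setting an index. The output column name `logged_in` is an assumption chosen for illustration.

```python
import pandas as pd

# Sample data copied from the question, with login_date parsed as datetimes
df = pd.DataFrame({
    "account_id": [1111111, 2222222, 2222222, 1111111, 3333333, 3333333,
                   2222222, 1111111, 1111111, 4444444, 5555555],
    "employer_key": ["google", "walmart", "walmart", "google", "dell_inc", "dell_inc",
                     "walmart", "google", "google", "google", "dell_inc"],
    "login_date": pd.to_datetime([
        "2016-03-03 20:58:36", "2015-11-18 11:52:56", "2015-11-18 11:53:14",
        "2016-04-06 23:29:04", "2015-09-05 14:13:53", "2016-01-28 03:20:58",
        "2015-09-03 00:11:38", "2015-09-03 00:12:25", "2015-11-13 01:59:59",
        "2015-11-13 01:59:59", "2015-03-12 01:59:59",
    ]),
})

# Step 1: flag every login row that falls inside the window (both ends inclusive)
in_window = df["login_date"].between("2015-11-01", "2016-04-30")

# Step 2: an account counts as logged in if any of its rows is flagged
result = (
    in_window.groupby([df["employer_key"], df["account_id"]])
    .any()
    .astype(int)                      # 1/0 instead of True/False
    .reset_index(name="logged_in")    # hypothetical column name
)
print(result)
```

Because the mask is computed once on the whole column, this avoids calling a Python lambda per group, which tends to matter on larger frames.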
