How do I count the rows in which a certain value appears?

Problem description:

I have the following orders_df:

           Symbol Order  Shares
Date
2011-01-10   AAPL   BUY    1500
2011-01-13   AAPL  SELL    1500
2011-01-13    IBM   BUY    4000
2011-01-26   GOOG   BUY    1000
2011-02-02    XOM  SELL    4000
2011-02-10    XOM   BUY    4000
2011-03-03   GOOG  SELL    1000
2011-03-03    IBM  SELL    2200
2011-05-03    IBM   BUY    1500
2011-06-03    IBM  SELL    3300
2011-08-01   GOOG   BUY      55
2011-08-01   GOOG  SELL      55

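I want a variable that maps each Date to the number of SELLs on that date. I also want a symmetric variable for BUY.

I managed to do this for all orders with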

num_orders_per_day = orders_df.groupby(['Date']).size() 

and got:

Date 
2011-01-10 1 
2011-01-13 2 
2011-01-26 1 
2011-02-02 1 
2011-02-10 1 
2011-03-03 2 
2011-05-03 1 
2011-06-03 1 
2011-08-01 2 

But that is not the desired output.

What I want is sells_on_a_day:

2011-01-13 1 
2011-02-02 1 
2011-03-03 2 
2011-06-03 1 
2011-08-01 1 

and then a similar buys_on_a_day variable.

Filter first with boolean indexing, then group by the index and take the size to get the count:

# Parentheses let the method chain span several lines
num_sells_per_day = (orders_df[orders_df['Order'] == 'SELL']
                     .groupby(level=0).size().reset_index(name='count'))
print(num_sells_per_day)
        Date  count
0 2011-01-13      1
1 2011-02-02      1
2 2011-03-03      2
3 2011-06-03      1
4 2011-08-01      1
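To make the filter-then-group approach easy to try, here is a minimal self-contained sketch. It rebuilds the sample data from the question and wraps the filter in a helper called `orders_per_day`, which is a hypothetical name introduced only for this illustration. It assumes Date is the index of orders_df, as the printed frame suggests.

```python
import pandas as pd

# Rebuild the sample data from the question; Date is assumed to be the index.
orders_df = pd.DataFrame(
    {
        "Symbol": ["AAPL", "AAPL", "IBM", "GOOG", "XOM", "XOM",
                   "GOOG", "IBM", "IBM", "IBM", "GOOG", "GOOG"],
        "Order": ["BUY", "SELL", "BUY", "BUY", "SELL", "BUY",
                  "SELL", "SELL", "BUY", "SELL", "BUY", "SELL"],
        "Shares": [1500, 1500, 4000, 1000, 4000, 4000,
                   1000, 2200, 1500, 3300, 55, 55],
    },
    index=pd.to_datetime(
        ["2011-01-10", "2011-01-13", "2011-01-13", "2011-01-26",
         "2011-02-02", "2011-02-10", "2011-03-03", "2011-03-03",
         "2011-05-03", "2011-06-03", "2011-08-01", "2011-08-01"]
    ).rename("Date"),
)


def orders_per_day(df, side):
    """Hypothetical helper: count rows per index date whose Order equals `side`."""
    return df[df["Order"] == side].groupby(level=0).size()


# One Series per side, indexed by Date
sells_on_a_day = orders_per_day(orders_df, "SELL")
buys_on_a_day = orders_per_day(orders_df, "BUY")

print(sells_on_a_day)
print(buys_on_a_day)
```

Each result is a Series indexed by Date, so a lookup such as `sells_on_a_day[pd.Timestamp("2011-03-03")]` gives the count for that day. Dates with no matching orders are simply absent from the Series.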

Alternatively, with query:

# Same result, with the filter expressed as a query string
num_sells_per_day = (orders_df.query("Order == 'SELL'")
                     .groupby(level=0)
                     .size()
                     .reset_index(name='count'))
print(num_sells_per_day)
        Date  count
0 2011-01-13      1
1 2011-02-02      1
2 2011-03-03      2
3 2011-06-03      1
4 2011-08-01      1

You can also build both columns at once; dates with no orders of one kind simply show NaN:

df1 = orders_df.groupby(['Date','Order']).size().unstack() 
print (df1) 
Order  BUY SELL 
Date     
2011-01-10 1.0 NaN 
2011-01-13 1.0 1.0 
2011-01-26 1.0 NaN 
2011-02-02 NaN 1.0 
2011-02-10 1.0 NaN 
2011-03-03 NaN 2.0 
2011-05-03 1.0 NaN 
2011-06-03 NaN 1.0 
2011-08-01 1.0 1.0 
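If the NaN entries (and the float counts they force) are unwanted, one option is to pass `fill_value=0` to `unstack`. This is a minimal sketch on a small made-up subset in the same shape as orders_df, not the full data from the question; the variable name `counts` is chosen here for illustration.

```python
import pandas as pd

# Small sample in the same shape as orders_df (Date as the index).
orders_df = pd.DataFrame(
    {"Order": ["BUY", "SELL", "BUY", "SELL", "SELL"]},
    index=pd.Index(
        pd.to_datetime(
            ["2011-01-10", "2011-01-13", "2011-01-13",
             "2011-03-03", "2011-03-03"]
        ),
        name="Date",
    ),
)

# Count rows per (Date, Order) pair, then pivot Order into columns.
# fill_value=0 puts a zero where a date has no orders of that kind,
# so the counts stay integers instead of becoming NaN/float.
counts = orders_df.groupby(["Date", "Order"]).size().unstack(fill_value=0)
print(counts)
```

The BUY and SELL columns of `counts` can then serve as the two per-day variables, with explicit zeros on days that had no orders of that kind.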

For clarity I would change the name to 'num_sells_per_day', but it does work. – dirtysocks45