How do I get the rows where a given value occurs?
Question:
I have orders_df:
Symbol Order Shares
Date
2011-01-10 AAPL BUY 1500
2011-01-13 AAPL SELL 1500
2011-01-13 IBM BUY 4000
2011-01-26 GOOG BUY 1000
2011-02-02 XOM SELL 4000
2011-02-10 XOM BUY 4000
2011-03-03 GOOG SELL 1000
2011-03-03 IBM SELL 2200
2011-05-03 IBM BUY 1500
2011-06-03 IBM SELL 3300
2011-08-01 GOOG BUY 55
2011-08-01 GOOG SELL 55
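I want a variable that maps each Date to the number of SELL orders on that date. I also want a symmetric variable for BUY.
I managed to do this for all orders with
num_orders_per_day = orders_df.groupby(['Date']).size()
and got: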
Date
2011-01-10 1
2011-01-13 2
2011-01-26 1
2011-02-02 1
2011-02-10 1
2011-03-03 2
2011-05-03 1
2011-06-03 1
2011-08-01 2
But that is not the desired output.
What I want is sells_on_a_day:
2011-01-13 1
2011-02-02 1
2011-03-03 2
2011-06-03 1
2011-08-01 1
and then a similar buys_on_a_day variable.
Answer
Filter first with boolean indexing, then count the rows per date:
num_sells_per_day = (orders_df[orders_df['Order'] == 'SELL']
                     .groupby(level=0).size().reset_index(name='count'))
print(num_sells_per_day)
Date count
0 2011-01-13 1
1 2011-02-02 1
2 2011-03-03 2
3 2011-06-03 1
4 2011-08-01 1
Alternative, using query:
num_sells_per_day = (orders_df.query("Order == 'SELL'")
                     .groupby(level=0)
                     .size()
                     .reset_index(name='count'))
print(num_sells_per_day)
Date count
0 2011-01-13 1
1 2011-02-02 1
2 2011-03-03 2
3 2011-06-03 1
4 2011-08-01 1
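The question asks for a mapping from Date to a count rather than a two-column frame, so dropping the reset_index step and keeping the Series may be closer to what is wanted. Below is a minimal, self-contained sketch of that idea. The way orders_df is constructed here (dates as plain strings in an index named Date) and the helper name count_orders_per_day are assumptions made for illustration; they are not from the original post.

```python
import pandas as pd

# Rebuild the question's data. The construction is an assumption:
# dates are kept as strings in an index named "Date".
dates = ["2011-01-10", "2011-01-13", "2011-01-13", "2011-01-26",
         "2011-02-02", "2011-02-10", "2011-03-03", "2011-03-03",
         "2011-05-03", "2011-06-03", "2011-08-01", "2011-08-01"]
orders_df = pd.DataFrame(
    {
        "Symbol": ["AAPL", "AAPL", "IBM", "GOOG", "XOM", "XOM",
                   "GOOG", "IBM", "IBM", "IBM", "GOOG", "GOOG"],
        "Order": ["BUY", "SELL", "BUY", "BUY", "SELL", "BUY",
                  "SELL", "SELL", "BUY", "SELL", "BUY", "SELL"],
        "Shares": [1500, 1500, 4000, 1000, 4000, 4000,
                   1000, 2200, 1500, 3300, 55, 55],
    },
    index=pd.Index(dates, name="Date"),
)


def count_orders_per_day(df, order_type):
    """Hypothetical helper: rows per Date whose Order equals order_type."""
    mask = df["Order"] == order_type          # boolean indexing
    return df[mask].groupby(level="Date").size()


sells_on_a_day = count_orders_per_day(orders_df, "SELL")
buys_on_a_day = count_orders_per_day(orders_df, "BUY")
print(sells_on_a_day)
```

Each result is a Series indexed by Date, so a lookup such as sells_on_a_day["2011-03-03"] returns the count for that day. Dates with no matching orders are simply absent from the Series.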
You can also build both columns at once; dates where one order type is missing get NaN:
df1 = orders_df.groupby(['Date','Order']).size().unstack()
print (df1)
Order BUY SELL
Date
2011-01-10 1.0 NaN
2011-01-13 1.0 1.0
2011-01-26 1.0 NaN
2011-02-02 NaN 1.0
2011-02-10 1.0 NaN
2011-03-03 NaN 2.0
2011-05-03 1.0 NaN
2011-06-03 NaN 1.0
2011-08-01 1.0 1.0
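If zeros are preferable to NaN, one option is to pass fill_value=0 to unstack, which should also keep the counts as integers instead of floats. The sketch below uses a small stand-in for orders_df (an assumption, only the Order column and the Date index matter) rather than the full table from the question.

```python
import pandas as pd

# Small stand-in for orders_df (an assumption made for illustration).
orders_df = pd.DataFrame(
    {"Order": ["BUY", "SELL", "BUY", "SELL", "SELL"]},
    index=pd.Index(
        ["2011-01-10", "2011-01-13", "2011-01-13", "2011-03-03", "2011-03-03"],
        name="Date",
    ),
)

# Count rows per (Date, Order) pair, then pivot Order into columns.
# fill_value=0 replaces the missing combinations with 0 instead of NaN.
counts = orders_df.groupby(["Date", "Order"]).size().unstack(fill_value=0)

sells_on_a_day = counts["SELL"]
buys_on_a_day = counts["BUY"]
print(counts)
```

Unlike the filtered versions, this keeps every date in both columns, including days with a count of 0. To get only the days that actually had sells, something like sells_on_a_day[sells_on_a_day > 0] can be applied afterwards.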
For clarity I would change the name to 'num_sells_per_day', but it does work. – dirtysocks45