使用SQL统计出每个用户的累积访问次数
我们有如下的用户访问数据
userId |
visitDate |
visitCount |
u01 |
2017/1/21 |
5 |
u02 |
2017/1/23 |
6 |
u03 |
2017/1/22 |
8 |
u04 |
2017/1/20 |
3 |
u01 |
2017/1/23 |
6 |
u01 |
2017/2/21 |
8 |
U02 |
2017/1/23 |
6 |
U01 |
2017/2/22 |
4 |
要求使用SQL统计出每个用户的累积访问次数,如下表所示:
用户id |
月份 |
小计 |
累积 |
u01 |
2017-01 |
11 |
11 |
u01 |
2017-02 |
12 |
23 |
u02 |
2017-01 |
12 |
12 |
u03 |
2017-01 |
8 |
8 |
u04 |
2017-01 |
3 |
3 |
数据集
u01 2017/1/21 5
u02 2017/1/23 6
u03 2017/1/22 8
u04 2017/1/20 3
u01 2017/1/23 6
u01 2017/2/21 8
u02 2017/1/23 6
u01 2017/2/22 4
1)创建表
create table action
(userId string,
visitDate string,
visitCount int)
row format delimited fields terminated by "\t";
2)加载数据
hive (task)> load data local inpath '/opt/temp/action.txt' overwrite into table action;
执行sql
with t1 as(select userId,date_format(regexp_replace(visitDate,'/','-'),'yyyy-MM') as visitDate ,visitCount from action),
t2 as(select userId,visitDate,sum(visitCount) as visitCount_sum from t1 group by userId,visitDate),
t3 as(select userId,visitDate,visitCount_sum,sum(visitCount_sum) over(partition by userId order by visitDate asc rows between UNBOUNDED PRECEDING and current row) as heji from t2)
select * from t3 order by userId;