hive开窗函数总结

1,sum()函数

select cookieid,createtime,pv,

sum(pv) over(PARTITION BY cookieid ORDER BY createtime) as pv1

FROM test1;

hive开窗函数总结

select cookieid,createtime,pv,

sum(pv) over(PARTITION BY cookieid) as pv1

FROM test1;

hive开窗函数总结

 注:没有order by,不仅分区内没有排序,sum()计算的pv也是整个分区的pv。

        max()函数无论有没有order by 都是计算整个分区的最大值

2,NTILE 函数

NTILE(n),用于将分组数据按照顺序切分成n片,返回当前切片值

 注如果切片不均匀,默认增加第一个切片的分布

SELECT cookieid,createtime,pv,
NTILE(2) OVER(PARTITION BY cookieid ORDER BY createtime) AS ntile1, --分组内将数据分成2片
NTILE(3) OVER(PARTITION BY cookieid ORDER BY createtime) AS ntile2,  --分组内将数据分成3片
NTILE(4) OVER(PARTITION BY cookieid ORDER BY createtime) AS ntile3   --将所有数据分成4片
FROM test1;

hive开窗函数总结
SELECT cookieid,createtime,pv, NTILE(3) OVER(PARTITION BY cookieid ORDER BY pv DESC) AS ntile

FROM test1

where ntile=1;

 3,ROW_NUMBER 函数

应用场景:获取分组内排序第一的记录、获取一个session中的第一条refer等

SELECT cookieid,createtime,pv,

ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn

FROM test1;

hive开窗函数总结

 rankdense_rankrow_number三者对比:

SELECT cookieid,createtime,pv,
RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rank1,
DENSE_RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS d_rank2,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv DESC) AS rn3 
FROM test1;

hive开窗函数总结