(转) Hive中with cube、with rollup、grouping sets用法
表结构
-
CREATE TABLE test (f1 string,
-
f2 string,
-
f3 string,
-
cnt int) ROW FORMAT delimited FIELDS TERMINATED BY '\t' stored AS textfile;
-
LOAD DATA LOCAL inpath '/data/logs/suiyingli/tmp/test.data' overwrite INTO TABLE test;
原始数据
•A A B 1
•B B A 1
•A A A 2
with cube查询语句
-
SELECT f1,
-
f2,
-
f3,
-
sum(cnt),
-
GROUPING__ID,
-
rpad(reverse(bin(cast(GROUPING__ID AS bigint))),3,'0')
-
FROM test
-
GROUP BY f1,
-
f2,
-
f3 WITH CUBE;
with cube结果范例
rollup查询语句
-
SELECT f1,
-
f2,
-
f3,
-
sum(cnt),
-
GROUPING__ID,
-
rpad(reverse(bin(cast(GROUPING__ID AS bigint))),3,'0')
-
FROM test
-
GROUP BY f1,
-
f2,
-
f3 WITH ROLLUP;
rollup结果范例
grouping sets查询语句
-
SELECT f1,
-
f2,
-
f3,
-
sum(cnt),
-
GROUPING__ID,
-
rpad(reverse(bin(cast(GROUPING__ID AS bigint))),3,'0')
-
FROM test
-
GROUP BY f1,
-
f2,
-
f3
-
GROUPING sets((f1),(f1,f2))
总结
cube的分组组合最全,是各个维度值的笛卡尔(包含null)组合,rollup的各维度组合应满足,前一维度为null后一位维度必须为null,前一维度取非null时,下一维度随意,
grouping sets则为自定义维度,根据需要分组即可。
ps:通过grouping sets的使用可以简化SQL,比group by单维度进行union性能更好。
转载于:https://www.cnblogs.com/Impulse/articles/9273538.html
表结构
-
CREATE TABLE test (f1 string,
-
f2 string,
-
f3 string,
-
cnt int) ROW FORMAT delimited FIELDS TERMINATED BY '\t' stored AS textfile;
-
LOAD DATA LOCAL inpath '/data/logs/suiyingli/tmp/test.data' overwrite INTO TABLE test;
原始数据
•A A B 1
•B B A 1
•A A A 2
with cube查询语句
-
SELECT f1,
-
f2,
-
f3,
-
sum(cnt),
-
GROUPING__ID,
-
rpad(reverse(bin(cast(GROUPING__ID AS bigint))),3,'0')
-
FROM test
-
GROUP BY f1,
-
f2,
-
f3 WITH CUBE;
with cube结果范例
rollup查询语句
-
SELECT f1,
-
f2,
-
f3,
-
sum(cnt),
-
GROUPING__ID,
-
rpad(reverse(bin(cast(GROUPING__ID AS bigint))),3,'0')
-
FROM test
-
GROUP BY f1,
-
f2,
-
f3 WITH ROLLUP;
rollup结果范例
grouping sets查询语句
-
SELECT f1,
-
f2,
-
f3,
-
sum(cnt),
-
GROUPING__ID,
-
rpad(reverse(bin(cast(GROUPING__ID AS bigint))),3,'0')
-
FROM test
-
GROUP BY f1,
-
f2,
-
f3
-
GROUPING sets((f1),(f1,f2))
总结
cube的分组组合最全,是各个维度值的笛卡尔(包含null)组合,rollup的各维度组合应满足,前一维度为null后一位维度必须为null,前一维度取非null时,下一维度随意,
grouping sets则为自定义维度,根据需要分组即可。
ps:通过grouping sets的使用可以简化SQL,比group by单维度进行union性能更好。