CUBE operator on Impala

Question:

While benchmarking Impala against PrestoDB, we noticed that building pivot tables in Impala is quite difficult, because it has no CUBE operator like the one in Presto. Here are two examples from Presto:

The CUBE operator generates all possible grouping sets (i.e. the power set) for a given set of columns. For example, the query:

SELECT origin_state, destination_state, sum(package_weight) 
FROM shipping 
GROUP BY CUBE (origin_state, destination_state);

is equivalent to:

SELECT origin_state, destination_state, sum(package_weight) 
FROM shipping 
GROUP BY GROUPING SETS (
(origin_state, destination_state), 
(origin_state), 
(destination_state), 
());

The other example is the ROLLUP operator. The full documentation is here: https://prestodb.io/docs/current/sql/select.html.
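As a sketch of the ROLLUP case, reusing the same shipping table: ROLLUP produces only the hierarchical subtotals rather than the full power set, so the two Presto queries below should return the same rows.

```sql
-- ROLLUP keeps each prefix of the column list plus the grand total.
SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY ROLLUP (origin_state, destination_state);

-- Equivalent GROUPING SETS form: note there is no (destination_state)-only set.
SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS (
    (origin_state, destination_state),
    (origin_state),
    ());
```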

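This is not just syntactic sugar, because Presto performs a single table scan for the whole query. With this operator a pivot table can be built in one request, whereas Impala needs to run 2-3 queries.

Is there a way to do this in Impala with one query/table scan instead of three? Otherwise, performance becomes terrible when creating any kind of pivot table.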
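For reference, the multi-query workaround could be written as one statement, roughly as in the sketch below. It assumes the two state columns are STRING (the casts and the total_weight alias are illustrative, not from the original schema). Each UNION ALL branch is likely to scan shipping again, which is exactly the cost this question is about.

```sql
-- Finest level: one row per (origin_state, destination_state) pair.
SELECT origin_state, destination_state, sum(package_weight) AS total_weight
FROM shipping
GROUP BY origin_state, destination_state
UNION ALL
-- Subtotal per origin_state.
SELECT origin_state, CAST(NULL AS STRING), sum(package_weight)
FROM shipping
GROUP BY origin_state
UNION ALL
-- Subtotal per destination_state.
SELECT CAST(NULL AS STRING), destination_state, sum(package_weight)
FROM shipping
GROUP BY destination_state
UNION ALL
-- Grand total.
SELECT CAST(NULL AS STRING), CAST(NULL AS STRING), sum(package_weight)
FROM shipping;
```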

Answer

We can use Impala window functions, but instead of a single output column you will get 3 columns.

SELECT origin_state, 
     destination_state, 
     SUM(package_weight) OVER (PARTITION BY origin_state, destination_state) AS pkgwgrbyorganddest, 
     SUM(package_weight) OVER (PARTITION BY origin_state) AS pkgwgrbyorg, 
     SUM(package_weight) OVER (PARTITION BY destination_state) AS pkgwgrbydest 
FROM shipping;
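Note that this query returns one row per row of shipping, not one row per group. A possible variant, assuming one row per (origin_state, destination_state) pair is wanted, deduplicates the result and adds a grand total through an empty OVER (). The alias names are illustrative, and the subquery is there in case DISTINCT and analytic functions cannot be combined in the same SELECT.

```sql
SELECT DISTINCT
     origin_state,
     destination_state,
     weight_by_origin_and_dest,
     weight_by_origin,
     weight_by_dest,
     weight_total
FROM (
     SELECT origin_state,
          destination_state,
          SUM(package_weight) OVER (PARTITION BY origin_state, destination_state) AS weight_by_origin_and_dest,
          SUM(package_weight) OVER (PARTITION BY origin_state) AS weight_by_origin,
          SUM(package_weight) OVER (PARTITION BY destination_state) AS weight_by_dest,
          -- Empty OVER () treats the whole table as one partition (grand total).
          SUM(package_weight) OVER () AS weight_total
     FROM shipping
) t;
```

The subtotals come back as extra columns on each row rather than as extra rows, so the shape still differs from what CUBE returns.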

I don't know whether this will help you or not, but it should give you an idea of how analytic functions work.

Link: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_analytic_functions.html#over
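Another workaround that is sometimes used for this, which is not part of the answer above and is only sketched here: cross join shipping with a small inline table that lists the four grouping sets, then group once. The names keep_origin, keep_dest, and total_weight are hypothetical. The table should be read only once, but every row is expanded four times before aggregation, so it is worth checking the plan with EXPLAIN and measuring before relying on it.

```sql
SELECT
    -- A state column is blanked out when its grouping set does not include it.
    CASE WHEN g.keep_origin = 1 THEN s.origin_state END AS origin_state,
    CASE WHEN g.keep_dest = 1 THEN s.destination_state END AS destination_state,
    -- The flags distinguish a subtotal row from a genuinely NULL state value.
    g.keep_origin,
    g.keep_dest,
    sum(s.package_weight) AS total_weight
FROM shipping s
CROSS JOIN (
    SELECT 1 AS keep_origin, 1 AS keep_dest   -- (origin_state, destination_state)
    UNION ALL SELECT 1, 0                     -- (origin_state)
    UNION ALL SELECT 0, 1                     -- (destination_state)
    UNION ALL SELECT 0, 0                     -- () grand total
) g
GROUP BY
    CASE WHEN g.keep_origin = 1 THEN s.origin_state END,
    CASE WHEN g.keep_dest = 1 THEN s.destination_state END,
    g.keep_origin,
    g.keep_dest;
```

Dropping the (0, 1) row from the inline table gives the ROLLUP-style result instead of the full CUBE.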
