如何将SQL连接重写到窗口函数中?
问题描述:
数据库是HP Vertica的7或PostgreSQL 9如何将SQL连接重写到窗口函数中?
create table test (
id int,
card_id int,
tran_dt date,
amount int
);
insert into test values (1, 1, '2017-07-06', 10);
insert into test values (2, 1, '2017-06-01', 20);
insert into test values (3, 1, '2017-05-01', 30);
insert into test values (4, 1, '2017-04-01', 40);
insert into test values (5, 2, '2017-07-04', 10);
在最近一天使用的支付卡,什么是在最近90天内收取该卡上的最大金额。
select t.card_id, max(t2.amount) max
from test t
join test t2 on t2.card_id=t.card_id and t2.tran_dt>='2017-04-06'
where t.tran_dt>='2017-07-06'
group by t.card_id
order by t.card_id;
结果是正确的
card_id max
------- ---
1 30
我想重写查询到SQL窗口功能。
select card_id, max(amount) over(partition by card_id order by tran_dt range between '60 days' preceding and current row) max
from test
where card_id in (select card_id from test where tran_dt>='2017-07-06')
order by card_id;
但结果集不匹配,这怎么办?
测试数据在这里: http://sqlfiddle.com/#!17/db317/1
答
据我所知,PostgreSQL的窗口功能不支持有界range preceding
因此range between '90 days' preceding
将无法正常工作。它支持有界rows preceding
如rows between 90 preceding
,但随后你将需要汇集类似于窗口功能对基于时间的行操作以下是按时间顺序查询:
SELECT c.card_id, t.amount, g.d as d_series
FROM generate_series(
'2017-04-06'::timestamp, '2017-07-06'::timestamp, '1 day'::interval
) g(d)
CROSS JOIN (SELECT distinct card_id from test) c
LEFT JOIN test t ON t.card_id = c.card_id and t.tran_dt = g.d
ORDER BY c.card_id, d_series
对于你所需要的(根据你的问题描述),我会坚持使用group by
。
答
我不能尝试PostgreSQL,但在Vertica中,您可以应用ANSI标准的OLAP窗口函数。
但是,您需要嵌套两个查询:如果窗口函数具有需要在结果集中进行评估的所有行,则它只会返回合理的结果。
但您只希望显示'2017-07-06'行。
所以你必须在外部查询过滤该日期:
WITH olap_output AS (
SELECT
card_id
, tran_dt
, MAX(amount) OVER (
PARTITION BY card_id
ORDER BY tran_dt
RANGE BETWEEN '90 DAYS' PRECEDING AND CURRENT ROW
) AS the_max
FROM test
)
SELECT
card_id
, the_max
FROM olap_output
WHERE tran_dt='2017-07-06'
;
card_id|the_max
1| 30
首先,Postgres的和Vertica的是两个完全不同的事情。其次,你为什么期望两个截然不同的查询结果是相同的? – mustaccio