Generating a time series with daily statistics using a PostgreSQL query
I find myself having to write a (for me) fairly complex SQL query, and I can't seem to get a handle on it.
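I have a table named orders and a related table order_state_history, which records the state of those orders over time (see below).
I now need to generate a series of rows, one row per day, containing the number of orders that are in a particular state at the end of that day (see Expected result below). In addition, I only want to consider orders with orders.type = 1.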
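The data resides in a PostgreSQL database. I have already found out how to generate a time series using GENERATE_SERIES(DATE '2001-01-01', CURRENT_DATE, '1 DAY'::INTERVAL) days, which lets me produce rows for days on which no state change was recorded.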
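As a small illustration of that building block, the query below returns one row per calendar day for a fixed range. The explicit TIMESTAMP bounds and the alias days(day) are choices made for this sketch, not something taken from the original post.

```sql
-- One row per calendar day between the two bounds (inclusive).
-- Explicit TIMESTAMP bounds keep the series independent of the session time zone.
SELECT day::date AS day
FROM generate_series(TIMESTAMP '2001-01-01',
                     TIMESTAMP '2001-01-10',
                     INTERVAL '1 day') AS days(day)
ORDER BY day;
```

Left-joining event data onto such a series is what keeps days without any recorded state change in the output.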
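My current approach is to join orders, order_state_history and the generated series of days together, filter out all rows having DATE(order_state_history.timestamp) > DATE(days), and then somehow get each order's final state on that day with first_value(order_state_history.new_state) OVER (PARTITION BY orders.id ORDER BY order_state_history.timestamp DESC), but this is where my limited SQL experience fails me.
I just can't wrap my head around the problem.
Can this even be solved in a single query, or would I be better advised to compute the data with some kind of smart script that runs one query per day? What would be a sensible approach to this problem?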
orders
======
id     type
10000  1
10001  1
10002  2
10003  2
10004  1
order_state_history
===================
order_id  index  timestamp         new_state
10000     1      01.01.2001 12:00  NEW
10000     2      02.01.2001 13:00  ACTIVE
10000     3      03.01.2001 14:00  DONE
10001     1      02.01.2001 13:00  NEW
10002     1      03.01.2001 14:00  NEW
10002     2      05.01.2001 10:00  ACTIVE
10002     3      05.01.2001 14:00  DONE
10003     1      07.01.2001 04:00  NEW
10004     1      05.01.2001 14:00  NEW
10004     2      10.01.2001 17:30  DONE
Expected result
===============
date        new_orders  active_orders  done_orders
01.01.2001  1           0              0
02.01.2001  1           1              0
03.01.2001  1           0              1
04.01.2001  1           0              1
05.01.2001  2           0              1
06.01.2001  2           0              1
07.01.2001  2           0              1
08.01.2001  2           0              1
09.01.2001  2           0              1
10.01.2001  1           0              2
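To try queries against the sample data, the two tables can be recreated roughly as follows. This is only a setup sketch: the column types, the keys and the ISO-formatted timestamp literals are assumptions, since the question shows only the rows.

```sql
-- Assumed schema; the question does not state column types or constraints.
CREATE TABLE orders (
    id   integer PRIMARY KEY,
    type integer NOT NULL
);

CREATE TABLE order_state_history (
    order_id    integer   NOT NULL REFERENCES orders (id),
    "index"     integer   NOT NULL,
    "timestamp" timestamp NOT NULL,
    new_state   text      NOT NULL,
    PRIMARY KEY (order_id, "index")
);

-- Sample rows from the question (dates rewritten from DD.MM.YYYY to ISO format).
INSERT INTO orders (id, type) VALUES
    (10000, 1), (10001, 1), (10002, 2), (10003, 2), (10004, 1);

INSERT INTO order_state_history (order_id, "index", "timestamp", new_state) VALUES
    (10000, 1, '2001-01-01 12:00', 'NEW'),
    (10000, 2, '2001-01-02 13:00', 'ACTIVE'),
    (10000, 3, '2001-01-03 14:00', 'DONE'),
    (10001, 1, '2001-01-02 13:00', 'NEW'),
    (10002, 1, '2001-01-03 14:00', 'NEW'),
    (10002, 2, '2001-01-05 10:00', 'ACTIVE'),
    (10002, 3, '2001-01-05 14:00', 'DONE'),
    (10003, 1, '2001-01-07 04:00', 'NEW'),
    (10004, 1, '2001-01-05 14:00', 'NEW'),
    (10004, 2, '2001-01-10 17:30', 'DONE');
```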
Step 1. Compute a running total of the state for each order, using the values NEW = 1, ACTIVE = 1, DONE = 2:
select
    order_id,
    timestamp::date as day,
    -- running total per order: NEW and ACTIVE each add 1, DONE adds 2
    sum(case new_state when 'DONE' then 2 else 1 end) over w as state
from order_state_history h
join orders o on o.id = h.order_id
where o.type = 1
window w as (partition by order_id order by timestamp);

 order_id |    day     | state
----------+------------+-------
    10000 | 2001-01-01 |     1
    10000 | 2001-01-02 |     2
    10000 | 2001-01-03 |     4
    10001 | 2001-01-02 |     1
    10004 | 2001-01-05 |     1
    10004 | 2001-01-10 |     3
(6 rows)
Step 2. Compute a transition matrix for each order based on the state from step 1 (2 means NEW -> ACTIVE, 3 means NEW -> DONE, 4 means ACTIVE -> DONE):
select
    order_id, day, state,
    -- per-row deltas: +1 when an order enters a state, -1 when it leaves it
    case when state = 1 then 1 when state = 2 or state = 3 then -1 else 0 end as new,
    case when state = 2 then 1 when state = 4 then -1 else 0 end as active,
    case when state > 2 then 1 else 0 end as done
from (
    select
        order_id,
        timestamp::date as day,
        sum(case new_state when 'DONE' then 2 else 1 end) over w as state
    from order_state_history h
    join orders o on o.id = h.order_id
    where o.type = 1
    window w as (partition by order_id order by timestamp)
) s;

 order_id |    day     | state | new | active | done
----------+------------+-------+-----+--------+------
    10000 | 2001-01-01 |     1 |   1 |      0 |    0
    10000 | 2001-01-02 |     2 |  -1 |      1 |    0
    10000 | 2001-01-03 |     4 |   0 |     -1 |    1
    10001 | 2001-01-02 |     1 |   1 |      0 |    0
    10004 | 2001-01-05 |     1 |   1 |      0 |    0
    10004 | 2001-01-10 |     3 |  -1 |      0 |    1
(6 rows)
Step 3. Compute a running total of each state over a series of days:
-- distinct collapses the duplicate rows produced when several events fall on the same day
select distinct
    day::date,
    -- running totals of the deltas over all days up to and including this one
    sum(new) over w as new,
    sum(active) over w as active,
    sum(done) over w as done
from generate_series('2001-01-01'::date, '2001-01-10', '1d'::interval) day
left join (
    select
        order_id, day, state,
        case when state = 1 then 1 when state = 2 or state = 3 then -1 else 0 end as new,
        case when state = 2 then 1 when state = 4 then -1 else 0 end as active,
        case when state > 2 then 1 else 0 end as done
    from (
        select
            order_id,
            timestamp::date as day,
            sum(case new_state when 'DONE' then 2 else 1 end) over w as state
        from order_state_history h
        join orders o on o.id = h.order_id
        where o.type = 1
        window w as (partition by order_id order by timestamp)
    ) s
) s
using(day)
window w as (order by day)
order by 1;

    day     | new | active | done
------------+-----+--------+------
 2001-01-01 |   1 |      0 |    0
 2001-01-02 |   1 |      1 |    0
 2001-01-03 |   1 |      0 |    1
 2001-01-04 |   1 |      0 |    1
 2001-01-05 |   2 |      0 |    1
 2001-01-06 |   2 |      0 |    1
 2001-01-07 |   2 |      0 |    1
 2001-01-08 |   2 |      0 |    1
 2001-01-09 |   2 |      0 |    1
 2001-01-10 |   1 |      0 |    2
(10 rows)
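A different formulation, closer to what the question attempts, is to look up each order's most recent state as of the end of each day and then count per state. The following is only a sketch and not the method used in the answer above. It assumes PostgreSQL 9.4 or later (LATERAL needs 9.3, the aggregate FILTER clause needs 9.4) and that order_state_history.timestamp is of type timestamp. The aliases d, o, h and s are chosen here for illustration.

```sql
SELECT d.day::date AS day,
       count(*) FILTER (WHERE s.new_state = 'NEW')    AS new_orders,
       count(*) FILTER (WHERE s.new_state = 'ACTIVE') AS active_orders,
       count(*) FILTER (WHERE s.new_state = 'DONE')   AS done_orders
FROM generate_series(TIMESTAMP '2001-01-01',
                     TIMESTAMP '2001-01-10',
                     INTERVAL '1 day') AS d(day)
CROSS JOIN orders o
-- For every (day, order) pair, fetch the latest state recorded before the next midnight.
LEFT JOIN LATERAL (
    SELECT h.new_state
    FROM order_state_history h
    WHERE h.order_id = o.id
      AND h."timestamp" < d.day + INTERVAL '1 day'
    ORDER BY h."timestamp" DESC
    LIMIT 1
) s ON true
WHERE o.type = 1
GROUP BY d.day
ORDER BY d.day;
```

On the sample rows this should produce the same ten rows as the expected result. Unlike the running-sum encoding, it does not depend on the order in which states occur, but it performs one lookup per order and day, so it will likely be slower on large tables.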
Please check the expected result (why are there 2 new orders on 03.01?) and add the following expected rows, at least up to 05.01. – klin
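I have added all the relevant rows. 03.01. has two new orders because new orders were created on both 02.01. and 03.01. (10001 and 10002). Order 10001 **stays in the NEW state** and is therefore counted on all following days. The counts are totals: the result column 'new_orders' counts all orders that are in the NEW state at the end of the day, regardless of whether their state changed that day. –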
But 10002 is of type 2, so it shouldn't be counted? – klin