Append missing month and year values to a SELECT
Problem description:
I have this SELECT:
SELECT month, year, ROUND(AVG(q_overall) OVER (rows BETWEEN 10000 preceding and current row),2) as avg
FROM (
SELECT EXTRACT(Month FROM date) as month, EXTRACT(Year FROM date) as year, ROUND(AVG(q_overall),1) as q_overall
FROM fb_parsed
WHERE business_id = 1
GROUP BY year, month
ORDER BY year, month) a
Output:
month year avg
-----------------
12 2012 5
1 2013 4.5
2 2013 4.1
4 2013 4.8
5 2013 4.7
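I need to append the values that are missing from this table (in this example, month 3 of 2013). The avg must be the same as in the previous row, which means the table needs this additional row: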
3 2013 4.1
Can I do this with a self join and generate_series, or with some UNION of selects?
Answer
You can simplify your query. It does not need a subquery:
SELECT EXTRACT(Month FROM date) as month,
EXTRACT(Year FROM date) as year,
ROUND(AVG(q_overall), 1) as q_overall,
ROUND(AVG(AVG(q_overall)) OVER (rows BETWEEN 10000 preceding and current row), 2)
FROM fb_parsed
WHERE business_id = 1
GROUP BY year, month;
The window function needs an ORDER BY. I assume you really intended:
SELECT EXTRACT(Month FROM date) as month,
EXTRACT(Year FROM date) as year,
ROUND(AVG(q_overall), 1) as q_overall,
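-- if PostgreSQL cannot resolve the aliases year/month inside OVER (...), repeat the EXTRACT(...) expressions there
ROUND(AVG(AVG(q_overall)) OVER (ORDER BY year, month), 2)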
FROM fb_parsed
WHERE business_id = 1
GROUP BY year, month;
Then, to fill in the missing values, you can use generate_series():
SELECT EXTRACT(Month FROM ym.date) as month,
EXTRACT(Year FROM ym.date) as year,
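-- output aliases may not be visible inside OVER (...); if so, order by the EXTRACT(...) expressions instead
ROUND(AVG(AVG(q_overall)) OVER (ORDER BY year, month), 2)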
FROM (SELECT generate_series(date_trunc('month', min(date)),
date_trunc('month', max(date)),
interval '1 month') as date
FROM fb_parsed
) ym LEFT JOIN
fb_parsed p
ON EXTRACT(year FROM ym.date) = EXTRACT(year FROM p.date) AND
EXTRACT(month FROM ym.date) = EXTRACT(month FROM p.date) AND
p.business_id = 1
GROUP BY year, month;
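I think this will do what you want.
Answer
Can I do this with self joins and generate_series?
Yes, you are close, but your current query computes a cumulative average. The trickiest part is filling the gaps with the previous value (if PostgreSQL supported the IGNORE NULLS option of LAST_VALUE, this would be easier...):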
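To see why a month series combined with a running average produces the row the question asks for, here is a minimal plain-Python sketch of the logic. It is not PostgreSQL code; the function names and the monthly values are made up for illustration. It assumes, as SQL's AVG does, that months without data are simply skipped when averaging, so a gap month repeats the previous row's average.

```python
def month_spine(first, last):
    """Yield (year, month) for every month from first to last, inclusive."""
    year, month = first
    while (year, month) <= last:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def running_average(monthly):
    """Cumulative average over every month in the range.

    Months absent from `monthly` contribute nothing to the average
    (like NULLs in SQL's AVG), so they repeat the previous value.
    """
    total, count = 0.0, 0
    rows = []
    for ym in month_spine(min(monthly), max(monthly)):
        value = monthly.get(ym)
        if value is not None:
            total += value
            count += 1
        rows.append((ym[1], ym[0], round(total / count, 2)))
    return rows


# Hypothetical per-month averages keyed by (year, month); March 2013 is missing.
monthly = {(2012, 12): 5.0, (2013, 1): 4.0, (2013, 2): 3.0,
           (2013, 4): 6.0, (2013, 5): 2.0}
rows = running_average(monthly)
for month, year, avg in rows:
    print(month, year, avg)
```

The row for month 3 of 2013 appears in the output with the same average as month 2, which is the behaviour the LEFT JOIN against the generated months is meant to give.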
SELECT month,
       year,
       MAX(q_overall) -- assign the value to all rows within the same group
       OVER (PARTITION BY grp) AS q_overall
FROM
 (
   SELECT EXTRACT(month FROM all_months.date) AS month,
          EXTRACT(year FROM all_months.date) AS year,
          p.q_overall,
          -- assign a new group number whenever there's a value in q_overall
          SUM(CASE WHEN p.q_overall IS NULL THEN 0 ELSE 1 END)
          OVER (ORDER BY all_months.date
                ROWS UNBOUNDED PRECEDING) AS grp
   FROM
    (-- create all months between min and max date
      SELECT generate_series(date_trunc('month', min(date)),
                             date_trunc('month', max(date)),
                             interval '1 month') AS date
      FROM fb_parsed
    ) AS all_months
   LEFT JOIN
    (-- do the average per month calculation
      SELECT EXTRACT(Month FROM date) AS month,
             EXTRACT(Year FROM date) AS year,
             ROUND(AVG(q_overall),1) AS q_overall
      FROM fb_parsed
      WHERE business_id = 1
      GROUP BY year, month
    ) AS p
   -- match each generated month to its per-month average, if any
   ON EXTRACT(year FROM all_months.date) = p.year
  AND EXTRACT(month FROM all_months.date) = p.month
 ) AS dt
ORDER BY year, month;
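The grouping trick in this query (a running count of non-NULL values used as a group number, then one value spread over each group) can be sketched in plain Python. This is only an illustration of the idea, not PostgreSQL code; forward_fill is a hypothetical helper, and the sample values are borrowed from the question's output purely as input data.

```python
from itertools import accumulate


def forward_fill(values):
    """Carry the last non-None value forward using a running group number."""
    # Running count of non-None values: it increases on every real value,
    # so a gap row shares the group number of the last real value before it.
    groups = list(accumulate(0 if v is None else 1 for v in values))
    # Each group holds at most one non-None value; it represents the group.
    group_value = {}
    for g, v in zip(groups, values):
        if v is not None:
            group_value[g] = v
    # Rows before the first real value fall in group 0 and stay None.
    return groups, [group_value.get(g) for g in groups]


# One entry per month in order; None marks a month without rows.
monthly = [5.0, 4.5, 4.1, None, 4.8, 4.7]
groups, filled = forward_fill(monthly)
print(groups)
print(filled)
```

Because every group contains at most one real value, taking the maximum per group (as the SQL does) and picking that single value (as the sketch does) amount to the same thing.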
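Edit:
Oops, this is overly complicated. You asked for a cumulative average, so the NULLs will not change the result and there is no need to fill the gaps with the previous value.
Answer
Final query: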
SELECT EXTRACT(Month FROM ym.date) as month,
EXTRACT(Year FROM ym.date) as year,
ROUND(AVG(AVG(q_overall)) OVER (ORDER BY EXTRACT(Year FROM ym.date), EXTRACT(Month FROM ym.date)), 2)
FROM
(SELECT generate_series(date_trunc('month', min(date)),
date_trunc('month', max(date)),
interval '1 month') as date
FROM fb_parsed WHERE business_id = 1 AND site = 'facebook')
ym LEFT JOIN
fb_parsed p
ON EXTRACT(year FROM ym.date) = EXTRACT(year FROM p.date) AND
EXTRACT(month FROM ym.date) = EXTRACT(month FROM p.date) AND
p.business_id = 1 AND site = 'facebook'
GROUP BY year, month;
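Thank you very much, Gordon, you really helped me. PostgreSQL threw some errors on your last query, because pg does not know year and month in the ORDER BY on line 3, and the fb_parsed table holds a lot of other data, so I had to add some WHERE clauses, but now it works perfectly. Thanks. – Michal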