Appending missing month and year values to a SELECT

Problem description:

I have this SELECT:

SELECT month, year, ROUND(AVG(q_overall) OVER (rows BETWEEN 10000 preceding and current row),2) as avg 
FROM (
    SELECT EXTRACT(Month FROM date) as month, EXTRACT(Year FROM date) as year, ROUND(AVG(q_overall),1) as q_overall 
    FROM fb_parsed 
    WHERE business_id = 1 
    GROUP BY year, month 
    ORDER BY year, month) a 

Output:

month  year  avg
-----------------
12     2012  5
1      2013  4.5
2      2013  4.1
4      2013  4.8
5      2013  4.7

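I need to append the missing rows to this result (in this example, month 3 of 2013 is missing). The avg must be the same as in the previous row, which means I need this row appended: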

3  2013 4.1 

Can I do this with a self join and generate_series, or with some UNION selects?

You can simplify your query. It does not need the subquery:

SELECT EXTRACT(Month FROM date) as month, 
     EXTRACT(Year FROM date) as year, 
     ROUND(AVG(q_overall), 1) as q_overall, 
     ROUND(AVG(AVG(q_overall)) OVER (rows BETWEEN 10000 preceding and current row), 2) 
FROM fb_parsed 
WHERE business_id = 1 
GROUP BY year, month; 

The window function needs an ORDER BY. I assume you really intend:

SELECT EXTRACT(Month FROM date) as month, 
     EXTRACT(Year FROM date) as year, 
     ROUND(AVG(q_overall), 1) as q_overall, 
     ROUND(AVG(AVG(q_overall)) OVER (ORDER BY EXTRACT(Year FROM date), EXTRACT(Month FROM date)), 2) 
FROM fb_parsed 
WHERE business_id = 1 
GROUP BY year, month; 

Then, to fill in the missing months, you can use generate_series():

SELECT EXTRACT(Month FROM ym.date) as month, 
     EXTRACT(Year FROM ym.date) as year, 
     ROUND(AVG(AVG(q_overall)) OVER (ORDER BY EXTRACT(Year FROM ym.date), EXTRACT(Month FROM ym.date)), 2) 
FROM (SELECT generate_series(date_trunc('month', min(date)), 
          date_trunc('month', max(date)), 
          interval '1 month') as date 
     FROM fb_parsed 
    ) ym LEFT JOIN 
    fb_parsed p 
    ON EXTRACT(year FROM ym.date) = EXTRACT(year FROM p.date) AND 
     EXTRACT(month FROM ym.date) = EXTRACT(month FROM p.date) AND 
     p.business_id = 1 
GROUP BY year, month; 

I think this does what you want.
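To see the gap-filling step in isolation, here is a minimal sketch that runs without the real table. The `reviews` CTE and its `review_date` column are hypothetical stand-ins for `fb_parsed` and `date`, and the three sample rows are invented for illustration. It builds one row per calendar month with generate_series() and LEFT JOINs the data onto it, so a month without reviews still appears, with a NULL average.

```sql
-- Hypothetical sample data standing in for fb_parsed (one business only).
WITH reviews(review_date, q_overall) AS (
    VALUES (DATE '2013-02-10', 4.0),
           (DATE '2013-02-20', 5.0),
           (DATE '2013-04-05', 3.0)
),
-- One row per calendar month between the first and the last review.
months AS (
    SELECT generate_series(date_trunc('month', min(review_date)),
                           date_trunc('month', max(review_date)),
                           interval '1 month') AS month_start
    FROM reviews
)
SELECT m.month_start::date        AS month_start,
       ROUND(AVG(r.q_overall), 1) AS q_overall   -- NULL when the month has no reviews
FROM months m
LEFT JOIN reviews r
       ON date_trunc('month', r.review_date) = m.month_start
GROUP BY m.month_start
ORDER BY m.month_start;
```

With this sample data the result has three rows: February with 4.5, March with NULL, and April with 3.0. Joining on the truncated month instead of separate year and month comparisons is just a shorter way to write the same condition.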


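Thank you very much, you really helped me, Gordon. PostgreSQL threw some errors on your last query, because pg does not know year and month in the ORDER BY on line 3, and the fb_parsed table holds a lot of other data, so I had to add some WHERE clauses, but now it works perfectly. Thanks. – Michal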

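Can I do this with self joins and generate_series?

Yes, you are close, but your current query computes a cumulative average. The trickiest part is filling the gaps with the previous value (this would be easier if PostgreSQL supported the IGNORE NULLS option of LAST_VALUE):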

SELECT month, 
     year, 
     MAX(q_overall) -- assign the value to all rows within the same group 
     OVER (PARTITION BY grp) AS q_overall 
FROM 
(
    SELECT EXTRACT(Month FROM all_months.date) AS month, 
     EXTRACT(Year FROM all_months.date) AS year, 
     p.q_overall, 
     -- assign a new group number whenever there's a value in q_overall 
     SUM(CASE WHEN p.q_overall IS NULL THEN 0 ELSE 1 END) 
     OVER (ORDER BY all_months.date 
      ROWS UNBOUNDED PRECEDING) AS grp 
    FROM 
    (-- create all months between min and max date 
     SELECT generate_series(date_trunc('month', min(date)), 
           date_trunc('month', max(date)), 
           interval '1 month') as date 
     FROM fb_parsed 
    ) AS all_months 
    LEFT JOIN 
    (-- do the average per month calculation 
     SELECT EXTRACT(Month FROM date) as month, 
       EXTRACT(Year FROM date) as year, 
       ROUND(AVG(q_overall),1) as q_overall 
     FROM fb_parsed 
     WHERE business_id = 1 
     GROUP BY year, month 
    ) AS p 
    -- match each generated month to its monthly average, if there is one 
    ON EXTRACT(Year FROM all_months.date) = p.year 
    AND EXTRACT(Month FROM all_months.date) = p.month 
) AS dt 
ORDER BY year, month; 
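The grouping trick can be tried on its own with a few literal rows. This is only a sketch: the `monthly` CTE and its values are hypothetical and stand in for the result of the LEFT JOIN. A running count of non-NULL values gives every gap row the same group number as the last row that had a value, and MAX() over that group copies the value forward.

```sql
-- Hypothetical monthly averages with one gap (March).
WITH monthly(month_start, q_overall) AS (
    VALUES (DATE '2013-01-01', 4.5),
           (DATE '2013-02-01', 4.1),
           (DATE '2013-03-01', NULL),
           (DATE '2013-04-01', 4.8)
)
SELECT month_start,
       q_overall,
       grp,
       -- each group holds exactly one non-NULL value, so MAX() returns it
       MAX(q_overall) OVER (PARTITION BY grp) AS q_filled
FROM (
    SELECT month_start,
           q_overall,
           -- the counter only increases on rows that have a value
           SUM(CASE WHEN q_overall IS NULL THEN 0 ELSE 1 END)
               OVER (ORDER BY month_start
                     ROWS UNBOUNDED PRECEDING) AS grp
    FROM monthly
) AS dt
ORDER BY month_start;
```

Here grp comes out as 1, 2, 2, 3, so the March row lands in the same group as February and q_filled is 4.5, 4.1, 4.1, 4.8.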

Edit:

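Oops, this is overcomplicated. The question asks for a cumulative average, so NULLs do not change the result and there is no need to fill the gaps with the previous value.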
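The reason is that AVG() skips NULLs, so a month without data leaves the running average unchanged. A minimal sketch with hypothetical literal values (not the data from the question):

```sql
-- Hypothetical monthly averages; March has no data.
SELECT month_start,
       monthly_avg,
       ROUND(AVG(monthly_avg) OVER (ORDER BY month_start), 2) AS running_avg
FROM (VALUES
        (DATE '2012-12-01', 5.0),
        (DATE '2013-01-01', 4.0),
        (DATE '2013-02-01', 3.0),
        (DATE '2013-03-01', NULL),   -- gap month: ignored by AVG()
        (DATE '2013-04-01', 5.0)
     ) AS t(month_start, monthly_avg)
ORDER BY month_start;
```

The running_avg column comes out as 5.00, 4.50, 4.00, 4.00, 4.25: the gap row repeats the previous value without any explicit fill step.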

Final query:

SELECT EXTRACT(Month FROM ym.date) as month, 
     EXTRACT(Year FROM ym.date) as year, 
     ROUND(AVG(AVG(q_overall)) OVER (ORDER BY EXTRACT(Year FROM ym.date), EXTRACT(Month FROM ym.date)), 2) 
FROM 
(SELECT generate_series(date_trunc('month', min(date)), 
         date_trunc('month', max(date)), 
         interval '1 month') as date 
FROM fb_parsed WHERE business_id = 1 AND site = 'facebook') 
ym LEFT JOIN 
    fb_parsed p 
    ON EXTRACT(year FROM ym.date) = EXTRACT(year FROM p.date) AND 
     EXTRACT(month FROM ym.date) = EXTRACT(month FROM p.date) AND 
     p.business_id = 1 AND site = 'facebook' 
GROUP BY year, month;