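Summarise (permanently) data in a SQL table
Question:
Greetings, Stackers.
I have a large number of data points in a SQL table, and I want to summarise them in a way reminiscent of RRD.
Assume a table such as: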
ID | ENTITY_ID | SCORE_DATE | SCORE | SOME_OTHER_DATA
----+-----------+------------+-------+-----------------
1 | A00000001 | 01/01/2010 | 100 | some data
2 | A00000002 | 01/01/2010 | 105 | more data
3 | A00000003 | 01/01/2010 | 104 | various text
... | ......... | .......... | ..... | ...
... | A00009999 | 01/01/2010 | 101 |
... | A00000001 | 02/01/2010 | 104 |
... | A00000002 | 02/01/2010 | 119 |
... | A00000003 | 02/01/2010 | 119 |
... | ......... | .......... | ..... |
... | A00009999 | 02/01/2010 | 101 | arbitrary data
... | ......... | .......... | ..... | ...
... | A00000001 | 01/02/2010 | 104 |
... | A00000002 | 01/02/2010 | 119 |
... | A00000003 | 01/01/2010 | 119 |
I want to end up with one record per entity, per month:
ID | ENTITY_ID | SCORE_DATE | SCORE |
----+-----------+------------+-------+
... | A00000001 | 01/01/2010 | 100 |
... | A00000002 | 01/01/2010 | 105 |
... | A00000003 | 01/01/2010 | 104 |
... | A00000001 | 01/02/2010 | 100 |
... | A00000002 | 01/02/2010 | 105 |
... | A00000003 | 01/02/2010 | 104 |
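(I don't care about SOME_OTHER_DATA - I'll pick something - probably the first or last record.)
What is a simple way of doing this on a regular basis, so that anything in the last calendar month is summarised in this way?
At the moment my plan is roughly:
- For each ENTITY_ID
  - For each month
    - Find the average score of all records in the given month
    - Update the first record with the result of the previous step
    - Delete all records that are not the first
  - Next month
I can't think of a very neat way of doing this, though, that doesn't involve lots of updates and iteration.
This can either be done in a SQL stored procedure, or it can be incorporated into the .Net app that generates this data, so the solution doesn't really need to be "one big SQL script", but it can be :)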
(SQL-2005)
Answer
Give this a try:
--I am using @table variables here, you will want to use your actual table in place of @YourTable and a #Temptable for @YourTable2, with a PK on ID
SET NOCOUNT ON
DECLARE @YourTable table (ID int,ENTITY_ID char(9),SCORE_DATE datetime,SCORE int ,SOME_OTHER_DATA varchar(100))
DECLARE @YourTable2 table (ID int)
INSERT INTO @YourTable VALUES (1 , 'A00000001','01/01/2010',100,'some data')
INSERT INTO @YourTable VALUES (2 , 'A00000002','01/01/2010',105,'more data')
INSERT INTO @YourTable VALUES (3 , 'A00000003','01/01/2010',104,'various text')
INSERT INTO @YourTable VALUES (4 , 'A00009999','01/01/2010',101,null)
INSERT INTO @YourTable VALUES (5 , 'A00000001','02/01/2010',104,null)
INSERT INTO @YourTable VALUES (6 , 'A00000002','02/01/2010',119,null)
INSERT INTO @YourTable VALUES (7 , 'A00000003','02/01/2010',119,null)
INSERT INTO @YourTable VALUES (8 , 'A00009999','02/01/2010',101,'arbitrary data')
INSERT INTO @YourTable VALUES (9 , 'A00000001','01/02/2010',104,null)
INSERT INTO @YourTable VALUES (10, 'A00000002','01/02/2010',119,null)
INSERT INTO @YourTable VALUES (11, 'A00000003','01/01/2010',119,null)
SET NOCOUNT OFF
SELECT 'BEFORE',* FROM @YourTable ORDER BY ENTITY_ID,SCORE_DATE
UPDATE y
    SET SCORE=dt_a.AvgScore
    OUTPUT INSERTED.ID --capture all updated rows
        INTO @YourTable2
    FROM @YourTable y
        INNER JOIN (SELECT --get avg score for each ENTITY_ID per month
                        ENTITY_ID
                        ,AVG(SCORE) AS AvgScore
                        ,DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0) AS MonthOf
                        ,DATEADD(month,1,DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0)) AS MonthNext
                        FROM @YourTable
                        --group by 1st day of current month and 1st day of next month
                        --so an index can be used when joining derived table to UPDATE table
                        GROUP BY ENTITY_ID
                            ,DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0)
                            ,DATEADD(month,1,DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0))
                   ) dt_a ON y.ENTITY_ID=dt_a.ENTITY_ID AND y.SCORE_DATE>=dt_a.MonthOf AND y.SCORE_DATE<dt_a.MonthNext
        INNER JOIN (SELECT --get first row for each ENTITY_ID per month
                        ID,ENTITY_ID,SCORE_DATE,SCORE
                        FROM (SELECT
                                  ID,ENTITY_ID,SCORE_DATE,SCORE
                                  --ID is added to the ORDER BY as a tie-breaker, so the "first" row is deterministic when dates are equal
                                  ,ROW_NUMBER() OVER(PARTITION BY ENTITY_ID,DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0) ORDER BY ENTITY_ID,SCORE_DATE,ID) AS RowRank
                                  FROM @YourTable
                             ) dt
                        WHERE dt.RowRank=1
                   ) dt_f ON y.ID=dt_f.ID

--remove every row that was not updated, i.e. everything except the first row per ENTITY_ID per month
DELETE @YourTable
    WHERE ID NOT IN (SELECT ID FROM @YourTable2)
SELECT 'AFTER ',* FROM @YourTable ORDER BY ENTITY_ID,SCORE_DATE
OUTPUT:
       ID          ENTITY_ID SCORE_DATE              SCORE       SOME_OTHER_DATA
------ ----------- --------- ----------------------- ----------- ----------------------------------------------------------------------------------------------------
BEFORE 1           A00000001 2010-01-01 00:00:00.000 100         some data
BEFORE 9           A00000001 2010-01-02 00:00:00.000 104         NULL
BEFORE 5           A00000001 2010-02-01 00:00:00.000 104         NULL
BEFORE 2           A00000002 2010-01-01 00:00:00.000 105         more data
BEFORE 10          A00000002 2010-01-02 00:00:00.000 119         NULL
BEFORE 6           A00000002 2010-02-01 00:00:00.000 119         NULL
BEFORE 3           A00000003 2010-01-01 00:00:00.000 104         various text
BEFORE 11          A00000003 2010-01-01 00:00:00.000 119         NULL
BEFORE 7           A00000003 2010-02-01 00:00:00.000 119         NULL
BEFORE 4           A00009999 2010-01-01 00:00:00.000 101         NULL
BEFORE 8           A00009999 2010-02-01 00:00:00.000 101         arbitrary data
(11 row(s) affected)
(8 row(s) affected)
(3 row(s) affected)
       ID          ENTITY_ID SCORE_DATE              SCORE       SOME_OTHER_DATA
------ ----------- --------- ----------------------- ----------- ----------------------------------------------------------------------------------------------------
AFTER  1           A00000001 2010-01-01 00:00:00.000 102         some data
AFTER  5           A00000001 2010-02-01 00:00:00.000 104         NULL
AFTER  2           A00000002 2010-01-01 00:00:00.000 112         more data
AFTER  6           A00000002 2010-02-01 00:00:00.000 119         NULL
AFTER  3           A00000003 2010-01-01 00:00:00.000 111         various text
AFTER  7           A00000003 2010-02-01 00:00:00.000 119         NULL
AFTER  4           A00009999 2010-01-01 00:00:00.000 101         NULL
AFTER  8           A00009999 2010-02-01 00:00:00.000 101         arbitrary data
(8 row(s) affected)
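In case the DATEADD/DATEDIFF expression used throughout the query looks cryptic: it truncates a datetime to the first day of its month, which is what allows the derived table to be joined on a plain date range. A minimal sketch of how it behaves, where @d is a made-up sample value rather than data from the question:

```sql
-- @d is a hypothetical sample value, used only to illustrate the expression.
DECLARE @d datetime
SET @d = '20100215 13:45:00'

SELECT
     DATEDIFF(month, 0, @d)                        AS MonthsSinceBase -- month boundaries crossed since date 0 (1900-01-01): 1321
    ,DATEADD(month, DATEDIFF(month, 0, @d), 0)     AS MonthOf         -- 2010-02-01 00:00:00.000
    ,DATEADD(month, DATEDIFF(month, 0, @d) + 1, 0) AS MonthNext       -- 2010-03-01 00:00:00.000
```

Any row dated in February 2010, whatever its time of day, then satisfies SCORE_DATE >= MonthOf AND SCORE_DATE < MonthNext.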
Answer
This will give you the averages for all of your data:
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
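One thing to watch, assuming SCORE is an int column (the question does not say, but the other answer declares it that way): AVG over an int returns an int, so the fractional part is discarded. That is why 104 and 119 come out as 111 rather than 111.5 in the other answer's output. If the fraction matters, a cast along these lines should keep it; @Scores is a throwaway table variable used only for illustration:

```sql
-- @Scores is a hypothetical stand-in for the real table.
DECLARE @Scores table (SCORE int)
INSERT INTO @Scores VALUES (104)
INSERT INTO @Scores VALUES (119)

SELECT
     AVG(SCORE)                        AS IntAvg     -- 111: integer average, fraction dropped
    ,AVG(CAST(SCORE AS decimal(10,2))) AS DecimalAvg -- 111.5 (displayed with trailing zeros)
FROM @Scores
```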
To restrict it to a specific month, such as last February, you can do this:
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
where year(SCORE_DATE) = 2010 and month(SCORE_DATE) = 2
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
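This version will actually perform better, because the range comparison on SCORE_DATE can use an index on that column if one exists, but the parameters are a little less friendly to deal with: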
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
where SCORE_DATE >= '20100201' and SCORE_DATE < '20100301' --yyyymmdd literals are read the same way under any language/DATEFORMAT setting
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
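If the awkward part is building the two boundary dates, one option is to derive them from a year and month number. This is only a sketch: @Year, @Month, @FromDate and @ToDate are names I have made up, and MyTable is the same placeholder as above. The variables are declared and assigned separately because SQL Server 2005 does not allow an initial value in DECLARE.

```sql
-- @Year/@Month are hypothetical parameters; @FromDate/@ToDate are derived from them.
DECLARE @Year int, @Month int
DECLARE @FromDate datetime, @ToDate datetime

SET @Year  = 2010
SET @Month = 2

-- Count months from date 0 (1900-01-01), then turn the count back into a date.
SET @FromDate = DATEADD(month, (@Year - 1900) * 12 + @Month - 1, 0) -- 2010-02-01
SET @ToDate   = DATEADD(month, 1, @FromDate)                        -- 2010-03-01

select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
where SCORE_DATE >= @FromDate and SCORE_DATE < @ToDate
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
```

The same two variables could just as well be parameters of a stored procedure.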
If you want the query to always return last month's data, you can do this:
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
where year(SCORE_DATE) = year(dateadd(month, -1, getdate())) and month(SCORE_DATE) = month(dateadd(month, -1, getdate()))
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
A better-performing version:
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
where SCORE_DATE >= dateadd(month, ((year(getdate()) - 1900) * 12) + month(getdate())-2, 0)
and SCORE_DATE < dateadd(month, ((year(getdate()) - 1900) * 12) + month(getdate())-1, 0)
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
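To sanity-check the month arithmetic in that WHERE clause, the two boundary expressions can be evaluated against a fixed date instead of getdate(). In this sketch @Now is a made-up stand-in, and the second value is chosen to show that a January run rolls back into December of the previous year:

```sql
-- @Now stands in for getdate() so the results are repeatable; both dates are hypothetical.
DECLARE @Now datetime

SET @Now = '20100311'
SELECT
     DATEADD(month, ((YEAR(@Now) - 1900) * 12) + MONTH(@Now) - 2, 0) AS LastMonthStart -- 2010-02-01
    ,DATEADD(month, ((YEAR(@Now) - 1900) * 12) + MONTH(@Now) - 1, 0) AS ThisMonthStart -- 2010-03-01

SET @Now = '20100115'
SELECT
     DATEADD(month, ((YEAR(@Now) - 1900) * 12) + MONTH(@Now) - 2, 0) AS LastMonthStart -- 2009-12-01
    ,DATEADD(month, ((YEAR(@Now) - 1900) * 12) + MONTH(@Now) - 1, 0) AS ThisMonthStart -- 2010-01-01
```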
@Cylindric, I re-read the question and completely changed my answer. – 2010-03-11 18:46:43