如何根据时间间隔删除数据
我在database.The表的表中有一个字段VehId(int)和trackdt(DATETIME)如何根据时间间隔删除数据
我的表有3000个十亿行(是的,这是正确的3000十亿)。所以我想删除旧数据。但我想基于时间间隔删除数据。
我想删除每辆车每20秒的记录。
下面是表我有
VehId Trackdt
1 2017-05-20 00:00:30.000
2 2017-05-20 00:00:32.000
2 2017-05-20 00:00:42.000
1 2017-05-20 00:00:40.000
2 2017-05-20 00:00:52.000
1 2017-05-20 00:00:50.000
1 2017-05-20 00:01:00.000
2 2017-05-20 00:01:02.000
1 2017-05-20 00:01:10.000
1 2017-05-20 00:01:20.000
2 2017-05-20 00:01:12.000
1 2017-05-20 00:01:30.000
2 2017-05-20 00:01:22.000
2 2017-05-20 00:01:32.000
后删除数据应该像下面
VehId TRackdt
1 2017-05-20 00:00:30.000
2 2017-05-20 00:00:32.000
1 2017-05-20 00:01:00.000
2 2017-05-20 00:01:02.000
1 2017-05-20 00:01:30.000
2 2017-05-20 00:01:32.000
我想下面的查询,但它花费过多时间
ALTER PROCEDURE [dbo].[DELETEINTERVALDATA]
@FROMDATE DATETIME,
@TODATE DATETIME,
@INTERVAL INT,
@FLAG INT
AS
BEGIN
SET NOCOUNT ON;
DECLARE @TRACKDT DATETIME
DECLARE @I INT =1
DECLARE @V INT =1
DECLARE @COUNT INT
DECLARE @VCOUNT INT
DECLARE @STARTDATE DATETIME = ''
DECLARE @VEHID INT
DECLARE @TIMEDIFF INT
CREATE TABLE #TEMPVEHICLE
(
SNO INT IDENTITY(1,1),
VEHID INT
)
CREATE TABLE #TEMPLOG
(
SNO INT IDENTITY(1,1),
TRACKDT DATETIME
)
IF (@FLAG = 1)
BEGIN
INSERT INTO #TEMPVEHICLE (VEHID) SELECT VEHID FROM VEHICLEMASTER ORDER BY VEHID
SELECT @VCOUNT = COUNT(SNO) FROM #TEMPVEHICLE
WHILE (@V <= @VCOUNT)
BEGIN
SELECT @VEHID = VEHID FROM #TEMPVEHICLE WHERE SNO = @V
INSERT INTO #TEMPLOG(TRACKDT) SELECT TRACKDT
FROM TRACKINGLOG WITH(NOLOCK)
WHERE TRACKDT BETWEEN @FROMDATE AND @TODATE AND VEHID = @VEHID
ORDER BY TRACKDT ASC
SELECT @COUNT = COUNT(SNO) FROM #TEMPLOG
WHILE (@I <= @COUNT)
BEGIN
SELECT @TRACKDT=TRACKDT FROM #TEMPLOG WHERE SNO = @I
IF (@I = 1)
BEGIN
SELECT @STARTDATE = @TRACKDT
END
ELSE
BEGIN
SELECT @TIMEDIFF = DATEDIFF(SECOND,@STARTDATE,@TRACKDT)
IF @TIMEDIFF <= 20
BEGIN
DELETE FROM TRACKINGLOG WHERE TRACKDT = @TRACKDT AND VEHID = @VEHID
END
ELSE
BEGIN
SELECT @STARTDATE = @TRACKDT
END
END
SELECT @I = @I + 1
END
TRUNCATE TABLE #TEMPLOG
SELECT @V = @V + 1,@STARTDATE= '',@I=1
END
DROP TABLE #TEMPLOG
DROP TABLE #TEMPVEHICLE
END
END
如何我可以根据时间间隔写出删除数据的查询吗?应该快?
在此先感谢
请阅读有关Parameter sniffing。假设你的存储过程很慢但是正确,我已经更新了你的SP。请尝试:
ALTER PROCEDURE [dbo].[DELETEINTERVALDATA]
@FROMDATE1 DATETIME,
@TODATE1 DATETIME,
@INTERVAL1 INT,
@FLAG1 INT
AS
BEGIN
SET NOCOUNT ON;
DECLARE @TRACKDT DATETIME
DECLARE @I INT =1
DECLARE @V INT =1
DECLARE @COUNT INT
DECLARE @VCOUNT INT
DECLARE @STARTDATE DATETIME = ''
DECLARE @VEHID INT
DECLARE @TIMEDIFF INT
DECLARE @FROMDATE DATETIME = @FROMDATE1
DECLARE @TODATE DATETIME = @TODATE1
DECLARE @INTERVAL INT = @INTERVAL1
DECLARE @FLAG INT = @FLAG1
CREATE TABLE #TEMPVEHICLE
(
SNO INT IDENTITY(1,1),
VEHID INT
)
CREATE TABLE #TEMPLOG
(
SNO INT IDENTITY(1,1),
TRACKDT DATETIME
)
IF (@FLAG = 1)
BEGIN
INSERT INTO #TEMPVEHICLE (VEHID) SELECT VEHID FROM VEHICLEMASTER ORDER BY VEHID
SELECT @VCOUNT = COUNT(SNO) FROM #TEMPVEHICLE
WHILE (@V <= @VCOUNT)
BEGIN
SELECT @VEHID = VEHID FROM #TEMPVEHICLE WHERE SNO = @V
INSERT INTO #TEMPLOG(TRACKDT) SELECT TRACKDT
FROM TRACKINGLOG WITH(NOLOCK)
WHERE TRACKDT BETWEEN @FROMDATE AND @TODATE AND VEHID = @VEHID
ORDER BY TRACKDT ASC
SELECT @COUNT = COUNT(SNO) FROM #TEMPLOG
WHILE (@I <= @COUNT)
BEGIN
SELECT @TRACKDT=TRACKDT FROM #TEMPLOG WHERE SNO = @I
IF (@I = 1)
BEGIN
SELECT @STARTDATE = @TRACKDT
END
ELSE
BEGIN
SELECT @TIMEDIFF = DATEDIFF(SECOND,@STARTDATE,@TRACKDT)
IF @TIMEDIFF <= 20
BEGIN
DELETE FROM TRACKINGLOG WHERE TRACKDT = @TRACKDT AND VEHID = @VEHID
END
ELSE
BEGIN
SELECT @STARTDATE = @TRACKDT
END
END
SELECT @I = @I + 1
END
TRUNCATE TABLE #TEMPLOG
SELECT @V = @V + 1,@STARTDATE= '',@I=1
END
DROP TABLE #TEMPLOG
DROP TABLE #TEMPVEHICLE
END
END
感谢您的回应。但它仍然花费太多时间。我需要删除3000亿行。所以我认为需要几天时间才能删除,而我没有那么多时间 – user2928116
好吧。也许别人会建议更好的方法。我们需要从概念上将其划分为效率时间段。 –
参数嗅探不会使光标*任何*更快 –
如果在上一次之后的时间为20秒或更短时间,您想要删除一条记录。但是,如果之前的时间被删除,则比您想要与之前的时间进行比较等。 您需要某种类型的信息。递归表表达式或常规循环。
Declare @myTable table (vehid int, trackdt datetime)
insert into @mytable values
(1, '2017-05-20 00:00:30.000'),
(2, '2017-05-20 00:00:32.000'),
(2, '2017-05-20 00:00:42.000'),
(1, '2017-05-20 00:00:40.000'),
(2, '2017-05-20 00:00:52.000'),
(1, '2017-05-20 00:00:50.000'),
(1, '2017-05-20 00:01:00.000'),
(2, '2017-05-20 00:01:02.000'),
(1, '2017-05-20 00:01:10.000'),
(1, '2017-05-20 00:01:20.000'),
(2, '2017-05-20 00:01:12.000'),
(1, '2017-05-20 00:01:30.000'),
(2, '2017-05-20 00:01:22.000'),
(2, '2017-05-20 00:01:32.000')
--set @@rowcount to 1
select 1
while @@ROWCOUNT > 0
begin
DELETE T1
from @mytable t1
--previous time
outer apply (select top 1 trackdt from @mytable where vehid = t1.vehid and trackdt < t1.trackdt order by 1 desc)t2
--previous time before that
outer apply (select top 1 trackdt from @mytable where vehid = t1.vehid and trackdt < t2.trackdt order by 1 desc)t3
--previous time was less or equal to 20 seconds
where DATEDIFF(second,t2.trackdt,t1.trackdt)<=20
--previous time before that was more than 20 seconds or there is no time before
and (DATEDIFF(second,t3.trackdt,t2.trackdt)>20 or t3.trackdt is null)
end
select * from @mytable
感谢您的回应Peter。你的查询似乎有点快,但它没有给我预期的输出。如果之前的时间(开始时间)和当前时间的差异小于20秒,那么我想删除当前时间,那么我将检查开始时间和新的当前时间之间的差异,如果差异小于20秒,则相同的当前时间将被删除如果大于20秒,我的当前时间将成为开始时间,然后我想检查开始时间和下一个当前时间的差异等等。因此,以前的时间将永远不会被删除。 – user2928116
欲了解更多信息,你可以看到我的商店程序 – user2928116
@ user2928116它似乎是适用于您的示例数据。你可以举一个例子,这个查询不能正常工作吗? – Peter
所需输出对应于秒间隔。
您可以使用窗口函数检索结果集中的先前值。例如,LAG(trackdt,1)将返回以前的值。接下来是LEAD
。 FIRST_VALUE
将返回一组中的第一个值。
查询:
select * ,
FIRST_VALUE(trackdt) over (partition by vehid order by trackdt) as t0
from @mytable
将返回每辆车第一trackdt
值,当行被trackdt
排序。
1 2017-05-20 00:00:30.000 2017-05-20 00:00:30.000
1 2017-05-20 00:00:40.000 2017-05-20 00:00:30.000
1 2017-05-20 00:00:50.000 2017-05-20 00:00:30.000
这样我们就可以用datediff
计算在分区中的当前和第一值之间的间隔。将所得的查询是有点难看:
select * ,
FIRST_VALUE(trackdt) over (partition by vehid order by trackdt) as t0,
datediff(s,FIRST_VALUE(trackdt) over (partition by vehid order by trackdt),
trackdt) as interval
from @mytable
30除以将给予我们其中的每一行所属的间隔桶。
1 2017-05-20 00:01:10.000 2017-05-20 00:00:30.000 1
1 2017-05-20 00:01:20.000 2017-05-20 00:00:30.000 1
1 2017-05-20 00:01:30.000 2017-05-20 00:00:30.000 2
2 2017-05-20 00:00:32.000 2017-05-20 00:00:32.000 0
2 2017-05-20 00:00:42.000 2017-05-20 00:00:32.000 0
计算其余不过,将第一行中,每30秒桶返回0:
select * ,
FIRST_VALUE(trackdt) over (partition by vehid order by trackdt) as t0,
datediff(s,FIRST_VALUE(trackdt) over (partition by vehid order by trackdt),
trackdt) %30 as remainder
from @mytable
1 2017-05-20 00:01:00.000 2017-05-20 00:00:30.000 0
1 2017-05-20 00:01:10.000 2017-05-20 00:00:30.000 10
1 2017-05-20 00:01:20.000 2017-05-20 00:00:30.000 20
1 2017-05-20 00:01:30.000 2017-05-20 00:00:30.000 0
2 2017-05-20 00:00:32.000 2017-05-20 00:00:32.000 0
我们可以用一个或一个以上的CTE整齐向上此查询并执行删除:
with start_times as
(
select * ,
FIRST_VALUE(trackdt) over (partition by vehid order by trackdt) as t0
from @mytable
),
intervals as
(
select * ,
datediff(s,t0,trackdt) %30 as rem
from start_times
)
delete
from intervals
where rem<>0
该查询依赖于每隔30秒有一个值。
通常,您可以使用ROW_NUMBER()
函数来标识间隔存储桶中的记录,并只需选择存储桶中的第一行。既然我们要保留第一行,我们挑不出什么,有一个ROW_NUMBER> 1:
with start_times as
(
select * ,
FIRST_VALUE(trackdt) over (partition by vehid order by trackdt) as t0
from #mytable
),
intervals as
(
select * ,
datediff(s,t0,trackdt) /30 as interval
from start_times
),
ordered as
(
select *,
ROW_NUMBER() over(partition by vehid,interval order by trackdt) row_num
from intervals
)
select vehid,trackdt
from ordered
where row_num>1
即使此查询速度快,我不会用它在3万亿行的表。窗口化常常导致后台处理 - 临时结果存储在tempdb中以允许窗口计算。这将是更好的选择必须删除的行的ID,将它们插入到一个临时表,然后使用具有联接子句的DELETE:
DELETE HugeTable
From HugeTable
INNER JOIN TempTable on TempTable.ID=HugeTable.ID
即使是这样,你可能想批次删除。在这种情况下,你可以使用NTILE函数计算1和N之间的每一行批号:
select ID,NTILE(100) over(order by vehid,trackdt) as batch_number
from ordered
where row_num=1
这将计算1和100之间的批号可以将此存储在临时表中删除一次一行。
Trackdt不会有30秒的差异,它可能是5秒到60秒之间的任何东西。 – user2928116
TIL'NTILE'。 +1 – dd4711
如果我理解正确,您希望保留每辆'vehid'的最小和最大'trackdt'?另外,什么是确切的表定义,包括索引? –
不,不,我已经更新了我的问题..我想要每辆车的数据至少20秒的时间间隔。表中有ID列,这是PK,所以你可以把它算作我的猜测。 – user2928116
@ user2928116:尝试我的查询。我希望你的SP是正确的。 –