How do I delete duplicate records from a table?

Problem description:

I have a table in a test database that someone apparently got a little trigger-happy with when running the INSERT scripts to set it up. The schema looks like this:

ID UNIQUEIDENTIFIER 
TYPE_INT SMALLINT 
SYSTEM_VALUE SMALLINT 
NAME VARCHAR 
MAPPED_VALUE VARCHAR 

It should have a few dozen rows. It has about 200,000, most of which are duplicates: TYPE_INT, SYSTEM_VALUE, NAME and MAPPED_VALUE are all identical and only ID differs.

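Now, I could probably write a script to clean this up that creates a temporary table in memory, uses INSERT .. SELECT DISTINCT to grab all the unique values from the original table, and then copies everything back. But is there a simpler way to do it, like a DELETE query with something special in the WHERE clause?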
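
For reference, the temp-table approach described above might look roughly like the sketch below. The table name dbo.MAPPING_TABLE is a placeholder (the real name is not given), and because SELECT DISTINCT over the four value columns drops ID, the sketch assumes it is acceptable to hand out fresh IDs with NEWID().

```sql
-- Sketch only: dbo.MAPPING_TABLE is a hypothetical name for the table in question.
BEGIN TRAN;

-- Copy one row per distinct combination of the four value columns.
SELECT DISTINCT TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
INTO #unique_rows
FROM dbo.MAPPING_TABLE;

-- Empty the original table.
DELETE FROM dbo.MAPPING_TABLE;

-- Copy the unique rows back, generating a new ID for each (assumption:
-- nothing else references the old IDs).
INSERT INTO dbo.MAPPING_TABLE (ID, TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE)
SELECT NEWID(), TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
FROM #unique_rows;

DROP TABLE #unique_rows;

-- Change to COMMIT once the result looks right.
ROLLBACK;
```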

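You don't give your table name, but I think something like this should work. It keeps only the record that happens to have the highest ID in each group of duplicates, because a row is deleted whenever a matching row with a larger ID exists. You might want to test it with the ROLLBACK first!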

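BEGIN TRAN

-- Delete every row for which an identical row with a larger ID exists,
-- so one row per group of duplicates survives.
DELETE T1
FROM <table_name> T1
WHERE EXISTS (
    SELECT *
    FROM <table_name> T2
    WHERE T1.TYPE_INT = T2.TYPE_INT
      AND T1.SYSTEM_VALUE = T2.SYSTEM_VALUE
      AND T1.NAME = T2.NAME
      AND T1.MAPPED_VALUE = T2.MAPPED_VALUE
      AND T2.ID > T1.ID
)

-- Inspect what is left before deciding whether to keep the change.
SELECT * FROM <table_name>

-- Replace with COMMIT once the result looks right.
ROLLBACK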
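
Rather than reading through the SELECT * output by eye, a grouped count can confirm whether any duplicates remain. This is a minimal sketch, again assuming the hypothetical table name dbo.MAPPING_TABLE; run before the DELETE it shows how many copies each group has, and run after it (inside the same transaction) it should return no rows.

```sql
-- Sketch only: dbo.MAPPING_TABLE is a hypothetical table name.
-- List every combination of the four value columns that occurs more than once.
SELECT TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE, COUNT(*) AS copies
FROM dbo.MAPPING_TABLE
GROUP BY TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```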

-- For each group of duplicates, find the lowest ID (the row to keep).
WITH Duplicates (ID, TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE)
AS
(
    SELECT Min(Id) AS ID, TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
    FROM T1
    GROUP BY TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
    HAVING Count(Id) > 1
)
-- Delete every row in a duplicated group except the one with the lowest ID.
DELETE FROM T1
WHERE ID IN (
    SELECT T1.Id
    FROM T1
    INNER JOIN Duplicates
        ON T1.TYPE_INT = Duplicates.TYPE_INT
        AND T1.SYSTEM_VALUE = Duplicates.SYSTEM_VALUE
        AND T1.NAME = Duplicates.NAME
        AND T1.MAPPED_VALUE = Duplicates.MAPPED_VALUE
        AND T1.Id <> Duplicates.ID
)

Here is a great article, Deleting duplicates, which basically uses this pattern:

WITH q AS 
     (
     SELECT d.*, 
       ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn 
     FROM t_duplicate d 
     ) 
DELETE 
FROM q 
WHERE rn > 1 

SELECT * 
FROM t_duplicate
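
The pattern above uses the article's own table and columns (t_duplicate, id, value). Adapted to the schema in the question, it might look like the sketch below. The table name dbo.MAPPING_TABLE is a placeholder, and ordering by ID is an arbitrary choice that keeps the row sorting first in each group.

```sql
-- Sketch only: dbo.MAPPING_TABLE is a hypothetical table name.
BEGIN TRAN;

WITH numbered AS
(
    SELECT ID,
           -- Number the rows within each group of identical values;
           -- the row numbered 1 is the one that will be kept.
           ROW_NUMBER() OVER (
               PARTITION BY TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
               ORDER BY ID
           ) AS rn
    FROM dbo.MAPPING_TABLE
)
-- Deleting through the CTE removes the underlying rows from dbo.MAPPING_TABLE.
DELETE FROM numbered
WHERE rn > 1;

SELECT * FROM dbo.MAPPING_TABLE;

-- Change to COMMIT once the result looks right.
ROLLBACK;
```

One difference from the join-based queries is worth knowing: PARTITION BY treats NULLs as belonging to the same group, while comparisons such as T1.NAME = T2.NAME never match when either side is NULL, so rows with NULL in any of the four columns would only be de-duplicated by this version.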