SQL: delete duplicates from a table

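Problem description:

I have a table, TRANSACTION, that contains duplicates. I want to keep the record with the minimum ID and delete all the other duplicates, where a duplicate is defined by four fields: DATE, AMOUNT, REFNUMBER, PARENTFOLDERID. I wrote the query below, but I am not sure whether it is written efficiently. Do you think there is a better way? I am asking because I am worried about the run time.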

DELETE FROM TRANSACTION
WHERE ID IN
  (SELECT FIT2.ID
   FROM
     (SELECT MIN(ID) AS ID, FIT.DATE, FIT.AMOUNT, FIT.REFNUMBER, FIT.PARENTFOLDERID
      FROM EWORK.TRANSACTION FIT
      GROUP BY FIT.DATE, FIT.AMOUNT, FIT.REFNUMBER, FIT.PARENTFOLDERID
      HAVING COUNT(1) > 1 AND FIT.AMOUNT > 0) FIT1,
     EWORK.TRANSACTION FIT2
   WHERE FIT1.DATE = FIT2.DATE AND
         FIT1.AMOUNT = FIT2.AMOUNT AND
         FIT1.REFNUMBER = FIT2.REFNUMBER AND
         FIT1.PARENTFOLDERID = FIT2.PARENTFOLDERID AND
         FIT1.ID <> FIT2.ID)

It would probably be more efficient to do something like

DELETE FROM transaction t1
WHERE EXISTS (SELECT 1
              FROM transaction t2
              WHERE t1.date = t2.date
                AND t1.amount = t2.amount
                AND t1.refnumber = t2.refnumber
                AND t1.parentFolderId = t2.parentFolderId
                AND t2.id < t1.id)  -- a lower-id duplicate exists, so the minimum id is kept
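
To check which rows an EXISTS-style delete actually keeps, here is a minimal sketch run against SQLite through Python's `sqlite3` module rather than Oracle. The table name `txn`, the column types, and the sample rows are assumptions made for illustration. The statement compares all four fields and deletes a row only when a matching row with a lower id exists, which is what keeps the minimum id per group.

```python
import sqlite3


def make_table():
    """Build an in-memory stand-in for the transaction table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE txn (id INTEGER PRIMARY KEY, date TEXT, amount REAL, "
        "refnumber TEXT, parentfolderid INTEGER)"
    )
    rows = [
        (1, "2012-04-01", 10.0, "A1", 7),
        (2, "2012-04-01", 10.0, "A1", 7),  # duplicate of id 1
        (3, "2012-04-01", 10.0, "A1", 7),  # duplicate of id 1
        (4, "2012-04-02", 25.0, "B2", 7),  # unique
        (5, "2012-04-02", 25.0, "B2", 8),  # differs in parentfolderid, so not a duplicate
    ]
    conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def delete_duplicates_exists(conn):
    """Delete every row for which a lower-id row with the same four fields exists."""
    conn.execute(
        """
        DELETE FROM txn
        WHERE EXISTS (SELECT 1
                      FROM txn t2
                      WHERE t2.date = txn.date
                        AND t2.amount = txn.amount
                        AND t2.refnumber = txn.refnumber
                        AND t2.parentfolderid = txn.parentfolderid
                        AND t2.id < txn.id)
        """
    )
    return [r[0] for r in conn.execute("SELECT id FROM txn ORDER BY id")]


conn = make_table()
surviving_ids = delete_duplicates_exists(conn)
print(surviving_ids)  # [1, 4, 5]
```

Flipping the last condition to `t2.id > txn.id` would keep the highest id in each group instead, so the direction of that comparison is worth double-checking before running the delete on real data.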

Yes, that works. I am not sure how it performs, but it looks cleaner. – mahen 2012-04-04 19:24:01

@justin: I think using analytic functions would be more optimized. What do you say? – 2012-04-04 20:25:28

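@GauravSoni - I wouldn't expect it to be more efficient in this case. With either my approach or yours, Oracle will have to hit the `transaction` table twice. I would tend to expect the anti-join to be more efficient than the analytic function, but it will depend on the available indexes, the data, how many duplicate rows there are, and so on. I wouldn't be shocked if the analytic function approach were more efficient in some cases, but I would expect the two to be pretty close. – 2012-04-04 20:30:30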
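
Since the outcome depends on the available indexes, one way to explore the question is to add a composite index on the duplicate-defining columns and ask the engine for its plan. The sketch below does this in SQLite via Python's `sqlite3`, which is an assumption made for the sake of a runnable example: the table name `txn`, the index name `txn_dup_key`, and the generated data are all hypothetical, and SQLite's plan says nothing about what Oracle's optimizer would choose (there you would inspect the plan with `EXPLAIN PLAN`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn (id INTEGER PRIMARY KEY, date TEXT, amount REAL, "
    "refnumber TEXT, parentfolderid INTEGER)"
)

# Hypothetical data: 500 logical transactions, each stored twice (ids i and i + 500).
rows = []
for i in range(1, 1001):
    key = i % 500
    rows.append((i, "2012-04-%02d" % (key % 28 + 1), float(key), "R%d" % key, 1))
conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?, ?)", rows)

# Composite index on the four duplicate-defining columns plus id (hypothetical name),
# giving the correlated lookup something to search instead of scanning the table.
conn.execute(
    "CREATE INDEX txn_dup_key ON txn (date, amount, refnumber, parentfolderid, id)"
)

delete_sql = """
    DELETE FROM txn
    WHERE EXISTS (SELECT 1
                  FROM txn t2
                  WHERE t2.date = txn.date
                    AND t2.amount = txn.amount
                    AND t2.refnumber = txn.refnumber
                    AND t2.parentfolderid = txn.parentfolderid
                    AND t2.id < txn.id)
"""

# The last column of each EXPLAIN QUERY PLAN row is a human-readable step description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + delete_sql)]
for step in plan:
    print(step)

conn.execute(delete_sql)
remaining = conn.execute("SELECT COUNT(*) FROM txn").fetchone()[0]
max_id = conn.execute("SELECT MAX(id) FROM txn").fetchone()[0]
print(remaining, max_id)  # 500 500
```

The plan text varies between SQLite versions, so it is printed rather than asserted. Measuring both delete variants on a copy of the real table remains the only reliable way to settle which is faster.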

I would try something like this:

DELETE transaction 
FROM transaction 
LEFT OUTER JOIN 
    (
     SELECT MIN(id) as id, date, amount, refnumber, parentfolderid 
     FROM transaction 
     GROUP BY date, amount, refnumber, parentfolderid 
    ) as validRows 
ON transaction.id = validRows.id 
WHERE validRows.id IS NULL 
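
The `DELETE ... FROM ... JOIN` form above is accepted by SQL Server and MySQL but, as far as I know, not by Oracle. The same idea (keep the `MIN(id)` of every group, delete everything else) can be written more portably with `NOT IN`. This is a swapped-in variant, not the answerer's exact statement, and the sketch runs it on SQLite through Python's `sqlite3` with a hypothetical table `txn` and made-up rows.

```python
import sqlite3


def make_table():
    """Build an in-memory stand-in for the transaction table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE txn (id INTEGER PRIMARY KEY, date TEXT, amount REAL, "
        "refnumber TEXT, parentfolderid INTEGER)"
    )
    rows = [
        (1, "2012-04-01", 10.0, "A1", 7),
        (2, "2012-04-01", 10.0, "A1", 7),  # duplicate of id 1
        (3, "2012-04-01", 10.0, "A1", 7),  # duplicate of id 1
        (4, "2012-04-02", 25.0, "B2", 7),  # unique
        (5, "2012-04-02", 25.0, "B2", 8),  # differs in parentfolderid, so not a duplicate
    ]
    conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def delete_duplicates_not_in(conn):
    """Keep the lowest id of each (date, amount, refnumber, parentfolderid) group."""
    conn.execute(
        """
        DELETE FROM txn
        WHERE id NOT IN (SELECT MIN(id)
                         FROM txn
                         GROUP BY date, amount, refnumber, parentfolderid)
        """
    )
    return [r[0] for r in conn.execute("SELECT id FROM txn ORDER BY id")]


conn = make_table()
surviving_ids = delete_duplicates_not_in(conn)
print(surviving_ids)  # [1, 4, 5]
```

Because `id` is a primary key here, `MIN(id)` is never NULL, which avoids the usual pitfall of `NOT IN` against a list that contains NULL.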

DELETE FROM transaction 
     WHERE ID IN (
       SELECT ID 
       FROM (SELECT ID, 
          ROW_NUMBER() OVER (PARTITION BY date 
                  ,amount 
                  ,refnumber 
                  ,parentfolderid 
               ORDER BY ID) rn 
               FROM transaction) 
       WHERE rn <> 1); 

I would try it this way.
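
A minimal sketch of the `ROW_NUMBER()` approach, run on SQLite via Python's `sqlite3` (window functions need SQLite 3.25 or newer). The table name `txn` and the rows are hypothetical. Two rows with a NULL `refnumber` are included on purpose: `PARTITION BY` puts NULLs into the same group, whereas a join on `t1.refnumber = t2.refnumber`, as in the EXISTS form earlier in the thread, never matches NULL to NULL. The two styles can therefore disagree on nullable columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn (id INTEGER PRIMARY KEY, date TEXT, amount REAL, "
    "refnumber TEXT, parentfolderid INTEGER)"
)
rows = [
    (1, "2012-04-01", 10.0, "A1", 7),
    (2, "2012-04-01", 10.0, "A1", 7),  # duplicate of id 1
    (3, "2012-04-01", 10.0, "A1", 7),  # duplicate of id 1
    (4, "2012-04-02", 25.0, "B2", 7),  # unique
    (5, "2012-04-02", 25.0, "B2", 8),  # differs in parentfolderid, so not a duplicate
    (6, "2012-04-03", 5.0, None, 9),   # NULL refnumber
    (7, "2012-04-03", 5.0, None, 9),   # same group as id 6 under PARTITION BY
]
conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?, ?)", rows)

# Number the rows inside each group by ascending id; every row after the first is a duplicate.
conn.execute(
    """
    DELETE FROM txn
    WHERE id IN (SELECT id
                 FROM (SELECT id,
                              ROW_NUMBER() OVER (PARTITION BY date, amount,
                                                              refnumber, parentfolderid
                                                 ORDER BY id) AS rn
                       FROM txn)
                 WHERE rn <> 1)
    """
)

surviving_ids = [r[0] for r in conn.execute("SELECT id FROM txn ORDER BY id")]
print(surviving_ids)  # [1, 4, 5, 6]
```

Ordering by `id` inside the window is what makes row number 1 the minimum id of its group.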

@mahen: If the table is really large, you can optimize this by using analytic functions. – 2012-04-04 19:44:28

I believe you want to use `ORDER BY ID` in the analytic function. – 2012-04-04 20:30:56

@JustinCave: Following your suggestion, I have edited my initial query. Thanks for your comment, Justin. – 2012-04-04 20:44:41