Were you ever in need of something like, deleting duplicate rows when the table doesn’t have a primary key. If yes, read on!
This article gives a quick way of deleting duplicate records from a given table, that doesn’t have a primary key. Also, we will NOT use cursors, loops, temporary tables (global or local) but still accomplish the mission. Interesting! isn't it. Please read on!
Create Table DeleteDupsTable(BrandId int, SKU int)
Lets add a few records to the above table: A screen shot of the all the work can be found at the end of this article. Also, the SQL Queries are uploaded and is available along with this article.
Insert into DeleteDupsTable Values(1000,Cast(Rand ()*100 as int)), (1000,Cast(Rand ()*100 as int)), (1000,Cast(Rand ()*100 as int)), (1000,Cast(Rand ()*100 as int)), (1000,Cast(Rand ()*100 as int)), (1000,Cast(Rand ()*100 as int)), (1000,Cast(Rand ()*100 as int)), (1000,Cast(Rand ()*100 as int)), (2000,Cast(Rand ()*100 as int)), (2000,Cast(Rand ()*100 as int)), (2000,Cast(Rand ()*100 as int)),
(2000,Cast(Rand ()*100 as int)), (2000,Cast(Rand ()*100 as int)), (3000,Cast(Rand ()*100 as int)), (3000,Cast(Rand ()*100 as int)), (3000,Cast(Rand ()*100 as int)), (3000,Cast(Rand ()*100 as int))
Notice, that we don’t need to write the insert statement for all the rows. It’s an enhancement in SQL Server 2008 aka table value constructor.
Go ahead run the above SQL.
At this point in time we have a heap created in the database. A table without any keys is called a ‘Heap’.
Lets run the query below to see what have we got to ensure we have necessary data before we go ahead and purge the duplicate records.
Select * from DeleteDupsTable
Ok necessary setup is done.
Lets get into the details of how we delete duplicate duplicate records.
Lets verify how many records have we got per Brand. Go ahead and run the below query. This would list number of records per Brand.
Select Brandid,COUNT(BrandId) from DeleteDupsTable group by Brandid
Ok, we have all the data ready out for purging.
Now the most important step. I wonder if it’s ever possible to delete duplicate records without a primary key! :). Well, even I don’t have any idea on how to I proceed to meet this requirement.
However, am posting a work around. This may not be the best possible solution, But, helps us to meet an immediate requirement short term!
Ok no more blah blah!. Lets get into the details!
Select 'Delete Top(' + Cast((COUNT(BrandId)-1) as VarChar) + ') from DeleteDupsTable
Where BrandId='+ CAST(BrandId As Varchar)
From
DeleteDupsTable Group by
Brandid having COUNT(BrandId)>1
After executing the above statement, the output will be a set of delete statements. Each delete statement would delete all records excluding the latest one for every brand that has got more than one record.
Copy the output and paste it on the a new query window and press F5.
Boooom! All Records deleted. Mission accomplished.
As part of testing, I deleted my critical data. How would I get it back or at least would like to show what records were purged.
Well! Good question.
Lets enhance the above query. The enhancement is, the primary requirement stats as it is which is nothing but the dynamic delete queries will delete the records. But, at the same time it outputs the records that were purged. with no extra select statements (interesting! is it?). Meaning, you see the data on your screen that was deleted. This way, even if you have lost the data, you could still insert it back using the output just generated.
Go ahead and run the below query and notice the output this query generates:
Select 'Delete Top(' + Cast((COUNT(BrandId)-1) as VarChar) + ') from
DeleteDupsTable output Deleted.* Where BrandId='+ CAST(BrandId As Varchar)
From
DeleteDupsTable Group
by Brandid having
COUNT(BrandId)>1
In the above query, notice we have output Deleted.* clause. This clause does the trick of displaying the records that were deleted.
Notice, in the following output, each delete statement tries to delete all records excluding one for every brand. The record being retained is the latest record for each of the brands.
Output:
Hope that helps! And happy programming :)
Download
Download source code for Deleting Duplicate Rows when the table doesn’t have a primary key.
by Deeraj
tnx laura
ReplyDelete