Powered by Blogger.

Friday, July 30, 2010

Deleting Rows Duplicate when the table doesn’t have a primary key

Were you ever in need of something like, deleting duplicate rows when the table doesn’t have a primary key. If yes, read on!

This article gives a quick way of deleting duplicate records from a given table, that doesn’t have a primary key. Also, we will NOT use cursors, loops, temporary tables (global or local) but still accomplish the mission. Interesting! isn't it. Please read on!

First lets create a simple table with two columns and name the table as ‘DeleteDupsTable’.


Create Table DeleteDupsTable(BrandId int, SKU int)

Lets add a few records to the above table: A screen shot of the all the work can be found at the end of this article. Also, the SQL Queries are uploaded and is available along with this article.

Insert into DeleteDupsTable Values(1000,Cast(Rand()*100 as int)),
(1000,Cast(Rand()*100 as int)),
(1000,Cast(Rand()*100 as int)),
(1000,Cast(Rand()*100 as int)),
(1000,Cast(Rand()*100 as int)),
(1000,Cast(Rand()*100 as int)),
(1000,Cast(Rand()*100 as int)),
(1000,Cast(Rand()*100 as int)),
(2000,Cast(Rand()*100 as int)),
(2000,Cast(Rand()*100 as int)),
(2000,Cast(Rand()*100 as int)),





(2000,Cast(Rand()*100 as int)), (2000,Cast(Rand()*100 as int)), (3000,Cast(Rand()*100 as int)), (3000,Cast(Rand()*100 as int)), (3000,Cast(Rand()*100 as int)), (3000,Cast(Rand()*100 as int))
Notice, that we don’t need to write the insert statement for all the rows. It’s an enhancement in SQL Server 2008 aka table value constructor.

Go ahead run the above SQL.

At this point in time we have a heap created in the database. A table without any keys is called a ‘Heap’.

Lets run the query below to see what have we got to ensure we have necessary data before we go ahead and purge the duplicate records.

Select * from DeleteDupsTable

Ok necessary setup is done.

Lets get into the details of how we delete duplicate duplicate records.

Lets verify how many records have we got per Brand. Go ahead and run the below query. This would list number of records per Brand.

Select Brandid,COUNT(BrandId) from DeleteDupsTable group by Brandid

Ok, we have all the data ready out for purging.
Now the most important step. I wonder if it’s ever possible to delete duplicate records without a primary key! :). Well, even I don’t have any idea on how to I proceed to meet this requirement.

However, am posting a work around. This may not be the best possible solution, But, helps us to meet an immediate requirement short term!

Ok no more blah blah!. Lets get into the details!

The following query generates a dynamic ‘DELETE’ statement for every brand in the table. The criteria is the BrandId should have more than one record (possible qualifier for duplicate deletes :) ). The delete statement would retain the latest record (the last record in the list will be the latest one when we issue a select * from DeleteDupsTable).


Select 'Delete Top(' + Cast((COUNT(BrandId)-1) as VarChar) + ') from DeleteDupsTable
Where BrandId='+ CAST(BrandId As Varchar)
From
DeleteDupsTable Group by
Brandid having COUNT(BrandId)>1
After executing the above statement, the output will be a set of delete statements. Each delete statement would delete all records excluding the latest one for every brand that has got more than one record.

Copy the output and paste it on the a new query window and press F5.

Boooom! All Records deleted. Mission accomplished.

As part of testing, I deleted my critical data. How would I get it back or at least would like to show what records were purged.

Well! Good question.

Lets enhance the above query. The enhancement is, the primary requirement stats as it is  which is nothing but the dynamic delete queries will delete the records. But, at the same time it outputs the records that were purged. with no extra select statements (interesting! is it?). Meaning, you see the data on your screen that was deleted. This way, even if you have lost the data, you could still insert it back using the output just generated.

Go ahead and run the below query and notice the output this query generates:









Select 'Delete Top(' + Cast((COUNT(BrandId)-1) as VarChar) + ') from

DeleteDupsTable output Deleted.* Where BrandId='+ CAST(BrandId As Varchar)

From

DeleteDupsTable  Group

by Brandid having

COUNT(BrandId)>1

In the above query, notice we have output Deleted.* clause. This clause does the trick of displaying the records that were deleted.

Notice, in the following output, each delete statement tries to delete all records excluding one for every brand. The record being retained is the latest record for each of the brands.


Output:
Output




Hope that helps! And happy programming :)

I will get back to you with another interesting article very soon!

Download



 Download source code for Deleting Duplicate Rows when the table doesn’t have a primary key.






by Deeraj

1 comment:

  ©Template by Dicas Blogger.

TOPO