I have a MSSQL table that contains lots of duplicate records, is there a simple way to keep only one and remove all other duplicates?
mike M 60
mike M 60
mike M 60
I need only one record
mike M 60
I have a MSSQL table that contains lots of duplicate records, is there a simple way to keep only one and remove all other duplicates?
mike M 60
mike M 60
mike M 60
I need only one record
mike M 60
Hello michael123
I am afraid, there is no direct way to get rid of duplicate rows.
But you can populate a new table with distinct rows from your given table only, for example:
-- create your table
create table dudel(name varchar(50)not null, sex char(1), age integer);
-- inserte some rows into dudel
insert into dudel (name, sex, age) values ('Randy', 'M', 66);
-- . . .
commit;
-- Show all rows of dudel
select name, sex, age from dudel;
/*
name,sex,age
----------------
'mike','M',60
'mike','M',60
'mike','M',60
'Randy','M',66
'Randy','M',66
'Randy','M',66
'Randy','M',66
*/
-- show only distinct rows
select name, sex, age from dudel group by name, sex, age;
/*
name,sex,age
------------
'mike','M',60
'Randy','M',66
*/
-- create a new table dudel2 (clone of dudel)
create table dudel2(name varchar(50)not null, sex char(1), age integer);
-- populate dudel2 from dudel leaving out all duplicates
insert into dudel2 select name, sex, age from dudel group by name, sex, age;
-- show the result
select name, sex, age from dudel2;
/*
name,sex,age
'mike','M',60
'Randy','M',66
*/
Now you should be able to replace old table by new one containing distinct rows only.
You have to consider foreign keys carefully!
To supply you with more specific solution I need further details on the table in question, especially primary and foreign keys, constraints etc.
krs,
tesu
What version of MSSQL are you using? If 2005, then I have a query you can modify that will remove duplicate entries.
I use MSSQL 2000 version.
Thank you tesuji, here is the table I need to remove duplicate records.
Table "addonitems":
partid componentid quantity revision componentrevision note
1001 406 1 01 01 n/a
1001 406 1 01 02 n/a
1002 501 1 01 01 n/a
1002 501 1 01 01 n/a
1002 501 1 01 02 n/a
1002 501 1 01 02 message from mike
....
....
on your code line 36, do you mean:
INSERT INTO dudel2 SELECT DISTINCT name, sex, age FROM dudel GROUP BY name, sex, age;
did you miss word "distinct"?
In
INSERT INTO dudel2 SELECT name, sex, age FROM dudel GROUP BY name, sex, age;
GROUP BY clause, which must contain all attributes (column names), provides for distinct rows too.
Instead of GROUP BY clause you can also use DISTINCT:
INSERT INTO dudel2 SELECT DISTINCT name, sex, age FROM dudel;
Both insert statements are equivalent but yours is simpler :)
We shall wait for cmhampton to see his solution for sql server 2005. Anyway, if you are working with server 2000, he should post his solution.
krs,
tesu
The script is on a different computer. I'll post it when I get home.
Sorry I didn't get this uploaded the other night, the kids have been sick.
Suppose you have a table with the following structure:
id - int PrimaryKey
name - varchar(50)
description - varchar(MAX)
and the following values:
1 Joe Short for Joseph
2 Dave Short for David
3 Joe Short for Joseph
4 Joe Short for Joseph
5 Chris Short for Christian
6 Rob Short for Robert
Notice that "Joe - Short for Joseph" has three duplicate records. It is true that we can use SELECT DISTINCT to filter these, and for a simple table like this, that's probably the best option. However, sometimes SELECT DISTINCT gets a little hairy when dealing with joins, at least in my experience. So, without having the time to create a complex data structure, or using one I already have that contains confidential data, let's use this simple example.
MSSQL 2005 added a handy new function called ROW_NUMBER(). Learn it, love it (lol). Seriously though, it will make your life easier. What this function does is allow you to get the row number of a record in a returned data table. On the surface, this doesn't sound like much. But, it becomes extremely useful when you realize that you can partition, or group, the records. Let use this on the table shown above:
WITH names AS
(
SELECT
id,
name,
description,
ROW_NUMBER() OVER(PARTITION BY name, description ORDER BY id) AS rowNum
FROM
table_1
)
SELECT
id,
name,
description,
rowNum
FROM
names
We get these results:
id name description rowNum
5 Chris Short for Christian 1
2 Dave Short for David 1
1 Joe Short for Joseph 1
3 Joe Short for Joseph 2
4 Joe Short for Joseph 3
6 Rob Short for Robert 1
Notice that now we have rowNumbers for each record, and they are partitioned by the name and description fields. So, if we want to get the duplicate records for this table, we add a WHERE clause to the query:
SELECT
id,
name,
description
FROM
names
WHERE
rowNum > 1
which gives us:
3 Joe Short for Joseph 2
4 Joe Short for Joseph 3
Cool isn't it?
What this allows us to do, is dump this into a temporary table to use as in a subquery...
DECLARE @tblNames TABLE(id int);
WITH names AS
(
SELECT
id,
name,
description,
ROW_NUMBER() OVER(PARTITION BY name, description ORDER BY id) AS rowNum
FROM
table_1
)
INSERT
@tblNames
SELECT
id
FROM
names
WHERE
rowNum > 1
DELETE
FROM
table_1
WHERE
id IN
(
SELECT
id
FROM
@tblNames
)
Now if we select the records from table_1, the duplicates have been eliminated. Here's the results:
1 Joe Short for Joseph
2 Dave Short for David
5 Chris Short for Christian
6 Rob Short for Robert
There's a way to do this without the temp table, but I can't find my original query, so I had to do it from memory.
Sorry to turn this into a novel, but I thought an explanation would be better than a code dump.
We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.