Hi all,
I'm looking into the management of duplicate records in MySQL, but this case is a little bit more specific. A record is only duplicated if column A and column B have the same value. So a record is only a duplicate if two specific columns have the same values in more than one record.
Table (excerpt):
| id | userid | courseid |
| 1 | 23 | 3 |
| 2 | 23 | 3 |
| 3 | 28 | 5 |
| 4 | 28 | 5 |
| 5 | 23 | 5 |
| 6 | 23 | 3 |
| 7 | 29 | 3 |
Duplicate rows above that would be deleted by the query = id's 2, 4 and 6
(as I would like to keep one of the duplicate records, in this case, the record with the lowest id value.
After playing around and doing a little bit of research, this seemed to work a treat:
ALTER IGNORE TABLE table ADD UNIQUE INDEX tmpindex (userid,courseid);
ALTER TABLE table DROP INDEX tmpindex;
My only concern is that this probably isn't the fastest or safest method of achieving the same result, this query could be executed dozens of times an hour or thousands of times by a rouge user (I'll add a control/restriction eventually). Does anyone have any feedback or suggestions? I would truly appreciate a bit more wisdom.
Another thought, would it be more sensible to check for an existing record before inserting duplicates in the first place? There could be 100's of inserts from one submission so a foreach
to check if a record exists before inserting it just wouldn't be sensible imo - but I'm no DBA guru...
Thanks in advance :)
Michael