How to remove hard returns and clean duplicate records from MySQL/MariaDB database

It happens from time to time that we have duplicate queries in our database Unique index type tables that are caused by difficult to detect carriage returns. The following method cleans up the database by first removing the hard returns, and then by removing the duplicates.

Let me illustrate this on an example. Let’s say we have a table called: ‘slovak’ and a column with a unique index named ‘word’. This table holds the list of Slovak words, but unfortunately, some of them are entered with a hard return and thus when I run the following query, I see the duplicates, which unique index simply cannot prevent:

To fix the issue, we’ll first need to remove the ‘unique’ index on our ‘word’ column. The easiest way is to just drop the old index, something like this::

ALTER TABLE `slovak`
	DROP INDEX `word`;

Then we can run the following query which removes all carriage returns from the database:

UPDATE slovak SET word = REPLACE(REPLACE(word, '\r', ''), '\n', '');

However, as you can imagine, while we have successfully removed carriage returns, now we have the DB full of duplicate entries, which are actual duplicates, so switching ‘word’ column back to the ‘unique’ index would produce errors with reference to those duplicate entries.

So before we do so, let’s remove all of the duplicate entries inside the ‘word’ column and keeping only the entry with the lowest id number.

This can be done by using the following SQL query:

DELETE FROM slovak
 WHERE id NOT IN (SELECT * 
                    FROM (SELECT MIN(n.id)
                            FROM NAMES n
                        GROUP BY n.word) x)

Now, we should get the response that says something like this:

'/* Affected rows: x Found rows: 0 Warnings: 0 Duration for 1 query: 00:xx:xx */'

Once this is done, we should be able to set the ‘unique’ index back onto ‘word’ column, which should be as simple as running the following query:

ALTER TABLE `slovak`
	DROP INDEX `word`,
	ADD UNIQUE INDEX `word` (`word`);

And voila, we have cleaned our DB of those hard to detect duplicates caused by appended hard returns.

Now, if I run the same query as in the beginning, I no longer see the duplicate words:

Enjoy!