MySQL: clean up duplicate entries in a table AND relink the FK in the dependent table


Here is my situation: I have 2 tables, patient and study.

Each table has its own PK, which uses auto-increment.

In my case, pat_id must be unique. It is not declared unique at the database level, because it may be non-unique in some deployments (this is not an in-house system). I figured out how to configure the system to treat pat_id as unique, but now I need to clean the database of duplicate patients AND relink the rows in the study table that point at duplicate patients to the single remaining patient, before deleting the duplicates.

Patient table:

CREATE TABLE `patient` (
  `pk` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `pat_id` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL,
  ...
  `pat_name` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL,
  ...
  `pat_custom1` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL
  ....
  PRIMARY KEY (`pk`)
) ENGINE=InnoDB;

Study table:

CREATE TABLE `study` (
  `pk` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `patient_fk` BIGINT(20) DEFAULT NULL,
  ...
  PRIMARY KEY (`pk`),
  ...
  CONSTRAINT `patient_fk` FOREIGN KEY (`patient_fk`) REFERENCES `patient` (`pk`)
) ENGINE=InnoDB;

I found some similar questions, but none with exactly the same problem. In particular, they did not cover relinking the foreign keys to the single remaining patient.

Cleanup update for duplicate entries

Update only the first record of duplicate entries in MySQL

Here is how I did it.

I reused an unused field in the patient table to mark non-duplicate patients (N), the first patient of each set of duplicates (X), and the other duplicates (Y). You could also add a column for this (and drop it after use).
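If no spare field is available, the temporary-column option could look roughly like the sketch below. The column name `dup_flag` is hypothetical and not part of the original schema. Note that in MySQL, ALTER TABLE causes an implicit commit, so these statements belong outside the transaction that does the cleanup.

```sql
-- Hypothetical helper column; `dup_flag` is an assumed name, not in the original schema.
ALTER TABLE patient ADD COLUMN dup_flag CHAR(1) DEFAULT NULL;

-- ... run the cleanup steps, using dup_flag wherever pat_custom1 is used ...

-- Drop the helper column once the cleanup has been verified.
ALTER TABLE patient DROP COLUMN dup_flag;
```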

These are the steps I followed to clean up my database:

/* 1: List duplicated patients */
select pk, pat_id, t.`pat_id_issuer`, t.`pat_name`, t.pat_custom1
from patient t
where pat_id in (
  select pat_id from (
    select pat_id, count(*) from patient group by 1 having count(*) > 1
  ) xxx
);

/* 2: Delete orphan patients (NULL FKs are excluded, otherwise NOT IN matches no rows) */
delete from patient
where pk not in (select patient_fk from study where patient_fk is not null);

/* 3: Reset flag for all patients, duplicated or not */
update patient t set t.`pat_custom1` = 'N';

/* 4: Mark all duplicated patients */
update patient t set t.`pat_custom1` = 'Y'
where pat_id in (
  select pat_id from (
    select pat_id, count(*) from patient group by 1 having count(*) > 1
  ) xxx
);

/* 5: Mark the first (lowest pk) of each set of duplicates as the one to keep */
update patient t
join (
  select pk from (
    select min(pk) as pk, pat_id from patient where pat_custom1 = 'Y' group by pat_id
  ) xxx
) x on (x.pk = t.pk)
set t.`pat_custom1` = 'X'
where pat_custom1 = 'Y';

/* 6: Verify the update is correct */
select pk, pat_id, pat_custom1 from `patient`
where pat_custom1 != 'N'
order by pat_id, pat_custom1;

/* 7: Verify studies linked to duplicated patients */
select p.* from study s
join patient p on (p.pk = s.patient_fk)
where p.pat_custom1 = 'Y';

/* 8: Relink studies from duplicated patients to the kept patient */
update study s
join patient p on (p.pk = s.patient_fk)
set patient_fk = (
  select pk from patient pp where pp.pat_id = p.pat_id and pp.pat_custom1 = 'X'
)
where p.pat_custom1 = 'Y';

/* 9: Delete newly orphaned patients */
delete from patient
where pk not in (select patient_fk from study where patient_fk is not null);

/* 10: Reset flag */
update patient t set t.`pat_custom1` = null;

/* 11: Commit changes */
commit;

There is certainly a shorter way, with smarter (trickier?) SQL, but I personally prefer the simpler way. It also lets me check that each step is doing what I expect.
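For comparison, the shorter route might look roughly like the sketch below. It is not the method used above, and it has not been run against the real schema. It assumes the patient to keep is the one with the lowest pk for each pat_id, and the alias `keep_pk` is made up for illustration. Unlike the step-by-step version, it leaves orphan patients that are not duplicates in place, and it ignores rows whose pat_id is NULL because the join on pat_id never matches them.

```sql
-- Sketch only: relink studies to the lowest pk per pat_id, then delete the other duplicates.
START TRANSACTION;

-- Point every study at the surviving patient for its pat_id.
UPDATE study s
JOIN patient p ON p.pk = s.patient_fk
JOIN (
    SELECT pat_id, MIN(pk) AS keep_pk   -- keep_pk: hypothetical alias
    FROM patient
    GROUP BY pat_id
) k ON k.pat_id = p.pat_id
SET s.patient_fk = k.keep_pk
WHERE s.patient_fk <> k.keep_pk;

-- Delete every patient that is not the survivor for its pat_id.
-- Runs after the relink so the foreign key on study is not violated.
DELETE p
FROM patient p
JOIN (
    SELECT pat_id, MIN(pk) AS keep_pk
    FROM patient
    GROUP BY pat_id
) k ON k.pat_id = p.pat_id
WHERE p.pk <> k.keep_pk;

COMMIT;
```

The trade-off is the one already noted: two statements are quicker to run, but there is no intermediate flag to inspect before the rows are gone, so running the listing query from step 1 first and checking the result before COMMIT is a sensible precaution.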