traer solo repetidos registros mostrar eliminar duplicados dejando contar campos buscar agrupar sql sql-server duplicate-data

sql - solo - Eliminar duplicados con advertencias



sql eliminar registros repetidos dejando solo 1 (6)

Tengo una tabla con rowID, longitud, latitud, businessName, url, título. Esto podría verse así:

rowID | long | lat | businessName | url | caption 1 20 -20 Pizza Hut yum.com null

¿Cómo elimino todos los duplicados, pero solo guardo el que tiene una URL (primera prioridad), o guardo el que tiene un título si el otro no tiene una URL (segunda prioridad) y elimino el resto?


Esta es mi técnica de bucle. Esto probablemente será rechazado por no ser convencional, y estoy de acuerdo con eso.

DECLARE @LoopVar int DECLARE @long int, @lat int, @businessname varchar(30), @winner int SET @LoopVar = (SELECT MIN(rowID) FROM Locations) WHILE @LoopVar is not null BEGIN --initialize the variables. SELECT @long = null, @lat = null, @businessname = null, @winner = null -- load data from the known good row. SELECT @long = long, @lat = lat, @businessname = businessname FROM Locations WHERE rowID = @LoopVar --find the winning row with that data SELECT top 1 @Winner = rowID FROM Locations WHERE @long = long AND @lat = lat AND @businessname = businessname ORDER BY CASE WHEN URL is not null THEN 1 ELSE 2 END, CASE WHEN Caption is not null THEN 1 ELSE 2 END, RowId --delete any losers. DELETE FROM Locations WHERE @long = long AND @lat = lat AND @businessname = businessname AND @winner != rowID -- prep the next loop value. SET @LoopVar = (SELECT MIN(rowID) FROM Locations WHERE @LoopVar < rowID) END


Si es posible, ¿puede homogeneizar y luego eliminar duplicados?

Paso 1:

UPDATE BusinessLocations SET BusinessLocations.url = LocationsWithUrl.url FROM BusinessLocations INNER JOIN ( SELECT long, lat, businessName, url, caption FROM BusinessLocations WHERE url IS NOT NULL) LocationsWithUrl ON BusinessLocations.long = LocationsWithUrl.long AND BusinessLocations.lat = LocationsWithUrl.lat AND BusinessLocations.businessName = LocationsWithUrl.businessName UPDATE BusinessLocations SET BusinessLocations.caption = LocationsWithCaption.caption FROM BusinessLocations INNER JOIN ( SELECT long, lat, businessName, url, caption FROM BusinessLocations WHERE caption IS NOT NULL) LocationsWithCaption ON BusinessLocations.long = LocationsWithCaption.long AND BusinessLocations.lat = LocationsWithCaption.lat AND BusinessLocations.businessName = LocationsWithCaption.businessName

Paso 2: Eliminar duplicados.


Solución basada en conjuntos:

delete from T as t1 where /* delete if there is a "better" row with same long, lat and businessName */ exists( select * from T as t2 where t1.rowID <> t2.rowID and t1.long = t2.long and t1.lat = t2.lat and t1.businessName = t2.businessName and case when t1.url is null then 0 else 4 end /* 4 points for non-null url */ + case when t1.businessName is null then 0 else 2 end /* 2 points for non-null businessName */ + case when t1.rowID > t2.rowId then 0 else 1 end /* 1 point for having smaller rowId */ < case when t2.url is null then 0 else 4 end + case when t2.businessName is null then 0 else 2 end )


delete MyTable from MyTable left outer join ( select min(rowID) as rowID, long, lat, businessName from MyTable where url is not null group by long, lat, businessName ) as HasUrl on MyTable.long = HasUrl.long and MyTable.lat = HasUrl.lat and MyTable.businessName = HasUrl.businessName left outer join ( select min(rowID) as rowID, long, lat, businessName from MyTable where caption is not null group by long, lat, businessName ) HasCaption on MyTable.long = HasCaption.long and MyTable.lat = HasCaption.lat and MyTable.businessName = HasCaption.businessName left outer join ( select min(rowID) as rowID, long, lat, businessName from MyTable where url is null and caption is null group by long, lat, businessName ) HasNone on MyTable.long = HasNone.long and MyTable.lat = HasNone.lat and MyTable.businessName = HasNone.businessName where MyTable.rowID <> coalesce(HasUrl.rowID, HasCaption.rowID, HasNone.rowID)


Esta solución es traída a usted por "cosas que aprendí en " en la última semana:

DELETE restaurant WHERE rowID in (SELECT rowID FROM restaurant EXCEPT SELECT rowID FROM ( SELECT rowID, Rank() over (Partition BY BusinessName, lat, long ORDER BY url DESC, caption DESC ) AS Rank FROM restaurant ) rs WHERE Rank = 1)

Advertencia: no he probado esto en una base de datos real


Similar a otra respuesta, pero desea eliminar en función del número de fila en lugar del rango. Mezcle con expresiones de tabla comunes también:

;WITH GroupedRows AS ( SELECT rowID, Row_Number() OVER (Partition BY BusinessName, lat, long ORDER BY url DESC, caption DESC) rowNum FROM restaurant ) DELETE r FROM restaurant r JOIN GroupedRows gr ON r.rowID = gr.rowID WHERE gr.rowNum > 1