python - repeated - Part 2 of a successful result on filling in blank space


Although it isn't Python, this kind of edit is fairly straightforward with sed:

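```sh
sed '$!N;s/.*\n\(END_POLY\)/\1/;P;D' file.txt
```

`P;D` prints and discards only the first line of the window, so the window slides forward one line at a time and an `END_POLY` on an odd-numbered line is not missed; `$!N` skips `N` on the last line so it is still printed.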

Basically, `N` appends the next line to the pattern space, so sed sees two lines at once. If the second of them is `END_POLY`, the substitution deletes the first one, leaving only `END_POLY`.
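The same two-line window can be written in Python as a generator, which also makes it easy to try on a few lines before running it on the whole file. This is a minimal sketch, assuming the marker line is exactly `END_POLY` once surrounding whitespace is stripped; the function name `drop_line_before_marker` is hypothetical.

```python
from typing import Iterable, Iterator


def drop_line_before_marker(lines: Iterable[str], marker: str = "END_POLY") -> Iterator[str]:
    """Yield every line except the one immediately preceding `marker`.

    Like the sed window: hold one line back and only emit it once we
    know the line after it is not the marker.
    """
    held = None  # line already read but not yet emitted
    for line in lines:
        if held is not None and line.strip() != marker:
            yield held
        held = line
    if held is not None:
        yield held  # the final line has no successor, so always keep it


sample = ["BEGIN_POLYGON\n", "A\n", "B\n", "A\n", "END_POLY\n"]
print(list(drop_line_before_marker(sample)))
# ['BEGIN_POLYGON\n', 'A\n', 'B\n', 'END_POLY\n']
```

Because it consumes any iterable of lines, an open file object can be passed directly and the result handed to `writelines`, so the file is streamed rather than loaded into memory.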

So my first question was answered successfully. For reference, you can find it here ...

How to fill in blank space with information without modifying the rest?

In short, I needed this ...

```text
POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
POLYGON_POINT -79.750000000217,42.017498354525,0

POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
POLYGON_POINT -79.750000000217,42.085882815878,0
```

To turn into this ...

```text
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
POLYGON_POINT -79.750000000217,42.017498354525,0
END_POLY
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
POLYGON_POINT -79.750000000217,42.085882815878,0
END_POLY
```

This was done successfully with a Python script. Now I've found that I need to remove duplicate lines, specifically the last line of each block. That line closes the polygon, but the batch build throws an error because it closes the polygon by itself. Basically, in the end I need this ...

```text
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
END_POLY
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
END_POLY
```

...and there are 3,415,978 lines to go through. Every other duplicate remover I've tried also strips out the blank lines and all the wording, since BEGIN_POLYGON and END_POLY repeat too. Hmmm
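Blindly deleting the line before `END_POLY` would remove a real vertex from any block that does not repeat its first point at the end. A more defensive sketch, assuming the input already has the `BEGIN_POLYGON`/`END_POLY` markers shown above, drops the last point only when it equals the block's first point. The function name `drop_closing_points` is hypothetical.

```python
from typing import Iterable, Iterator, List


def drop_closing_points(lines: Iterable[str]) -> Iterator[str]:
    """Yield BEGIN_POLYGON ... END_POLY blocks, dropping a block's last point
    when it merely repeats that block's first point.

    Only the current block is buffered, so a multi-million-line file is
    processed without loading it all into memory.
    """
    block: List[str] = []  # POLYGON_POINT lines of the block being read
    inside = False
    for line in lines:
        tag = line.strip()
        if tag == "BEGIN_POLYGON":
            yield from block  # flush an unterminated previous block unchanged
            block, inside = [], True
            yield line
        elif tag == "END_POLY" and inside:
            # Compare stripped text so a stray trailing space doesn't hide the repeat
            if len(block) > 1 and block[-1].strip() == block[0].strip():
                block.pop()
            yield from block
            yield line
            block, inside = [], False
        elif inside:
            block.append(line)
        else:
            yield line  # lines outside any block pass through untouched
    yield from block  # unterminated block at end of input: emit it unchanged


sample = [
    "BEGIN_POLYGON\n",
    "POLYGON_POINT -79.750000000217,42.017498354525,0\n",
    "POLYGON_POINT -79.750000000217,42.016478251402,0\n",
    "POLYGON_POINT -79.750598748133,42.017193264943,0\n",
    "POLYGON_POINT -79.750000000217,42.017498354525,0\n",
    "END_POLY\n",
]
result = list(drop_closing_points(sample))
print("".join(result))
```

To process the real file, pass the open input file object as `lines` and give the generator to `fout.writelines(...)`. Buffering one block at a time is what lets the check look at both the first and the last point while still streaming the file.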

As pointed out in the comments, keep a reference to the previous line:

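```python
with open('in.txt') as fin, open('out.txt', 'w') as fout:
    prev = None  # previous line, written only once we know the next line isn't END_POLY
    for i, line in enumerate(fin):
        if line.strip() != 'END_POLY' and prev is not None:
            fout.write(prev)
        prev = line
        if not i % 10000:
            print('Processing line {}'.format(i))  # progress report for the large file
    if prev is not None:
        fout.write(prev)  # the last line has no successor, so write it unconditionally
```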

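If you don't want duplicate data, you can deduplicate each block while keeping its order (converting to a `set` and back would lose the order, so this uses `collections.OrderedDict.fromkeys`). This is @Jean-François Fabre's code from the other question, slightly modified; it works on the original file, where blocks are separated by blank lines: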

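```python
import collections, itertools

with open("file.txt") as f, open("fileout.txt", "w") as fw:
    # groupby yields runs of non-blank (k=True) and blank (k=False) lines; keep only the former.
    # OrderedDict.fromkeys drops repeated lines but keeps first-seen order.
    fw.writelines(itertools.chain.from_iterable(
        ["BEGIN_POLYGON\n"] + list(collections.OrderedDict.fromkeys(v)) + ["END_POLY\n"]
        for k, v in itertools.groupby(f, key=lambda l: bool(l.strip())) if k))
```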
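To see what the `groupby` step actually produces, here is the same pipeline run on a small in-memory list instead of a file. The sample lines (`P1`, `Q1`, and so on) are made up for illustration.

```python
import collections
import itertools

# Made-up input: two blocks separated by a blank line, each closing on its first point
lines = ["P1\n", "P2\n", "P1\n", "\n", "Q1\n", "Q2\n", "Q3\n", "Q1\n"]

# k is True for runs of non-blank lines and False for runs of blank lines
blocks = [list(v) for k, v in itertools.groupby(lines, key=lambda l: bool(l.strip())) if k]
print(blocks)

# Wrap each deduplicated block in the markers, as the answer does
out = list(itertools.chain.from_iterable(
    ["BEGIN_POLYGON\n"] + list(collections.OrderedDict.fromkeys(b)) + ["END_POLY\n"]
    for b in blocks))
print("".join(out))
```

Keep in mind that `fromkeys` removes every repeat within a block, not just the closing point, so a polygon that legitimately revisits a vertex in the middle would lose that vertex as well. Also, if the file's very last line has no trailing newline, it will not match the identical first line of its block and will not be removed.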

As you can see, if you run:

```python
import collections

print(list(collections.OrderedDict.fromkeys([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 5, 3, 3, 3, 3, 3]).keys()))
```

the output is `[1, 2, 5, 3]`: the duplicates are gone and the original order is kept.
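On Python 3.7 and later, plain `dict` objects are guaranteed to preserve insertion order, so `dict.fromkeys` gives the same result without the `collections` import:

```python
values = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 5, 3, 3, 3, 3, 3]
# dict keys are unique and, since Python 3.7, kept in first-insertion order
print(list(dict.fromkeys(values)))  # [1, 2, 5, 3]
```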