index - python list to string
Python, calcula la diferencia de lista (12)
En Python, ¿cuál es la mejor forma de calcular la diferencia entre dos listas?
ejemplo
A = [1,2,3,4]
B = [2,5]
A - B = [1,3,4]
B - A = [5]
Al echar un vistazo a TimeComplexity of In-operator, en el peor de los casos funciona con O (n). Incluso para Sets.
Por lo tanto, al comparar dos matrices tendremos un TimeComplexity de O (n) en el mejor de los casos y O (n ^ 2) en el peor de los casos.
Una solución alternativa (pero desafortunadamente más compleja), que funciona con O (n) en el mejor y el peor de los casos es esta:
# Compares the difference of list a and b
# uses a callback function to compare items
def diff(a, b, callback):
a_missing_in_b = []
ai = 0
bi = 0
a = sorted(a, callback)
b = sorted(b, callback)
while (ai < len(a)) and (bi < len(b)):
cmp = callback(a[ai], b[bi])
if cmp < 0:
a_missing_in_b.append(a[ai])
ai += 1
elif cmp > 0:
# Item b is missing in a
bi += 1
else:
# a and b intersecting on this item
ai += 1
bi += 1
# if a and b are not of same length, we need to add the remaining items
for ai in xrange(ai, len(a)):
a_missing_in_b.append(a[ai])
return a_missing_in_b
p.ej
>>> a=[1,2,3]
>>> b=[2,4,6]
>>> diff(a, b, cmp)
[1, 3]
En caso de que desee la diferencia recursivamente profundizando en los elementos de su lista, he escrito un paquete para Python: https://github.com/erasmose/deepdiff
Instalación
Instalar desde PyPi:
pip install deepdiff
Si es Python3, también necesita instalar:
pip install future six
Ejemplo de uso
>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function
El mismo objeto regresa vacío
>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
{}
El tipo de artículo ha cambiado
>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
{''type_changes'': ["root[2]: 2=<type ''int''> vs. 2=<type ''str''>"]}
El valor de un artículo ha cambiado
>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
{''values_changed'': [''root[2]: 2 ====>> 4'']}
Artículo agregado y / o eliminado
>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes)
{''dic_item_added'': [''root[5, 6]''],
''dic_item_removed'': [''root[4]''],
''values_changed'': [''root[2]: 2 ====>> 4'']}
Diferencia de cadena
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ ''values_changed'': [ ''root[2]: 2 ====>> 4'',
"root[4][''b'']:/n--- /n+++ /n@@ -1 +1 @@/n-world/n+world!"]}
>>>
>>> print (ddiff.changes[''values_changed''][1])
root[4][''b'']:
---
+++
@@ -1 +1 @@
-world
+world!
Diferencia de cadena 2
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!/nGoodbye!/n1/n2/nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world/n1/n2/nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ ''values_changed'': [ "root[4][''b'']:/n--- /n+++ /n@@ -1,5 +1,4 @@/n-world!/n-Goodbye!/n+world/n 1/n 2/n End"]}
>>>
>>> print (ddiff.changes[''values_changed''][0])
root[4][''b'']:
---
+++
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
1
2
End
Tipo de cambio
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world/n/n/nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ ''type_changes'': [ "root[4][''b'']: [1, 2, 3]=<type ''list''> vs. world/n/n/nEnd=<type ''str''>"]}
Lista de diferencia
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ ''list_removed'': ["root[4][''b'']: [3]"]}
Lista de diferencia 2: Tenga en cuenta que NO toma en cuenta el orden
>>> # Note that it DOES NOT take order into account
... t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ }
Lista que contiene diccionario:
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
{ ''dic_item_removed'': ["root[4][''b''][2][2]"],
''values_changed'': ["root[4][''b''][2][1]: 1 ====>> 3"]}
En el caso de una lista de diccionarios , la solución de comprensión de lista completa funciona mientras aumenta la solución set
TypeError: unhashable type: ''dict''
Caso de prueba
def diff(a, b):
return [aa for aa in a if aa not in b]
d1 = {"a":1, "b":1}
d2 = {"a":2, "b":2}
d3 = {"a":3, "b":3}
>>> diff([d1, d2, d3], [d2, d3])
[{''a'': 1, ''b'': 1}]
>>> diff([d1, d2, d3], [d1])
[{''a'': 2, ''b'': 2}, {''a'': 3, ''b'': 3}]
Los ejemplos anteriores trivializaron el problema del cálculo de las diferencias. Asumir la clasificación o deduplicación definitivamente hace que sea más fácil calcular la diferencia, pero si su comparación no puede permitirse esas suposiciones, entonces necesitará una implementación no trivial de un algoritmo diff. Ver difflib en la biblioteca estándar de python.
from difflib import SequenceMatcher
squeeze=SequenceMatcher( None, A, B )
print "A - B = [%s]"%( reduce( lambda p,q: p+q,
map( lambda t: squeeze.a[t[1]:t[2]],
filter(lambda x:x[0]!=''equal'',
squeeze.get_opcodes() ) ) ) )
A - B = [[1, 3, 4]]
Puedes hacer un
list(set(A)-set(B))
y
list(set(B)-set(A))
Python 2.7.3 (predeterminado, 27 de febrero de 2014, 19:58:35) - IPython 1.1.0 - timeit: (github gist)
def diff(a, b):
b = set(b)
return [aa for aa in a if aa not in b]
def set_diff(a, b):
return list(set(a) - set(b))
diff_lamb_hension = lambda l1,l2: [x for x in l1 if x not in l2]
diff_lamb_filter = lambda l1,l2: filter(lambda x: x not in l2, l1)
from difflib import SequenceMatcher
def squeezer(a, b):
squeeze = SequenceMatcher(None, a, b)
return reduce(lambda p,q: p+q, map(
lambda t: squeeze.a[t[1]:t[2]],
filter(lambda x:x[0]!=''equal'',
squeeze.get_opcodes())))
Resultados:
# Small
a = range(10)
b = range(10/2)
timeit[diff(a, b)]
100000 loops, best of 3: 1.97 µs per loop
timeit[set_diff(a, b)]
100000 loops, best of 3: 2.71 µs per loop
timeit[diff_lamb_hension(a, b)]
100000 loops, best of 3: 2.1 µs per loop
timeit[diff_lamb_filter(a, b)]
100000 loops, best of 3: 3.58 µs per loop
timeit[squeezer(a, b)]
10000 loops, best of 3: 36 µs per loop
# Medium
a = range(10**4)
b = range(10**4/2)
timeit[diff(a, b)]
1000 loops, best of 3: 1.17 ms per loop
timeit[set_diff(a, b)]
1000 loops, best of 3: 1.27 ms per loop
timeit[diff_lamb_hension(a, b)]
1 loops, best of 3: 736 ms per loop
timeit[diff_lamb_filter(a, b)]
1 loops, best of 3: 732 ms per loop
timeit[squeezer(a, b)]
100 loops, best of 3: 12.8 ms per loop
# Big
a = xrange(10**7)
b = xrange(10**7/2)
timeit[diff(a, b)]
1 loops, best of 3: 1.74 s per loop
timeit[set_diff(a, b)]
1 loops, best of 3: 2.57 s per loop
timeit[diff_lamb_filter(a, b)]
# too long to wait for
timeit[diff_lamb_filter(a, b)]
# too long to wait for
timeit[diff_lamb_filter(a, b)]
# TypeError: sequence index must be integer, not ''slice''
@ roman-bodnarchuk list comprehensions function def diff (a, b) parece ser más rápido.
Querrá utilizar un set
lugar de una list
.
Un trazador de líneas:
diff = lambda l1,l2: [x for x in l1 if x not in l2]
diff(A,B)
diff(B,A)
O:
diff = lambda l1,l2: filter(lambda x: x not in l2, l1)
diff(A,B)
diff(B,A)
Use set
si no le importa el orden o la repetición de los elementos. Use listas de comprensión si lo hace:
>>> def diff(first, second):
second = set(second)
return [item for item in first if item not in second]
>>> diff(A, B)
[1, 3, 4]
>>> diff(B, A)
[5]
>>>
la manera más simple,
use set (). difference (set ())
list_a = [1,2,3]
list_b = [2,3]
print set(list_a).difference(set(list_b))
la respuesta está set([1])
>>> set([1,2,3,4]) - set([2,5])
set([1, 3, 4])
>>> set([2,5]) - set([1,2,3,4])
set([5])
A = [1,2,3,4]
B = [2,5]
#A - B
x = list(set(A) - set(B))
#B - A
y = list(set(B) - set(A))
print x
print y