python - Generalize slicing operation in a NumPy array


Here is the extension to handle generic ndarrays:

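    import numpy as np

    def indices_merged_arr_generic(arr, arr_pos="last"):
        n = arr.ndim
        # One open (broadcastable) index grid per axis
        grid = np.ogrid[tuple(map(slice, arr.shape))]
        # The output dtype must be able to hold both the indices and the values
        out = np.empty(arr.shape + (n+1,), dtype=np.result_type(arr.dtype, int))
        if arr_pos == "first":
            offset = 1
        elif arr_pos == "last":
            offset = 0
        else:
            raise ValueError("Invalid arr_pos")
        for i in range(n):
            out[...,i+offset] = grid[i]
        # -1+offset is the last column for "last" and column 0 for "first"
        out[...,-1+offset] = arr
        return out.reshape(-1, n+1)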
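As a quick sanity check (a sketch, not part of the original answer), the index columns of every output row should address the value stored in that row's last column. The helper `check_roundtrip` is a hypothetical name; the function is repeated so the snippet runs on its own.

```python
import numpy as np


def indices_merged_arr_generic(arr, arr_pos="last"):
    n = arr.ndim
    # One open (broadcastable) index grid per axis
    grid = np.ogrid[tuple(map(slice, arr.shape))]
    out = np.empty(arr.shape + (n + 1,), dtype=np.result_type(arr.dtype, int))
    if arr_pos == "first":
        offset = 1
    elif arr_pos == "last":
        offset = 0
    else:
        raise ValueError("Invalid arr_pos")
    for i in range(n):
        out[..., i + offset] = grid[i]
    out[..., -1 + offset] = arr
    return out.reshape(-1, n + 1)


def check_roundtrip(arr):
    """Hypothetical helper: True if every row's indices point at its value."""
    merged = indices_merged_arr_generic(arr)
    # Index columns must be integers to be usable as a multi-index
    idx = tuple(merged[:, :-1].T.astype(int))
    return bool(np.array_equal(arr[idx], merged[:, -1]))


rng = np.random.default_rng(0)
for shape in [(4,), (2, 3), (2, 3, 4), (2, 1, 3, 2)]:
    print(shape, check_roundtrip(rng.integers(0, 100, size=shape)))  # each prints True
```

Because the check indexes back into the original array, it works unchanged for 1D, 2D, 3D and higher-dimensional inputs.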

Sample runs

2D case:

    In [252]: arr
    Out[252]:
    array([[37, 32, 73],
           [95, 80, 97]])

    In [253]: indices_merged_arr_generic(arr)
    Out[253]:
    array([[ 0,  0, 37],
           [ 0,  1, 32],
           [ 0,  2, 73],
           [ 1,  0, 95],
           [ 1,  1, 80],
           [ 1,  2, 97]])

    In [254]: indices_merged_arr_generic(arr, arr_pos='first')
    Out[254]:
    array([[37,  0,  0],
           [32,  0,  1],
           [73,  0,  2],
           [95,  1,  0],
           [80,  1,  1],
           [97,  1,  2]])

3D case:

    In [226]: arr
    Out[226]:
    array([[[35, 45, 33],
            [48, 38, 20],
            [69, 31, 90]],

           [[73, 65, 73],
            [27, 51, 45],
            [89, 50, 74]]])

    In [227]: indices_merged_arr_generic(arr)
    Out[227]:
    array([[ 0,  0,  0, 35],
           [ 0,  0,  1, 45],
           [ 0,  0,  2, 33],
           [ 0,  1,  0, 48],
           [ 0,  1,  1, 38],
           [ 0,  1,  2, 20],
           [ 0,  2,  0, 69],
           [ 0,  2,  1, 31],
           [ 0,  2,  2, 90],
           [ 1,  0,  0, 73],
           [ 1,  0,  1, 65],
           [ 1,  0,  2, 73],
           [ 1,  1,  0, 27],
           [ 1,  1,  1, 51],
           [ 1,  1,  2, 45],
           [ 1,  2,  0, 89],
           [ 1,  2,  1, 50],
           [ 1,  2,  2, 74]])

This question is based on this earlier question:

Given an array:

    In [122]: arr = np.array([[1, 3, 7], [4, 9, 8]]); arr
    Out[122]:
    array([[1, 3, 7],
           [4, 9, 8]])
And given its indices:

    In [127]: np.indices(arr.shape)
    Out[127]:
    array([[[0, 0, 0],
            [1, 1, 1]],

           [[0, 1, 2],
            [0, 1, 2]]])
How could I neatly stack these against one another to form a new 2D array? This is what I'd like:

    array([[0, 0, 1],
           [0, 1, 3],
           [0, 2, 7],
           [1, 0, 4],
           [1, 1, 9],
           [1, 2, 8]])
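Since np.indices already returns one full index grid per axis, one way to get exactly this layout (a sketch, not taken from the answers on this page) is to flatten each grid into a column and append the flattened array. `merge_with_indices` is a hypothetical name.

```python
import numpy as np


def merge_with_indices(arr):
    """Hypothetical helper: rows of (i0, i1, ..., value) in C order."""
    # Shape (arr.ndim, *arr.shape): one dense grid per axis
    grids = np.indices(arr.shape)
    # Flatten each grid in C order (same order as arr.ravel()) and add the values
    return np.column_stack([*grids.reshape(arr.ndim, -1), arr.ravel()])


arr = np.array([[1, 3, 7], [4, 9, 8]])
print(merge_with_indices(arr))  # matches the desired array above
```

Unlike np.ogrid, np.indices materializes dense grids (ndim × arr.size integers), so it trades some memory for brevity.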

This solution by Divakar is what I currently use for 2D arrays:

    import numpy as np

    def indices_merged_arr(arr):
        m,n = arr.shape
        I,J = np.ogrid[:m,:n]
        out = np.empty((m,n,3), dtype=arr.dtype)
        out[...,0] = I
        out[...,1] = J
        out[...,2] = arr
        out.shape = (-1,3)
        return out
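The function relies on np.ogrid returning open (sparse) grids: I has shape (m, 1) and J has shape (1, n), and assigning them into out[..., 0] and out[..., 1] broadcasts each one across the full (m, n) slice. A small illustration on the example array:

```python
import numpy as np

m, n = 2, 3
I, J = np.ogrid[:m, :n]
print(I.shape, J.shape)  # (2, 1) (1, 3)

out = np.empty((m, n, 3), dtype=int)
out[..., 0] = I  # broadcast along axis 1: the row index in every column
out[..., 1] = J  # broadcast along axis 0: the column index in every row
out[..., 2] = np.array([[1, 3, 7], [4, 9, 8]])
print(out.reshape(-1, 3))
```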

Now, if I wanted to pass in a 3D array, I need to modify this function:

    def indices_merged_arr(arr):
        m,n,k = arr.shape   # here
        I,J,K = np.ogrid[:m,:n,:k]   # here
        out = np.empty((m,n,k,4), dtype=arr.dtype)   # here
        out[...,0] = I
        out[...,1] = J
        out[...,2] = K     # here
        out[...,3] = arr
        out.shape = (-1,4)   # here
        return out

But this function now works only for 3D arrays: I can't pass it a 2D array.

Is there some way I can generalize this to work for any number of dimensions? Here's my attempt:

    def indices_merged_arr_general(arr):
        tup = arr.shape
        idx = np.ogrid[????] # not sure what to do here....
        out = np.empty(tup + (len(tup) + 1, ), dtype=arr.dtype)
        for i, j in enumerate(idx):
            out[...,i] = j
        out[...,len(tup)] = arr   # the last column holds the values
        out.shape = (-1, len(tup) + 1)
        return out

I'm having trouble with this line:

    idx = np.ogrid[????]

How can I make this work?
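The key observation is that `np.ogrid[:m, :n]` is plain indexing with the tuple `(slice(None, m), slice(None, n))`, and `slice(m)` builds that same object. The subscript can therefore be assembled from the shape at run time, which is what the generic answer at the top does with `tuple(map(slice, arr.shape))`. A short demonstration:

```python
import numpy as np

shape = (2, 3, 4)

# slice(s) is the same object that the literal :s creates inside [ ]
key = tuple(map(slice, shape))
print(key)  # (slice(None, 2, None), slice(None, 3, None), slice(None, 4, None))

idx = np.ogrid[key]
print([g.shape for g in idx])  # [(2, 1, 1), (1, 3, 1), (1, 1, 4)]

# Same result as writing the 3D subscript out by hand
manual = np.ogrid[:2, :3, :4]
print(all(np.array_equal(a, b) for a, b in zip(idx, manual)))  # True
```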

For large arrays, AFAIK, senderle's cartesian_product is the fastest way¹ to generate Cartesian products using NumPy:

    In [372]: A = np.random.random((100,100,100))

    In [373]: %timeit indices_merged_arr_generic_using_cp(A)
    100 loops, best of 3: 16.8 ms per loop

    In [374]: %timeit indices_merged_arr_generic(A)
    10 loops, best of 3: 28.9 ms per loop

Here is the setup I used for the benchmark. Below, indices_merged_arr_generic_using_cp is a modification of senderle's cartesian_product that includes the flattened array alongside the Cartesian product:

    import numpy as np
    import functools

    def indices_merged_arr_generic_using_cp(arr):
        """
        Based on cartesian_product
        http://.com/a/11146645/190597 (senderle)
        """
        shape = arr.shape
        arrays = [np.arange(s, dtype='int') for s in shape]
        broadcastable = np.ix_(*arrays)
        broadcasted = np.broadcast_arrays(*broadcastable)
        rows, cols = functools.reduce(np.multiply, broadcasted[0].shape), len(broadcasted)+1
        out = np.empty(rows * cols, dtype=arr.dtype)
        start, end = 0, rows
        for a in broadcasted:
            out[start:end] = a.reshape(-1)
            start, end = end, end + rows
        out[start:] = arr.flatten()
        return out.reshape(cols, rows).T

    def indices_merged_arr_generic(arr):
        """
        https://.com/a/46135084/190597 (Divakar)
        """
        n = arr.ndim
        grid = np.ogrid[tuple(map(slice, arr.shape))]
        out = np.empty(arr.shape + (n+1,), dtype=arr.dtype)
        for i in range(n):
            out[...,i] = grid[i]
        out[...,-1] = arr
        out.shape = (-1,n+1)
        return out
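To see what the cartesian_product-based version does step by step, here is a small trace on the 2×3 example array (a sketch using only np.ix_, np.broadcast_arrays and reshape/transpose). Each broadcast index grid, and then the flattened array, is written into a consecutive block of one flat buffer; the final reshape(cols, rows).T turns those blocks into columns.

```python
import numpy as np

arr = np.array([[1, 3, 7], [4, 9, 8]])

# np.ix_ gives open grids, like np.ogrid: shapes (2, 1) and (1, 3)
broadcastable = np.ix_(*[np.arange(s) for s in arr.shape])
# broadcast_arrays expands them to full (2, 3) views
broadcasted = np.broadcast_arrays(*broadcastable)

rows = arr.size                  # one output row per element
cols = len(broadcasted) + 1      # one column per axis, plus the values
buf = np.empty(rows * cols, dtype=arr.dtype)

# Fill the buffer block by block: all row indices, all column indices, values
for k, a in enumerate(broadcasted):
    buf[k * rows:(k + 1) * rows] = a.reshape(-1)
buf[(cols - 1) * rows:] = arr.reshape(-1)

print(buf.reshape(cols, rows))    # one block per line
print(buf.reshape(cols, rows).T)  # blocks become columns
```

Writing whole contiguous blocks and transposing once at the end is what distinguishes this layout from filling the trailing axis of an (m, n, k, ..., ndim+1) array.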

¹ Note that I previously used senderle's cartesian_product_transpose. For me, that is the faster version. For others, including senderle, cartesian_product is faster.

We can use the following one-liner:

    from numpy import hstack, array, meshgrid

    hstack((
        array(meshgrid(*map(range, t.shape), indexing='ij')).reshape(t.ndim, -1).T,
        t.flatten().reshape(-1,1)
    ))

Here t is the input array. First, map(range, t.shape) builds an iterable of ranges, one per dimension. np.meshgrid(..., indexing='ij') turns them into one index grid per axis, and array(...).reshape(t.ndim, -1).T builds the first part of the table: an n × m matrix, where n is the number of elements of t and m is the number of dimensions, with the rows in the same row-major order as t.flatten(). Finally, we append a flattened version of t as the rightmost column.
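The same meshgrid idea wrapped in a function (a sketch; `indices_merged_meshgrid` is a hypothetical name). Passing indexing='ij' and reshaping to (t.ndim, -1) before transposing keeps the index rows in the same C order as t.flatten(), which matters once t has three or more dimensions; np.column_stack stands in for the explicit hstack of a reshaped column.

```python
import numpy as np


def indices_merged_meshgrid(t):
    """Hypothetical wrapper: rows of (i0, ..., i_last, value) in C order."""
    # indexing='ij' makes grid k vary along axis k, matching C order
    grids = np.meshgrid(*map(range, t.shape), indexing="ij")
    # Shape (ndim, t.size) -> transpose to (t.size, ndim)
    index_cols = np.array(grids).reshape(t.ndim, -1).T
    return np.column_stack([index_cols, t.reshape(-1)])


t = np.arange(24).reshape(2, 3, 4)
merged = indices_merged_meshgrid(t)
# Each row's indices point at its own value
print(np.array_equal(t[tuple(merged[:, :-1].T)], merged[:, -1]))  # True
```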

ndenumerate iterates over the elements, as opposed to over the dimensions as in the other solutions. So I don't expect it to win any speed tests. But here's a way of using it:

    In [588]: arr = np.array([[1, 3, 7], [4, 9, 8]])

    In [589]: arr
    Out[589]:
    array([[1, 3, 7],
           [4, 9, 8]])

    In [590]: list(np.ndenumerate(arr))
    Out[590]: [((0, 0), 1), ((0, 1), 3), ((0, 2), 7), ((1, 0), 4), ((1, 1), 9), ((1, 2), 8)]

In Python 3, * unpacking can be used inside a tuple, so the nested tuples can be flattened:

    In [591]: [(*ij,v) for ij,v in np.ndenumerate(arr)]
    Out[591]: [(0, 0, 1), (0, 1, 3), (0, 2, 7), (1, 0, 4), (1, 1, 9), (1, 2, 8)]

    In [592]: np.array(_)
    Out[592]:
    array([[0, 0, 1],
           [0, 1, 3],
           [0, 2, 7],
           [1, 0, 4],
           [1, 1, 9],
           [1, 2, 8]])
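Without the star-unpacking syntax, the same flattening can be written by concatenating tuples, because each index ij yielded by np.ndenumerate is already a tuple (a small variant, not from the original answer):

```python
import numpy as np

arr = np.array([[1, 3, 7], [4, 9, 8]])

# ij is a tuple such as (0, 2); ij + (v,) appends the value to it
rows = [ij + (v,) for ij, v in np.ndenumerate(arr)]
print(np.array(rows))
```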

And it generalizes nicely to more dimensions:

    In [593]: arr3 = np.arange(24).reshape(2,3,4)

    In [594]: np.array([(*ij,v) for ij,v in np.ndenumerate(arr3)])
    Out[594]:
    array([[ 0,  0,  0,  0],
           [ 0,  0,  1,  1],
           [ 0,  0,  2,  2],
           [ 0,  0,  3,  3],
           [ 0,  1,  0,  4],
           [ 0,  1,  1,  5],
           ....
           [ 1,  2,  3, 23]])

With these small samples, it's actually faster than @Divakar's function. :)

    In [598]: timeit indices_merged_arr_generic(arr)
    52.8 µs ± 271 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    In [599]: timeit indices_merged_arr_generic(arr3)
    66.9 µs ± 434 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    In [600]: timeit np.array([(*ij,v) for ij,v in np.ndenumerate(arr)])
    21.2 µs ± 40.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    In [601]: timeit np.array([(*ij,v) for ij,v in np.ndenumerate(arr3)])
    59.4 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

But for a large 3D array it is much slower:

    In [602]: A = np.random.random((100,100,100))

    In [603]: timeit indices_merged_arr_generic(A)
    50.3 ms ± 141 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

    In [604]: timeit np.array([(*ij,v) for ij,v in np.ndenumerate(A)])
    2.39 s ± 11.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
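Outside IPython, where the %timeit magic is unavailable, a comparable measurement can be sketched with the standard timeit module. This assumes the ogrid-based function and the ndenumerate expression shown above; `indices_merged_ndenumerate` is a hypothetical wrapper name, the array is smaller than in the session so the script finishes quickly, and the numbers will depend on the machine.

```python
import timeit

import numpy as np


def indices_merged_arr_generic(arr):
    # ogrid-based version (values in the last column)
    n = arr.ndim
    grid = np.ogrid[tuple(map(slice, arr.shape))]
    out = np.empty(arr.shape + (n + 1,), dtype=arr.dtype)
    for i in range(n):
        out[..., i] = grid[i]
    out[..., -1] = arr
    return out.reshape(-1, n + 1)


def indices_merged_ndenumerate(arr):
    # Hypothetical wrapper around the element-by-element ndenumerate approach
    return np.array([(*ij, v) for ij, v in np.ndenumerate(arr)])


A = np.random.random((20, 20, 20))
assert np.array_equal(indices_merged_arr_generic(A), indices_merged_ndenumerate(A))

for func in (indices_merged_arr_generic, indices_merged_ndenumerate):
    # Best of 5 repeats of 10 calls each, reported per call in milliseconds
    best = min(timeit.repeat(lambda: func(A), number=10, repeat=5)) / 10
    print(f"{func.__name__}: {best * 1e3:.3f} ms per call")
```

Taking the minimum over repeats, as %timeit's older "best of 3" output did, reduces the influence of background noise on the comparison.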

And with @unutbu's version: slower for small arrays, faster for large ones:

    In [609]: timeit indices_merged_arr_generic_using_cp(arr)
    104 µs ± 1.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    In [610]: timeit indices_merged_arr_generic_using_cp(arr3)
    141 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    In [611]: timeit indices_merged_arr_generic_using_cp(A)
    31.1 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)