python - librerias - ¿Cuál es la forma más rápida de comparar parches de una matriz?

matriz de correlación python (3)

Quiero comparar diferentes áreas de una matriz bidimensional $ A $ con una matriz dada $ b $ de un tamaño más pequeño. Como tengo que hacerlo muchas veces, es crucial que esto se realice muy rápido. Tengo una solución que funciona bien, pero esperaba que pudiera hacerse mejor y más rápido.

En detalle:

Digamos que tenemos una gran matriz y una pequeña matriz. Reparto todos los ''parches'' posibles dentro de la matriz grande que tienen el mismo tamaño que la matriz pequeña y comparo estos parches con la matriz pequeña dada.

def get_best_fit(big_array, small_array): # we assume the small array is square patch_size = small_array.shape[0] min_value = np.inf for x in range(patch_size, big_array.shape[0] - patch_size): for y in range(patch_size, big_array.shape[1] - patch_size): p = get_patch_term(x, y, patch_size, big_array) tmp = some_metric(p, small_array) if min_value > tmp: min_value = tmp min_patch = p return min_patch, min_value

Para obtener los parches obtuve esta implementación de acceso directo a la matriz:

def get_patch_term(x, y, patch_size, data): """ a patch has the size (patch_size)^^2 """ patch = data[(x - (patch_size-1)/2): (x + (patch_size-1)/2 + 1), (y - (patch_size-1)/2): (y + (patch_size-1)/2 + 1)] return patch

Supongo que esta es la tarea más crucial y se puede realizar más rápido, pero no estoy seguro de ello.

Eché un vistazo a Cython, pero tal vez lo hice mal, no estoy realmente familiarizado con él.

Mi primer intento fue una traducción directa a cython:

def get_patch_term_fast(Py_ssize_t x_i, Py_ssize_t y_i, Py_ssize_t patch_size, np.ndarray[DTYPE_t, ndim=2] big_array): assert big_array.dtype == DTYPE patch_size = (patch_size - 1)/2 cdef np.ndarray[DTYPE_t, ndim=2] patch = <np.ndarray[DTYPE_t, ndim=2]>big_array[(x_i - patch_size):(x_i + patch_size + 1), (y_i - patch_size): (y_i + patch_size + 1)] return patch

Y esto parece ser más rápido (ver más abajo) pero pensé que un enfoque paralelo debería ser mejor, por lo que se me ocurrió

def get_patch_term_fast_parallel(Py_ssize_t x_i, Py_ssize_t y_i, Py_ssize_t patch_size, np.ndarray[DTYPE_t, ndim=2] big_array): assert big_array.dtype == DTYPE patch_size = (patch_size - 1)/2 assert big_array.dtype == DTYPE cdef Py_ssize_t x cdef Py_ssize_t y cdef np.ndarray[DTYPE_t, ndim=1] patch = np.empty(np.power((2 * patch_size) + 1, 2)) with nogil, parallel(): for x in prange(x_i - patch_size, x_i + patch_size + 1): for y in prange(y_i - patch_size, y_i + patch_size + 1): patch[((x - (x_i - patch_size)) * (2 * patch_size + 1)) + (y - (y_i - patch_size))] = big_array[x, y] #cdef np.ndarray[DTYPE_t, ndim=2] patch = <np.ndarray[DTYPE_t, ndim=2]>big_array[(x_i - patch_size):(x_i + patch_size + 1), (y_i - patch_size): (y_i + patch_size + 1)] return patch

Que es, por desgracia, no más rápido. Para las pruebas utilicé:

A = np.array(range(1200), dtype=np.float).reshape(30, 40) b = np.array([41, 42, 81, 84]).reshape(2, 2) x = 7 y = 7 print(timeit.timeit(lambda:get_patch_term_fast(x, y, 3, A), number=300)) print(timeit.timeit(lambda:get_patch_term_fast_parallel(x, y, 3, A).reshape(3,3), number=300)) print(timeit.timeit(lambda:get_patch_term(x, y, 3, A), number=300))

Lo que da

0.0008792859734967351 0.0029909340664744377 0.0029337930027395487

Entonces, mi primera pregunta es, ¿es posible hacerlo más rápido? La segunda pregunta sería, ¿por qué el enfoque paralelo no es más rápido que la implementación numpy original?

Editar:

Intenté paralelizar aún más la función get_best_fit pero desafortunadamente recibo muchos errores que indican que no puedo asignar un objeto Python sin gil.

Aquí está el código:

def get_best_fit_fast(np.ndarray[DTYPE_t, ndim=2] big_array, np.ndarray[DTYPE_t, ndim=2] small_array): # we assume the small array is square cdef Py_ssize_t patch_size = small_array.shape[0] cdef Py_ssize_t x cdef Py_ssize_t y cdef Py_ssize_t x_range = big_array.shape[0] - patch_size cdef Py_ssize_t y_range = big_array.shape[1] - patch_size cdef np.ndarray[DTYPE_t, ndim=2] p cdef np.ndarray[DTYPE_t, ndim=2] weights = np.empty((x_range - patch_size)*(y_range - patch_size)).reshape((x_range - patch_size), (y_range - patch_size)) with nogil, parallel(): for x in prange(patch_size, x_range): for y in prange(patch_size, y_range): p = get_patch_term_fast(x, y, patch_size, big_array) weights[x - patch_size, y - patch_size] = np.linalg.norm(np.abs(p - small_array)) return np.min(weights)

PD: omití la parte de devolver el parche más pequeño ...

Creo que dependiendo de lo que some_metric su función some_metric , es posible que ya exista una implementación altamente optimizada. Por ejemplo, si es solo una convolución, entonces eche un vistazo a Theano que incluso le permitirá aprovechar GPU con bastante facilidad. Incluso si su función no es tan simple como una simple convolución, es probable que haya bloques de construcción dentro de Theano que pueda usar en lugar de intentar ir a un nivel muy bajo con Cython.

El otro problema con la medición de tiempo de la función paralela es que se llama reshape en su objeto de matriz después de ejecutar su función paralela. Podría darse el caso de que la función paralela sea más rápida, pero luego la remodelación está agregando tiempo extra (aunque la reshape debería ser bastante rápida, pero no estoy seguro de qué tan rápido).

El otro problema es que no sabemos cuál es su término some_metric , y no parece que lo esté usando en paralelo. La forma en que veo que funciona su código paralelo es que obtiene los parches en paralelo y luego los pone en una cola para ser analizados por la función some_metric , por lo tanto, some_metric el propósito de la paralelización de su código.

Como dijo John Greenhall, el uso de FFT y convoluciones puede ser bastante rápido. Sin embargo, para aprovechar el procesamiento en paralelo, también deberá realizar el análisis del parche y la pequeña matriz en paralelo.

¿Qué tan grandes son las matrices? ¿Es la memoria un problema aquí también?

Inicialmente publicó otra respuesta basada en la coincidencia de patrones (llevada por el título), publica otra respuesta

use un bucle para recorrer ambas matrices (grandes y pequeñas) y aplique la métrica de correlación parcial (o lo que sea) sin rebanar listas todo el tiempo:

como una nota al margen, observe el hecho (según la métrica, por supuesto) de que una vez que se haya completado un parche, el siguiente parche (ya sea una fila abajo o una columna a la derecha) comparte mucho con el parche anterior, solo sus filas inicial y final (o las columnas) cambian, por lo que la nueva correlación se puede calcular más rápido que la anterior, restando la fila anterior y agregando una nueva fila (o viceversa). Dado que la función métrica no se da se agrega como observación.

def get_best_fit(big_array, small_array): # we assume the small array is square patch_size = small_array.shape[0] x = 0 y = 0 x2 = 0 y2 = 0 W = big_array.shape[0] H = big_array.shape[1] W2 = patch_size H2 = patch_size min_value = np.inf tmp = 0 min_patch = None start = 0 end = H*W start2 = 0 end2 = W2*H2 while start < end: tmp = 0 start2 = 0 x2 = 0 y2 = 0 valid = True while start2 < end2: if x+x2 >= W or y+y2 >= H: valid = False break # !!compute partial metric!! # no need to slice lists all the time tmp += some_metric_partial(x, y, x2, y2, big_array, small_array) x2 += 1 if x2>=W2: x2 = 0 y2 += 1 start2 += 1 # one patch covered # try next patch if valid and min_value > tmp: min_value = tmp min_patch = [x, y, W2, H2] x += 1 if x>=W: x = 0 y += 1 start += 1 return min_patch, min_value