
python - Resize with averaging or rebin a numpy 2d array




Here is an example based on the answer you linked (for clarity):

    >>> import numpy as np
    >>> a = np.arange(24).reshape((4, 6))
    >>> a
    array([[ 0,  1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10, 11],
           [12, 13, 14, 15, 16, 17],
           [18, 19, 20, 21, 22, 23]])
    >>> a.reshape((2, a.shape[0]//2, 3, -1)).mean(axis=3).mean(1)
    array([[  3.5,   5.5,   7.5],
           [ 15.5,  17.5,  19.5]])

As a function:

    def rebin(a, shape):
        # Split each axis into (new size, block size), then average over the block axes
        sh = shape[0], a.shape[0]//shape[0], shape[1], a.shape[1]//shape[1]
        return a.reshape(sh).mean(-1).mean(1)
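
To see why the reshape works, it can help to look at the intermediate 4D view directly. The following is only a sketch: `rebin_checked` is a hypothetical name that does not appear in the answer above, and the explicit divisibility check is an assumption about how one might prefer to fail early rather than rely on the error raised by `reshape`.

```python
import numpy as np

def rebin_checked(a, shape):
    """Block-average a 2D array down to `shape` (hypothetical helper)."""
    rows, cols = a.shape
    new_rows, new_cols = shape
    # Assumption: reject shapes that do not divide the input evenly,
    # instead of letting reshape fail with a less specific message.
    if rows % new_rows or cols % new_cols:
        raise ValueError(f"{a.shape} is not an integer multiple of {shape}")
    # Axis 0: output row, axis 1: row inside the block,
    # axis 2: output column, axis 3: column inside the block.
    blocks = a.reshape(new_rows, rows // new_rows, new_cols, cols // new_cols)
    return blocks.mean(axis=3).mean(axis=1)

a = np.arange(24).reshape(4, 6)
blocks = a.reshape(2, 2, 3, 2)
# blocks[i, :, j, :] is the 2x2 block that feeds output cell (i, j).
top_left = blocks[0, :, 0, :]
result = rebin_checked(a, (2, 3))
```

Here `top_left` holds the same values as `a[:2, :2]`, which is the block averaged into the first output cell.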

I am trying to reimplement in Python an IDL function:

http://star.pst.qub.ac.uk/idl/REBIN.html

which shrinks a 2D array by an integer factor by averaging.

For example:

    >>> a = np.arange(24).reshape((4, 6))
    >>> a
    array([[ 0,  1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10, 11],
           [12, 13, 14, 15, 16, 17],
           [18, 19, 20, 21, 22, 23]])

I would like to resize it to (2,3) by taking the mean of the relevant samples. The expected output would be:

    >>> b = rebin(a, (2, 3))
    >>> b
    array([[  3.5,   5.5,   7.5],
           [ 15.5,  17.5,  19.5]])

That is, b[0,0] = np.mean(a[:2,:2]), b[0,1] = np.mean(a[:2,2:4]), and so on.
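
The expected result can be spelled out with explicit loops over the blocks. This is slow, but it may serve as a reference when testing a faster version. The sketch below uses a hypothetical name, `rebin_reference`, and assumes that the new shape divides the old one evenly.

```python
import numpy as np

def rebin_reference(a, shape):
    """Average non-overlapping blocks with plain slicing (hypothetical helper)."""
    new_rows, new_cols = shape
    # Assumption: both block sizes are exact integers.
    block_rows = a.shape[0] // new_rows
    block_cols = a.shape[1] // new_cols
    out = np.empty(shape, dtype=float)
    for i in range(new_rows):
        for j in range(new_cols):
            block = a[i*block_rows:(i+1)*block_rows,
                      j*block_cols:(j+1)*block_cols]
            out[i, j] = np.mean(block)
    return out

a = np.arange(24).reshape((4, 6))
b = rebin_reference(a, (2, 3))
print(b)
```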

I believe I should reshape to a 4-dimensional array and then take the mean over the right slice, but I could not figure out the algorithm. Would you have any hints?


Here is a way to do what you are asking using matrix multiplication. It does not require the new array dimensions to divide the old ones.

First, we generate a row compressor matrix and a column compressor matrix (I am sure there is a cleaner way of doing this, maybe even using numpy operations alone):

    import numpy as np

    def get_row_compressor(old_dimension, new_dimension):
        dim_compressor = np.zeros((new_dimension, old_dimension))
        bin_size = float(old_dimension) / new_dimension
        next_bin_break = bin_size
        which_row = 0
        which_column = 0
        while (which_row < dim_compressor.shape[0]
               and which_column < dim_compressor.shape[1]):
            if round(next_bin_break - which_column, 10) >= 1:
                # The whole old cell falls inside the current bin
                dim_compressor[which_row, which_column] = 1
                which_column += 1
            elif next_bin_break == which_column:
                # The bin ends exactly on a cell boundary
                which_row += 1
                next_bin_break += bin_size
            else:
                # The old cell straddles two bins: split its weight between them
                partial_credit = next_bin_break - which_column
                dim_compressor[which_row, which_column] = partial_credit
                which_row += 1
                dim_compressor[which_row, which_column] = 1 - partial_credit
                which_column += 1
                next_bin_break += bin_size
        # Normalize so that every row of weights sums to 1
        dim_compressor /= bin_size
        return dim_compressor

    def get_column_compressor(old_dimension, new_dimension):
        return get_row_compressor(old_dimension, new_dimension).transpose()

... so, for example, get_row_compressor(5, 3) gives you:

    [[ 0.6  0.4  0.   0.   0. ]
     [ 0.   0.2  0.6  0.2  0. ]
     [ 0.   0.   0.   0.4  0.6]]

and get_column_compressor(3, 2) gives you:

    [[ 0.66666667  0.        ]
     [ 0.33333333  0.33333333]
     [ 0.          0.66666667]]

Then simply premultiply by the row compressor and postmultiply by the column compressor to get the compressed matrix:

    def compress_and_average(array, new_shape):
        # Note: new shape should be smaller in both dimensions than old shape
        return (get_row_compressor(array.shape[0], new_shape[0])
                @ np.asarray(array)
                @ get_column_compressor(array.shape[1], new_shape[1]))

Using this technique,

    compress_and_average(np.array([[50, 7, 2, 0, 1],
                                   [0, 0, 2, 8, 4],
                                   [4, 1, 1, 0, 0]]), (2, 3))

yields

    [[ 21.86666667   2.66666667   2.26666667]
     [  1.86666667   1.46666667   1.86666667]]
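
The answer remarks that there may be a cleaner way to build the compressors with numpy operations alone. One possibility, offered as a sketch rather than as the author's method, is to compute how much each old cell overlaps each new bin. The names `overlap_compressor` and `compress_by_overlap` are hypothetical.

```python
import numpy as np

def overlap_compressor(old_dimension, new_dimension):
    """Row-compressor weights from interval overlaps (hypothetical helper)."""
    bin_size = old_dimension / new_dimension
    # Old cell k covers [k, k + 1); new bin r covers
    # [r * bin_size, (r + 1) * bin_size).
    old_lo = np.arange(old_dimension)[np.newaxis, :]
    new_lo = (np.arange(new_dimension) * bin_size)[:, np.newaxis]
    overlap = (np.minimum(old_lo + 1, new_lo + bin_size)
               - np.maximum(old_lo, new_lo))
    # A negative value means the two intervals do not touch.
    return np.clip(overlap, 0, None) / bin_size

def compress_by_overlap(array, new_shape):
    """Premultiply by row weights and postmultiply by column weights."""
    rows = overlap_compressor(array.shape[0], new_shape[0])
    cols = overlap_compressor(array.shape[1], new_shape[1]).T
    return rows @ array @ cols

weights = overlap_compressor(5, 3)
small = compress_by_overlap(
    np.array([[50, 7, 2, 0, 1], [0, 0, 2, 8, 4], [4, 1, 1, 0, 0]]), (2, 3))
```

Each row of `weights` sums to 1, so every output cell is a weighted average of the old cells it covers.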


I was trying to downscale a raster: take a raster of roughly 6000 by 2000 and turn it into a smaller raster of arbitrary size that properly averaged the values across the previous bin sizes. I found a solution using SciPy, but then I could not get SciPy to install on the shared hosting service I was using, so I wrote this function instead. There are probably better ways to do this that do not involve looping over the rows and columns, but this seems to work.

The nice part about this is that the old number of rows and columns does not have to be divisible by the new number of rows and columns.

    def resize_array(a, new_rows, new_cols):
        '''
        This function takes a 2D numpy array a and produces a smaller array
        of size new_rows, new_cols. new_rows and new_cols must be less than
        or equal to the number of rows and columns in a.
        '''
        rows = len(a)
        cols = len(a[0])
        yscale = float(rows) / new_rows
        xscale = float(cols) / new_cols

        # first average across the cols to shorten rows
        new_a = np.zeros((rows, new_cols))
        for j in range(new_cols):
            # get the indices of the original array we are going to average across
            the_x_range = (j*xscale, (j+1)*xscale)
            firstx = int(the_x_range[0])
            lastx = int(the_x_range[1])
            # figure out the portion of the first and last index that overlap
            # with the new index, and thus the portion of those cells that
            # we need to include in our average
            x0_scale = 1 - (the_x_range[0]-int(the_x_range[0]))
            xEnd_scale = (the_x_range[1]-int(the_x_range[1]))
            # scale_line is a 1d array that corresponds to the portion of each old
            # index in the_x_range that should be included in the new average
            scale_line = np.ones((lastx-firstx+1))
            scale_line[0] = x0_scale
            scale_line[-1] = xEnd_scale
            # Make sure you don't screw up and include an index that is too large
            # for the array. This isn't great, as there could be some floating
            # point errors that mess up this comparison.
            if scale_line[-1] == 0:
                scale_line = scale_line[:-1]
                lastx = lastx - 1
            # Now it's linear algebra time. Take the dot product of a slice of
            # the original array and the scale_line
            new_a[:, j] = np.dot(a[:, firstx:lastx+1], scale_line)/scale_line.sum()

        # Then average across the rows to shorten the cols. Same method as above.
        # It is probably possible to simplify this code, as this is more or less
        # the same procedure as the block of code above, but transposed.
        # Here I'm reusing the variable a. Sorry if that's confusing.
        a = np.zeros((new_rows, new_cols))
        for i in range(new_rows):
            the_y_range = (i*yscale, (i+1)*yscale)
            firsty = int(the_y_range[0])
            lasty = int(the_y_range[1])
            y0_scale = 1 - (the_y_range[0]-int(the_y_range[0]))
            yEnd_scale = (the_y_range[1]-int(the_y_range[1]))
            scale_line = np.ones((lasty-firsty+1))
            scale_line[0] = y0_scale
            scale_line[-1] = yEnd_scale
            if scale_line[-1] == 0:
                scale_line = scale_line[:-1]
                lasty = lasty - 1
            a[i, :] = np.dot(scale_line, new_a[firsty:lasty+1, :])/scale_line.sum()

        return a


JF Sebastian has a great answer for 2D binning. Here is a version of his "rebin" function that works for N dimensions:

    def bin_ndarray(ndarray, new_shape, operation='sum'):
        """
        Bins an ndarray in all axes based on the target shape, by summing or
        averaging.

        Number of output dimensions must match number of input dimensions and
        new axes must divide old ones.

        Example
        -------
        >>> m = np.arange(0,100,1).reshape((10,10))
        >>> n = bin_ndarray(m, new_shape=(5,5), operation='sum')
        >>> print(n)

        [[ 22  30  38  46  54]
         [102 110 118 126 134]
         [182 190 198 206 214]
         [262 270 278 286 294]
         [342 350 358 366 374]]

        """
        operation = operation.lower()
        if operation not in ['sum', 'mean']:
            raise ValueError("Operation not supported.")
        if ndarray.ndim != len(new_shape):
            raise ValueError("Shape mismatch: {} -> {}".format(ndarray.shape,
                                                               new_shape))
        # Pair each new axis length with its block size, e.g. (5, 2) for 10 -> 5
        compression_pairs = [(d, c//d) for d, c in zip(new_shape,
                                                       ndarray.shape)]
        flattened = [l for p in compression_pairs for l in p]
        ndarray = ndarray.reshape(flattened)
        # Reduce the block axes one at a time, starting from the last one
        for i in range(len(new_shape)):
            op = getattr(ndarray, operation)
            ndarray = op(-1*(i+1))
        return ndarray
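
As a variation on the same idea, the block axes can be reduced in a single call by passing a tuple of axes, which `np.sum` and `np.mean` accept. This is a sketch under stated assumptions, not the function from the answer above: the name `bin_nd` is hypothetical, and the explicit divisibility check is an addition.

```python
import numpy as np

def bin_nd(arr, new_shape, operation="mean"):
    """Bin every axis of `arr` in one reduction (hypothetical helper)."""
    reducers = {"sum": np.sum, "mean": np.mean}
    operation = operation.lower()
    if operation not in reducers:
        raise ValueError("Operation not supported.")
    if arr.ndim != len(new_shape):
        raise ValueError("Shape mismatch: {} -> {}".format(arr.shape, new_shape))
    # Assumption: every new axis length must divide the old one exactly.
    if any(old % new for old, new in zip(arr.shape, new_shape)):
        raise ValueError("New axes must divide old ones.")
    # Interleave (new size, block size) per axis: (d0, f0, d1, f1, ...).
    split = [n for new, old in zip(new_shape, arr.shape)
             for n in (new, old // new)]
    # The block axes sit at the odd positions 1, 3, 5, ...
    block_axes = tuple(range(1, 2 * arr.ndim, 2))
    return reducers[operation](arr.reshape(split), axis=block_axes)

cube = np.arange(4 * 6 * 2).reshape(4, 6, 2)
binned = bin_nd(cube, (2, 3, 1), operation="mean")
```

For the `cube` above, `binned[0, 0, 0]` is the mean of `cube[:2, :2, :2]`, the three-dimensional analogue of the 2D block average.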