transponer - recorrer una matriz en python

Cómo agregar una columna extra a una matriz NumPy (14)

Digamos que tengo una matriz NumPy, a :

a = np.array([ [1, 2, 3], [2, 3, 4] ])

Y me gustaría agregar una columna de ceros para obtener una matriz, b :

b = np.array([ [1, 2, 3, 0], [2, 3, 4, 0] ])

¿Cómo puedo hacer esto fácilmente en NumPy?

Creo que una solución más sencilla y más rápida de arrancar es hacer lo siguiente:

import numpy as np N = 10 a = np.random.rand(N,N) b = np.zeros((N,N+1)) b[:,:-1] = a

Y los tiempos:

In [23]: N = 10 In [24]: a = np.random.rand(N,N) In [25]: %timeit b = np.hstack((a,np.zeros((a.shape[0],1)))) 10000 loops, best of 3: 19.6 us per loop In [27]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a 100000 loops, best of 3: 5.62 us per loop

En mi caso, tuve que agregar una columna de unos a una matriz NumPy

X = array([ 6.1101, 5.5277, ... ]) X.shape => (97,) X = np.concatenate((np.ones((m,1), dtype=np.int), X.reshape(m,1)), axis=1)

Después de X.shape => (97, 2)

array([[ 1. , 6.1101], [ 1. , 5.5277], ...

Encuentro lo siguiente más elegante:

b = np.insert(a, 3, values=0, axis=1) # Insert values before column 3

Una ventaja de la insert es que también le permite insertar columnas (o filas) en otros lugares dentro de la matriz. Además, en lugar de insertar un solo valor, puede insertar fácilmente un vector completo, por ejemplo, duplicar la última columna:

b = np.insert(a, insert_index, values=a[:,2], axis=1)

Lo que lleva a:

array([[1, 2, 3, 3], [2, 3, 4, 4]])

Para la sincronización, la insert podría ser más lenta que la solución de JoshAdel:

In [1]: N = 10 In [2]: a = np.random.rand(N,N) In [3]: %timeit b = np.hstack((a, np.zeros((a.shape[0], 1)))) 100000 loops, best of 3: 7.5 µs per loop In [4]: %timeit b = np.zeros((a.shape[0], a.shape[1]+1)); b[:,:-1] = a 100000 loops, best of 3: 2.17 µs per loop In [5]: %timeit b = np.insert(a, 3, values=0, axis=1) 100000 loops, best of 3: 10.2 µs per loop

Hay una función específicamente para esto. Se llama numpy.pad

a = np.array([[1,2,3], [2,3,4]]) b = np.pad(a, ((0, 0), (0, 1)), mode=''constant'', constant_values=0) print b >>> array([[1, 2, 3, 0], [2, 3, 4, 0]])

Esto es lo que dice en la cadena de documentación:

Pads an array. Parameters ---------- array : array_like of rank N Input array pad_width : {sequence, array_like, int} Number of values padded to the edges of each axis. ((before_1, after_1), ... (before_N, after_N)) unique pad widths for each axis. ((before, after),) yields same before and after pad for each axis. (pad,) or int is a shortcut for before = after = pad width for all axes. mode : str or function One of the following string values or a user supplied function. ''constant'' Pads with a constant value. ''edge'' Pads with the edge values of array. ''linear_ramp'' Pads with the linear ramp between end_value and the array edge value. ''maximum'' Pads with the maximum value of all or part of the vector along each axis. ''mean'' Pads with the mean value of all or part of the vector along each axis. ''median'' Pads with the median value of all or part of the vector along each axis. ''minimum'' Pads with the minimum value of all or part of the vector along each axis. ''reflect'' Pads with the reflection of the vector mirrored on the first and last values of the vector along each axis. ''symmetric'' Pads with the reflection of the vector mirrored along the edge of the array. ''wrap'' Pads with the wrap of the vector along the axis. The first values are used to pad the end and the end values are used to pad the beginning. <function> Padding function, see Notes. stat_length : sequence or int, optional Used in ''maximum'', ''mean'', ''median'', and ''minimum''. Number of values at edge of each axis used to calculate the statistic value. ((before_1, after_1), ... (before_N, after_N)) unique statistic lengths for each axis. ((before, after),) yields same before and after statistic lengths for each axis. (stat_length,) or int is a shortcut for before = after = statistic length for all axes. Default is ``None``, to use the entire axis. constant_values : sequence or int, optional Used in ''constant''. The values to set the padded values for each axis. ((before_1, after_1), ... (before_N, after_N)) unique pad constants for each axis. ((before, after),) yields same before and after constants for each axis. (constant,) or int is a shortcut for before = after = constant for all axes. Default is 0. end_values : sequence or int, optional Used in ''linear_ramp''. The values used for the ending value of the linear_ramp and that will form the edge of the padded array. ((before_1, after_1), ... (before_N, after_N)) unique end values for each axis. ((before, after),) yields same before and after end values for each axis. (constant,) or int is a shortcut for before = after = end value for all axes. Default is 0. reflect_type : {''even'', ''odd''}, optional Used in ''reflect'', and ''symmetric''. The ''even'' style is the default with an unaltered reflection around the edge value. For the ''odd'' style, the extented part of the array is created by subtracting the reflected values from two times the edge value. Returns ------- pad : ndarray Padded array of rank equal to `array` with shape increased according to `pad_width`. Notes ----- .. versionadded:: 1.7.0 For an array with rank greater than 1, some of the padding of later axes is calculated from padding of previous axes. This is easiest to think about with a rank 2 array where the corners of the padded array are calculated by using padded values from the first axis. The padding function, if used, should return a rank 1 array equal in length to the vector argument with padded values replaced. It has the following signature:: padding_func(vector, iaxis_pad_width, iaxis, kwargs) where vector : ndarray A rank 1 array already padded with zeros. Padded values are vector[:pad_tuple[0]] and vector[-pad_tuple[1]:]. iaxis_pad_width : tuple A 2-tuple of ints, iaxis_pad_width[0] represents the number of values padded at the beginning of vector where iaxis_pad_width[1] represents the number of values padded at the end of vector. iaxis : int The axis currently being calculated. kwargs : dict Any keyword arguments the function requires. Examples -------- >>> a = [1, 2, 3, 4, 5] >>> np.pad(a, (2,3), ''constant'', constant_values=(4, 6)) array([4, 4, 1, 2, 3, 4, 5, 6, 6, 6]) >>> np.pad(a, (2, 3), ''edge'') array([1, 1, 1, 2, 3, 4, 5, 5, 5, 5]) >>> np.pad(a, (2, 3), ''linear_ramp'', end_values=(5, -4)) array([ 5, 3, 1, 2, 3, 4, 5, 2, -1, -4]) >>> np.pad(a, (2,), ''maximum'') array([5, 5, 1, 2, 3, 4, 5, 5, 5]) >>> np.pad(a, (2,), ''mean'') array([3, 3, 1, 2, 3, 4, 5, 3, 3]) >>> np.pad(a, (2,), ''median'') array([3, 3, 1, 2, 3, 4, 5, 3, 3]) >>> a = [[1, 2], [3, 4]] >>> np.pad(a, ((3, 2), (2, 3)), ''minimum'') array([[1, 1, 1, 2, 1, 1, 1], [1, 1, 1, 2, 1, 1, 1], [1, 1, 1, 2, 1, 1, 1], [1, 1, 1, 2, 1, 1, 1], [3, 3, 3, 4, 3, 3, 3], [1, 1, 1, 2, 1, 1, 1], [1, 1, 1, 2, 1, 1, 1]]) >>> a = [1, 2, 3, 4, 5] >>> np.pad(a, (2, 3), ''reflect'') array([3, 2, 1, 2, 3, 4, 5, 4, 3, 2]) >>> np.pad(a, (2, 3), ''reflect'', reflect_type=''odd'') array([-1, 0, 1, 2, 3, 4, 5, 6, 7, 8]) >>> np.pad(a, (2, 3), ''symmetric'') array([2, 1, 1, 2, 3, 4, 5, 5, 4, 3]) >>> np.pad(a, (2, 3), ''symmetric'', reflect_type=''odd'') array([0, 1, 1, 2, 3, 4, 5, 5, 6, 7]) >>> np.pad(a, (2, 3), ''wrap'') array([4, 5, 1, 2, 3, 4, 5, 1, 2, 3]) >>> def pad_with(vector, pad_width, iaxis, kwargs): ... pad_value = kwargs.get(''padder'', 10) ... vector[:pad_width[0]] = pad_value ... vector[-pad_width[1]:] = pad_value ... return vector >>> a = np.arange(6) >>> a = a.reshape((2, 3)) >>> np.pad(a, 2, pad_with) array([[10, 10, 10, 10, 10, 10, 10], [10, 10, 10, 10, 10, 10, 10], [10, 10, 0, 1, 2, 10, 10], [10, 10, 3, 4, 5, 10, 10], [10, 10, 10, 10, 10, 10, 10], [10, 10, 10, 10, 10, 10, 10]]) >>> np.pad(a, 2, pad_with, padder=100) array([[100, 100, 100, 100, 100, 100, 100], [100, 100, 100, 100, 100, 100, 100], [100, 100, 0, 1, 2, 100, 100], [100, 100, 3, 4, 5, 100, 100], [100, 100, 100, 100, 100, 100, 100], [100, 100, 100, 100, 100, 100, 100]])

Me gusta la respuesta de JoshAdel por el enfoque en el rendimiento. Una mejora menor del rendimiento es evitar la sobrecarga de inicializar con ceros, solo para sobrescribirlos. Esto tiene una diferencia medible cuando N es grande, se usa vacío en lugar de ceros, y la columna de ceros se escribe como un paso separado:

In [1]: import numpy as np In [2]: N = 10000 In [3]: a = np.ones((N,N)) In [4]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a 1 loops, best of 3: 492 ms per loop In [5]: %timeit b = np.empty((a.shape[0],a.shape[1]+1)); b[:,:-1] = a; b[:,-1] = np.zeros((a.shape[0],)) 1 loops, best of 3: 407 ms per loop

Suponiendo que M es un ndarray (100,3) e y es un apéndice ndarray (100,) se puede utilizar de la siguiente manera:

M=numpy.append(M,y[:,None],1)

El truco es usar

y[:, None]

Esto convierte y a una matriz 2D (100, 1).

M.shape

ahora da

(100, 4)

También me interesó esta pregunta y comparé la velocidad de

numpy.c_[a, a] numpy.stack([a, a]).T numpy.vstack([a, a]).T numpy.ascontiguousarray(numpy.stack([a, a]).T) numpy.ascontiguousarray(numpy.vstack([a, a]).T) numpy.column_stack([a, a]) numpy.concatenate([a[:,None], a[:,None]], axis=1) numpy.concatenate([a[None], a[None]], axis=0).T

que todos hacen lo mismo para cualquier vector de entrada a . Tiempos para crecer a :

Tenga en cuenta que todas las variantes no contiguas (en particular stack / vstack ) son eventualmente más rápidas que todas las variantes contiguas. column_stack (por su claridad y velocidad) parece ser una buena opción si necesita contigüidad.

Código para reproducir la trama:

import numpy import perfplot perfplot.show( setup=lambda n: numpy.random.rand(n), kernels=[ lambda a: numpy.c_[a, a], lambda a: numpy.ascontiguousarray(numpy.stack([a, a]).T), lambda a: numpy.ascontiguousarray(numpy.vstack([a, a]).T), lambda a: numpy.column_stack([a, a]), lambda a: numpy.concatenate([a[:, None], a[:, None]], axis=1), lambda a: numpy.ascontiguousarray(numpy.concatenate([a[None], a[None]], axis=0).T), lambda a: numpy.stack([a, a]).T, lambda a: numpy.vstack([a, a]).T, lambda a: numpy.concatenate([a[None], a[None]], axis=0).T, ], labels=[ ''c_'', ''ascont(stack)'', ''ascont(vstack)'', ''column_stack'', ''concat'', ''ascont(concat)'', ''stack (non-cont)'', ''vstack (non-cont)'', ''concat (non-cont)'' ], n_range=[2**k for k in range(20)], xlabel=''len(a)'', logx=True, logy=True, )

Un poco tarde para la fiesta, pero nadie publicó esta respuesta todavía, así que, para completarla: puedes hacer esto con una lista de comprensión, en una matriz de Python simple:

source = a.tolist() result = [row + [0] for row in source] b = np.array(result)

Una forma, usando hstack , es:

b = np.hstack((a, np.zeros((a.shape[0], 1), dtype=a.dtype)))

Utilice numpy.append :

>>> a = np.array([[1,2,3],[2,3,4]]) >>> a array([[1, 2, 3], [2, 3, 4]]) >>> z = np.zeros((2,1), dtype=int64) >>> z array([[0], [0]]) >>> np.append(a, z, axis=1) array([[1, 2, 3, 0], [2, 3, 4, 0]])

Yo creo que:

np.column_stack((a, zeros(shape(a)[0])))

es mas elegante

np.concatenate también funciona

>>> a = np.array([[1,2,3],[2,3,4]]) >>> a array([[1, 2, 3], [2, 3, 4]]) >>> z = np.zeros((2,1)) >>> z array([[ 0.], [ 0.]]) >>> np.concatenate((a, z), axis=1) array([[ 1., 2., 3., 0.], [ 2., 3., 4., 0.]])

np.insert también sirve el propósito.

matA = np.array([[1,2,3], [2,3,4]]) idx = 3 new_col = np.array([0, 0]) np.insert(matA, idx, new_col, axis=1) array([[1, 2, 3, 0], [2, 3, 4, 0]])

Inserta valores, aquí new_col , antes de un índice dado, aquí idx largo de un eje. En otras palabras, los valores recién insertados ocuparán la columna idx y moverán lo que originalmente estaban allí en y después de idx hacia atrás.

np.r_[ ... ] y np.c_[ ... ] son alternativas útiles a vstack y hstack , con corchetes [] en lugar de round ().
Un par de ejemplos:

: import numpy as np : N = 3 : A = np.eye(N) : np.c_[ A, np.ones(N) ] # add a column array([[ 1., 0., 0., 1.], [ 0., 1., 0., 1.], [ 0., 0., 1., 1.]]) : np.c_[ np.ones(N), A, np.ones(N) ] # or two array([[ 1., 1., 0., 0., 1.], [ 1., 0., 1., 0., 1.], [ 1., 0., 0., 1., 1.]]) : np.r_[ A, [A[1]] ] # add a row array([[ 1., 0., 0.], [ 0., 1., 0.], [ 0., 0., 1.], [ 0., 1., 0.]]) : # not np.r_[ A, A[1] ] : np.r_[ A[0], 1, 2, 3, A[1] ] # mix vecs and scalars array([ 1., 0., 0., 1., 2., 3., 0., 1., 0.]) : np.r_[ A[0], [1, 2, 3], A[1] ] # lists array([ 1., 0., 0., 1., 2., 3., 0., 1., 0.]) : np.r_[ A[0], (1, 2, 3), A[1] ] # tuples array([ 1., 0., 0., 1., 2., 3., 0., 1., 0.]) : np.r_[ A[0], 1:4, A[1] ] # same, 1:4 == arange(1,4) == 1,2,3 array([ 1., 0., 0., 1., 2., 3., 0., 1., 0.])

(El motivo de los corchetes [] en lugar de round () es que Python se expande, por ejemplo, 1: 4 en el cuadrado, las maravillas de la sobrecarga).