python - single - pandas: How to run a pivot with a multi-index?


I would like to run a pivot on a pandas DataFrame, with the index being two columns, not one. For example: one field for the year, one for the month, an 'item' field that holds 'item 1' and 'item 2', and a 'value' field with numerical values. I want the index to be year + month.

The only way I managed to get this to work was to combine the two fields into one and then separate them again. Is there a better way?

Minimal code copied below. Thanks a lot!

PS: Yes, I am aware there are other questions with the keywords 'pivot' and 'multi-index', but I did not understand if or how they can help me with this question.

    import pandas as pd
    import numpy as np

    df = pd.DataFrame()
    month = np.arange(1, 13)
    values1 = np.random.randint(0, 100, 12)
    values2 = np.random.randint(200, 300, 12)

    df['month'] = np.hstack((month, month))
    df['year'] = 2004
    df['value'] = np.hstack((values1, values2))
    df['item'] = np.hstack((np.repeat('item 1', 12), np.repeat('item 2', 12)))

    # This doesn't work:
    # ValueError: Wrong number of items passed 24, placement implies 2
    # mypiv = df.pivot(['year', 'month'], 'item', 'value')

    # This doesn't work, either:
    # df.set_index(['year', 'month'], inplace=True)
    # ValueError: cannot label index with a null key
    # mypiv = df.pivot(columns='item', values='value')

    # This below works but is not ideal:
    # I have to first concatenate then separate the fields I need
    df['new field'] = df['year'] * 100 + df['month']
    mypiv = df.pivot(index='new field', columns='item', values='value').reset_index()
    mypiv['year'] = mypiv['new field'] // 100  # integer division recovers the year
    mypiv['month'] = mypiv['new field'] % 100

I believe that if you include item in your MultiIndex, you can then simply unstack it:

    df.set_index(['year', 'month', 'item']).unstack(level=-1)

This yields:

               value
    item      item 1 item 2
    year month
    2004 1        21    277
         2        43    244
         3        12    262
         4        80    201
         5        22    287
         6        52    284
         7        90    249
         8        14    229
         9        52    205
         10       76    207
         11       88    259
         12       90    200
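The output above keeps 'value' as an extra top level on the columns. If that level is unwanted, one option is to select the 'value' column before unstacking. The following is only a sketch: the seeded sample frame is a hypothetical stand-in for the question's data, and the name `wide` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's frame (seeded for repeatability).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "year": 2004,
    "month": np.tile(np.arange(1, 13), 2),
    "item": np.repeat(["item 1", "item 2"], 12),
    "value": np.concatenate([rng.integers(0, 100, 12), rng.integers(200, 300, 12)]),
})

# Selecting the 'value' Series first means unstack returns single-level columns.
wide = df.set_index(["year", "month", "item"])["value"].unstack("item")

print(list(wide.columns))      # ['item 1', 'item 2']
print(list(wide.index.names))  # ['year', 'month']
```

Calling `wide.reset_index()` afterwards should turn year and month back into ordinary columns, which is what the workaround in the question was building by hand.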

It is a bit faster than using pivot_table, and about the same speed as, or slightly slower than, using groupby.
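The relative speeds depend on the pandas version and on the size of the data, so it is probably worth measuring on your own frame. Here is a rough sketch using the standard-library `timeit` module; the sample frame, the `candidates` dictionary and the repeat counts are illustrative assumptions, not a rigorous benchmark.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's frame.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "year": 2004,
    "month": np.tile(np.arange(1, 13), 2),
    "item": np.repeat(["item 1", "item 2"], 12),
    "value": np.concatenate([rng.integers(0, 100, 12), rng.integers(200, 300, 12)]),
})

# The three approaches discussed in this thread, wrapped as zero-argument callables.
candidates = {
    "set_index + unstack": lambda: df.set_index(["year", "month", "item"]).unstack(level=-1),
    "groupby + unstack": lambda: df.groupby(["year", "month", "item"])["value"].sum().unstack("item"),
    "pivot_table": lambda: df.pivot_table(
        values="value", index=["year", "month"], columns="item", aggfunc="sum"
    ),
}

# Best of 3 runs of 20 calls each, reported as seconds per call.
timings = {
    name: min(timeit.repeat(fn, number=20, repeat=3)) / 20
    for name, fn in candidates.items()
}
for name, seconds in timings.items():
    print(f"{name:>20}: {seconds * 1e3:.3f} ms per call")
```

With only 24 rows the fixed overhead dominates, so the ranking may well change on a larger DataFrame.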

You can group and then unstack.

    >>> df.groupby(['year', 'month', 'item'])['value'].sum().unstack('item')
    item        item 1  item 2
    year month
    2004 1          33     250
         2          44     224
         3          41     268
         4          29     232
         5          57     252
         6          61     255
         7          28     254
         8          15     229
         9          29     258
         10         49     207
         11         36     254
         12         23     209

Or use pivot_table:

    >>> df.pivot_table(
    ...     values='value',
    ...     index=['year', 'month'],
    ...     columns='item',
    ...     aggfunc='sum')
    item        item 1  item 2
    year month
    2004 1          33     250
         2          44     224
         3          41     268
         4          29     232
         5          57     252
         6          61     255
         7          28     254
         8          15     229
         9          29     258
         10         49     207
         11         36     254
         12         23     209
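As a sanity check, the sketch below builds the same wide table in three ways and compares them. It assumes a reasonably recent pandas: as far as I know, `DataFrame.pivot` has accepted a list for `index` since version 1.1, so the call that failed in the question should now work directly. The sample frame and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's frame.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "year": 2004,
    "month": np.tile(np.arange(1, 13), 2),
    "item": np.repeat(["item 1", "item 2"], 12),
    "value": np.concatenate([rng.integers(0, 100, 12), rng.integers(200, 300, 12)]),
})

keys = ["year", "month"]

by_groupby = df.groupby(keys + ["item"])["value"].sum().unstack("item")
by_pivot_table = df.pivot_table(values="value", index=keys, columns="item", aggfunc="sum")
# Assumption: pandas >= 1.1, where pivot accepts a list of columns for `index`.
by_pivot = df.pivot(index=keys, columns="item", values="value")

# Report whether the three results hold the same data.
print(by_groupby.equals(by_pivot_table), by_groupby.equals(by_pivot))
```

One difference to keep in mind: `pivot` raises a ValueError when a (year, month, item) combination appears more than once, whereas the groupby and pivot_table versions aggregate the duplicates.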