python - tables - Vectorized lookup of values in a Pandas DataFrame
Below is the full code, so that anyone can reproduce the results.
import pandas as pd

# Daily prices: one row per date, one column per ticker.
columns = ['AAPL', 'GOOG', 'IBM', 'XOM']
index = ['2011-01-10', '2011-01-13', '2011-01-26', '2011-02-02', '2011-02-10', '2011-03-03', '2011-05-03', '2011-06-03', '2011-06-10', '2011-08-01', '2011-12-20']
prices = pd.DataFrame(columns=columns, index=index)
prices.iloc[0] = [339.44, 614.21, 142.78, 71.57]
prices.iloc[1] = [342.64, 616.69, 143.92, 73.08]
prices.iloc[2] = [340.82, 616.50, 155.74, 75.89]
prices.iloc[3] = [341.29, 612.00, 157.93, 79.46]
prices.iloc[4] = [351.42, 616.44, 159.32, 79.68]
prices.iloc[5] = [356.40, 609.56, 158.73, 82.19]
prices.iloc[6] = [345.14, 533.89, 167.84, 82.00]
prices.iloc[7] = [340.42, 523.08, 160.97, 78.19]
prices.iloc[8] = [323.03, 509.51, 159.14, 76.84]
prices.iloc[9] = [393.26, 606.77, 176.28, 76.67]
prices.iloc[10] = [392.46, 630.37, 184.14, 79.97]

# Orders: one row per trade, added by label on an initially empty frame.
columns = ['Date', 'direction', 'size', 'ticker', 'prices']
orders = pd.DataFrame(columns=columns)
orders.loc[0] = ['2011-01-10', 'Buy', 1500, 'AAPL', 339.44]
orders.loc[1] = ['2011-01-13', 'Sell', 1500, 'AAPL', 342.64]
orders.loc[2] = ['2011-01-13', 'Buy', 4000, 'IBM', 143.92]
orders.loc[3] = ['2011-01-26', 'Buy', 1000, 'GOOG', 616.50]
orders.loc[4] = ['2011-02-02', 'Sell', 4000, 'XOM', 79.46]
orders.loc[5] = ['2011-02-10', 'Buy', 4000, 'XOM', 79.68]
orders.loc[6] = ['2011-03-03', 'Sell', 1000, 'GOOG', 609.56]
orders.loc[7] = ['2011-03-03', 'Sell', 2200, 'IBM', 158.73]
orders.loc[8] = ['2011-06-03', 'Sell', 3300, 'IBM', 160.97]
orders.loc[9] = ['2011-05-03', 'Buy', 1500, 'IBM', 167.84]
orders.loc[10] = ['2011-06-10', 'Buy', 1200, 'AAPL', 323.03]
orders.loc[11] = ['2011-08-01', 'Buy', 55, 'GOOG', 606.77]
orders.loc[12] = ['2011-08-01', 'Sell', 55, 'GOOG', 606.77]
orders.loc[13] = ['2011-12-20', 'Sell', 1200, 'AAPL', 392.46]

# Pair each order's date with its ticker and fetch the matching price.
# Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0,
# so this line needs an older pandas release.
lookupValues = prices.lookup(orders.Date, orders.ticker)
And the result:
>>> prices
AAPL GOOG IBM XOM
2011-01-10 339.44 614.21 142.78 71.57
2011-01-13 342.64 616.69 143.92 73.08
2011-01-26 340.82 616.5 155.74 75.89
2011-02-02 341.29 612 157.93 79.46
2011-02-10 351.42 616.44 159.32 79.68
2011-03-03 356.4 609.56 158.73 82.19
2011-05-03 345.14 533.89 167.84 82
2011-06-03 340.42 523.08 160.97 78.19
2011-06-10 323.03 509.51 159.14 76.84
2011-08-01 393.26 606.77 176.28 76.67
2011-12-20 392.46 630.37 184.14 79.97
>>> orders
Date direction size ticker prices
0 2011-01-10 Buy 1500 AAPL 339.44
1 2011-01-13 Sell 1500 AAPL 342.64
2 2011-01-13 Buy 4000 IBM 143.92
3 2011-01-26 Buy 1000 GOOG 616.50
4 2011-02-02 Sell 4000 XOM 79.46
5 2011-02-10 Buy 4000 XOM 79.68
6 2011-03-03 Sell 1000 GOOG 609.56
7 2011-03-03 Sell 2200 IBM 158.73
8 2011-06-03 Sell 3300 IBM 160.97
9 2011-05-03 Buy 1500 IBM 167.84
10 2011-06-10 Buy 1200 AAPL 323.03
11 2011-08-01 Buy 55 GOOG 606.77
12 2011-08-01 Sell 55 GOOG 606.77
13 2011-12-20 Sell 1200 AAPL 392.46
>>> lookupValues
array([339.44, 342.64, 143.92, 616.5 , 79.46, 79.68, 609.56, 158.73,
160.97, 167.84, 323.03, 606.77, 606.77, 392.46])
>>>
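On pandas releases that no longer ship `DataFrame.lookup`, one way to get the same pairing is to reshape the price table to long form and join it to the orders. The following is only a sketch under that assumption; it uses a two-date, two-ticker subset of the data above, and the names `long_prices`, `priced`, and the `price` column are hypothetical.

```python
import pandas as pd

# Small subset of the prices and orders shown above (illustration only).
prices = pd.DataFrame(
    {"AAPL": [339.44, 342.64], "IBM": [142.78, 143.92]},
    index=["2011-01-10", "2011-01-13"],
)
orders = pd.DataFrame(
    {
        "Date": ["2011-01-10", "2011-01-13", "2011-01-13"],
        "ticker": ["AAPL", "AAPL", "IBM"],
    }
)

# Wide -> long: one row per (Date, ticker) pair with its price.
long_prices = (
    prices.rename_axis("Date")
    .reset_index()
    .melt(id_vars="Date", var_name="ticker", value_name="price")
)

# A left join keeps every order, in its original order, and attaches the price.
priced = orders.merge(long_prices, on=["Date", "ticker"], how="left")
print(priced)
```

A left join leaves `NaN` in `price` for any order whose date or ticker is missing from the price table, which may or may not be the behaviour you want.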
I have two pandas DataFrames, one called "orders" and the other called "daily_prices". daily_prices is as follows:
AAPL GOOG IBM XOM
2011-01-10 339.44 614.21 142.78 71.57
2011-01-13 342.64 616.69 143.92 73.08
2011-01-26 340.82 616.50 155.74 75.89
2011-02-02 341.29 612.00 157.93 79.46
2011-02-10 351.42 616.44 159.32 79.68
2011-03-03 356.40 609.56 158.73 82.19
2011-05-03 345.14 533.89 167.84 82.00
2011-06-03 340.42 523.08 160.97 78.19
2011-06-10 323.03 509.51 159.14 76.84
2011-08-01 393.26 606.77 176.28 76.67
2011-12-20 392.46 630.37 184.14 79.97
orders is as follows:
direction size ticker prices
2011-01-10 Buy 1500 AAPL 339.44
2011-01-13 Sell 1500 AAPL 342.64
2011-01-13 Buy 4000 IBM 143.92
2011-01-26 Buy 1000 GOOG 616.50
2011-02-02 Sell 4000 XOM 79.46
2011-02-10 Buy 4000 XOM 79.68
2011-03-03 Sell 1000 GOOG 609.56
2011-03-03 Sell 2200 IBM 158.73
2011-06-03 Sell 3300 IBM 160.97
2011-05-03 Buy 1500 IBM 167.84
2011-06-10 Buy 1200 AAPL 323.03
2011-08-01 Buy 55 GOOG 606.77
2011-08-01 Sell 55 GOOG 606.77
2011-12-20 Sell 1200 AAPL 392.46
The index of both DataFrames is datetime.date. The 'prices' column in the 'orders' DataFrame was added by using a list comprehension to loop over all the orders, look up the specific ticker for the specific date in the 'daily_prices' DataFrame, and then attach that list as a column to the 'orders' DataFrame. I would like to do this with an array operation instead of a loop. Can it be done? I tried using:
daily_prices.ix[dates, tickers]
but this returns a matrix of the Cartesian product of the two lists. I want it to return a column vector containing only the price of a specific ticker for a specific date.
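To make the problem concrete, here is a small sketch of both behaviours described above: the loop-based lookup and the Cartesian-product result. It assumes a two-date, two-ticker subset of the data, and it uses `.loc` in place of `.ix`, since `.ix` was removed in pandas 1.0; the names `looped` and `block` are hypothetical.

```python
import pandas as pd

# Small subset of daily_prices (illustration only).
daily_prices = pd.DataFrame(
    {"AAPL": [339.44, 342.64], "IBM": [142.78, 143.92]},
    index=["2011-01-10", "2011-01-13"],
)
dates = ["2011-01-10", "2011-01-13", "2011-01-13"]
tickers = ["AAPL", "AAPL", "IBM"]

# Loop-based lookup: one scalar access per (date, ticker) pair.
looped = [daily_prices.at[d, t] for d, t in zip(dates, tickers)]

# Indexing with two lists selects every row label crossed with every
# column label, so three dates and three tickers give a 3 x 3 block.
block = daily_prices.loc[dates, tickers]

print(looped)
print(block.shape)
```

The three wanted prices are present in `block`, but only as its diagonal; every other cell is a date and ticker combination that no order asked for.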
Use our friend lookup, designed precisely for this purpose:
In [17]: prices
Out[17]:
AAPL GOOG IBM XOM
2011-01-10 339.44 614.21 142.78 71.57
2011-01-13 342.64 616.69 143.92 73.08
2011-01-26 340.82 616.50 155.74 75.89
2011-02-02 341.29 612.00 157.93 79.46
2011-02-10 351.42 616.44 159.32 79.68
2011-03-03 356.40 609.56 158.73 82.19
2011-05-03 345.14 533.89 167.84 82.00
2011-06-03 340.42 523.08 160.97 78.19
2011-06-10 323.03 509.51 159.14 76.84
2011-08-01 393.26 606.77 176.28 76.67
2011-12-20 392.46 630.37 184.14 79.97
In [18]: orders
Out[18]:
Date direction size ticker prices
0 2011-01-10 00:00:00 Buy 1500 AAPL 339.44
1 2011-01-13 00:00:00 Sell 1500 AAPL 342.64
2 2011-01-13 00:00:00 Buy 4000 IBM 143.92
3 2011-01-26 00:00:00 Buy 1000 GOOG 616.50
4 2011-02-02 00:00:00 Sell 4000 XOM 79.46
5 2011-02-10 00:00:00 Buy 4000 XOM 79.68
6 2011-03-03 00:00:00 Sell 1000 GOOG 609.56
7 2011-03-03 00:00:00 Sell 2200 IBM 158.73
8 2011-06-03 00:00:00 Sell 3300 IBM 160.97
9 2011-05-03 00:00:00 Buy 1500 IBM 167.84
10 2011-06-10 00:00:00 Buy 1200 AAPL 323.03
11 2011-08-01 00:00:00 Buy 55 GOOG 606.77
12 2011-08-01 00:00:00 Sell 55 GOOG 606.77
13 2011-12-20 00:00:00 Sell 1200 AAPL 392.46
In [19]: prices.lookup(orders.Date, orders.ticker)
Out[19]:
array([ 339.44, 342.64, 143.92, 616.5 , 79.46, 79.68, 609.56,
158.73, 160.97, 167.84, 323.03, 606.77, 606.77, 392.46])
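For readers on pandas 2.0 or later, where `lookup` is no longer available, a similar pointwise result can be sketched with `Index.get_indexer` and NumPy integer indexing. This is an assumption-laden stand-in, not the original method: `pointwise_lookup` is a hypothetical helper, the data is a three-date subset of the tables above, and the sketch assumes unique row and column labels and a single numeric dtype.

```python
import pandas as pd

# Subset of the prices and orders shown above (illustration only).
prices = pd.DataFrame(
    {
        "AAPL": [339.44, 342.64, 340.82],
        "GOOG": [614.21, 616.69, 616.50],
        "IBM": [142.78, 143.92, 155.74],
    },
    index=["2011-01-10", "2011-01-13", "2011-01-26"],
)
orders = pd.DataFrame(
    {
        "Date": ["2011-01-10", "2011-01-13", "2011-01-13", "2011-01-26"],
        "ticker": ["AAPL", "AAPL", "IBM", "GOOG"],
    }
)


def pointwise_lookup(frame, row_labels, col_labels):
    """Return frame[row_labels[i], col_labels[i]] for every i (hypothetical helper)."""
    # Translate labels to integer positions; -1 marks a label that was not found.
    rows = frame.index.get_indexer(pd.Index(row_labels))
    cols = frame.columns.get_indexer(pd.Index(col_labels))
    if (rows == -1).any() or (cols == -1).any():
        raise KeyError("one or more row or column labels were not found")
    # Paired integer arrays pick one element per (row, col) pair.
    return frame.to_numpy()[rows, cols]


values = pointwise_lookup(prices, orders["Date"], orders["ticker"])
print(values)
```

Raising on a missing label is a design choice made for this sketch; returning `NaN` for unmatched pairs would be an equally reasonable alternative.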