Python Pandas user warning: Sorting because non-concatenation axis is not aligned
Jezrael's answer is good, but it did not answer a question I had: will getting the "sort" flag wrong mess up my data in any way? The answer is apparently "no"; you are fine either way.
from pandas import DataFrame, concat
a = DataFrame([{'a': 1, 'c': 2, 'd': 3}])
b = DataFrame([{'a': 4, 'b': 5, 'd': 6, 'e': 7}])
>>> concat([a,b],sort=False)
   a    c  d    b    e
0  1  2.0  3  NaN  NaN
0  4  NaN  6  5.0  7.0
>>> concat([a,b],sort=True)
   a    b    c  d    e
0  1  NaN  2.0  3  NaN
0  4  5.0  NaN  6  7.0
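To check the "you are fine either way" claim directly, here is a small sketch. It assumes pandas 1.0 or later and reuses the two frames above. Its point is that sort only changes the column order. Reordering the unsorted result's columns to match the sorted one gives an identical frame.

```python
import pandas as pd

# Same frames as in the example above
a = pd.DataFrame([{'a': 1, 'c': 2, 'd': 3}])
b = pd.DataFrame([{'a': 4, 'b': 5, 'd': 6, 'e': 7}])

unsorted = pd.concat([a, b], sort=False)
sorted_cols = pd.concat([a, b], sort=True)

# sort only reorders columns: selecting the unsorted result's columns in the
# sorted order reproduces the sorted result exactly (values, dtypes, index)
same_data = unsorted[sorted_cols.columns].equals(sorted_cols)

print(list(unsorted.columns))     # ['a', 'c', 'd', 'b', 'e']
print(list(sorted_cols.columns))  # ['a', 'b', 'c', 'd', 'e']
print(same_data)                  # True
```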
I am doing some coding practice that involves merging DataFrames, and while doing it I get this user warning:
/usr/lib64/python2.7/site-packages/pandas/core/frame.py:6201: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. To accept the future behavior, pass 'sort=True'. To retain the current behavior and silence the warning, pass sort=False
It comes from these lines of code. Can you help me find a fix for this warning?
placement_video = [self.read_sql_vdx_summary, self.read_sql_video_km]
placement_video_summary = reduce(lambda left, right: pd.merge(left, right, on='PLACEMENT', sort=False), placement_video)
placement_by_video = placement_video_summary.loc[:, ["PLACEMENT", "PLACEMENT_NAME", "COST_TYPE", "PRODUCT",
"VIDEONAME", "VIEW0", "VIEW25", "VIEW50", "VIEW75",
"VIEW100",
"ENG0", "ENG25", "ENG50", "ENG75", "ENG100", "DPE0",
"DPE25",
"DPE50", "DPE75", "DPE100"]]
# print (placement_by_video)
placement_by_video["Placement# Name"] = placement_by_video[["PLACEMENT",
"PLACEMENT_NAME"]].apply(lambda x: ".".join(x),
axis=1)
placement_by_video_new = placement_by_video.loc[:,
["PLACEMENT", "Placement# Name", "COST_TYPE", "PRODUCT", "VIDEONAME",
"VIEW0", "VIEW25", "VIEW50", "VIEW75", "VIEW100",
"ENG0", "ENG25", "ENG50", "ENG75", "ENG100", "DPE0", "DPE25",
"DPE50", "DPE75", "DPE100"]]
placement_by_km_video = [placement_by_video_new, self.read_sql_km_for_video]
placement_by_km_video_summary = reduce(lambda left, right: pd.merge(left, right, on=['PLACEMENT', 'PRODUCT'], sort=False),
placement_by_km_video)
#print (list(placement_by_km_video_summary))
#print(placement_by_km_video_summary)
#exit()
# print(placement_by_video_new)
"""Conditions for 25%view"""
mask17 = placement_by_km_video_summary["PRODUCT"].isin(['Display', 'Mobile'])
mask18 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPCV"])
mask19 = placement_by_km_video_summary["PRODUCT"].isin(["InStream"])
mask20 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPE+", "CPCV"])
mask_video_video_completions = placement_by_km_video_summary["COST_TYPE"].isin(["CPCV"])
mask21 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE+"])
mask22 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM"])
mask23 = placement_by_km_video_summary["PRODUCT"].isin(['Display', 'Mobile', 'InStream'])
mask24 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPE+"])
choice25video_eng = placement_by_km_video_summary["ENG25"]
choice25video_vwr = placement_by_km_video_summary["VIEW25"]
choice25video_deep = placement_by_km_video_summary["DPE25"]
placement_by_km_video_summary["25_pc_video"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
[choice25video_eng, choice25video_vwr, choice25video_deep])
"""Conditions for 50%view"""
choice50video_eng = placement_by_km_video_summary["ENG50"]
choice50video_vwr = placement_by_km_video_summary["VIEW50"]
choice50video_deep = placement_by_km_video_summary["DPE50"]
placement_by_km_video_summary["50_pc_video"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
[choice50video_eng,
choice50video_vwr, choice50video_deep])
"""Conditions for 75%view"""
choice75video_eng = placement_by_km_video_summary["ENG75"]
choice75video_vwr = placement_by_km_video_summary["VIEW75"]
choice75video_deep = placement_by_km_video_summary["DPE75"]
placement_by_km_video_summary["75_pc_video"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
[choice75video_eng,
choice75video_vwr,
choice75video_deep])
"""Conditions for 100%view"""
choice100video_eng = placement_by_km_video_summary["ENG100"]
choice100video_vwr = placement_by_km_video_summary["VIEW100"]
choice100video_deep = placement_by_km_video_summary["DPE100"]
choicecompletions = placement_by_km_video_summary['COMPLETIONS']
placement_by_km_video_summary["100_pc_video"] = np.select([mask17 & mask22, mask19 & mask24, mask17 & mask21, mask23 & mask_video_video_completions],
[choice100video_eng, choice100video_vwr, choice100video_deep, choicecompletions])
"""conditions for 0%view"""
choice0video_eng = placement_by_km_video_summary["ENG0"]
choice0video_vwr = placement_by_km_video_summary["VIEW0"]
choice0video_deep = placement_by_km_video_summary["DPE0"]
placement_by_km_video_summary["Views"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
[choice0video_eng,
choice0video_vwr,
choice0video_deep])
#print (placement_by_km_video_summary)
#exit()
#final Table
placement_by_video_summary = placement_by_km_video_summary.loc[:,
["PLACEMENT", "Placement# Name", "PRODUCT", "VIDEONAME", "COST_TYPE",
"Views", "25_pc_video", "50_pc_video", "75_pc_video","100_pc_video",
"ENGAGEMENTS","IMPRESSIONS", "DPEENGAMENTS"]]
#placement_by_km_video = [placement_by_video_summary, self.read_sql_km_for_video]
#placement_by_km_video_summary = reduce(lambda left, right: pd.merge(left, right, on=['PLACEMENT', 'PRODUCT']),
#placement_by_km_video)
#print(placement_by_video_summary)
#exit()
# dup_col =["IMPRESSIONS","ENGAGEMENTS","DPEENGAMENTS"]
# placement_by_video_summary.loc[placement_by_video_summary.duplicated(dup_col),dup_col] = np.nan
# print ("Dhar",placement_by_video_summary)
'''adding views based on conditions'''
#filter maximum value from videos
placement_by_video_summary_new = placement_by_km_video_summary.loc[
placement_by_km_video_summary.reset_index().groupby(['PLACEMENT', 'PRODUCT'])['Views'].idxmax()]
#print (placement_by_video_summary_new)
#exit()
# print (placement_by_video_summary_new)
# mask22 = (placement_by_video_summary_new.PRODUCT.str.upper() == 'DISPLAY') & (placement_by_video_summary_new.COST_TYPE == 'CPE')
placement_by_video_summary_new.loc[mask17 & mask18, 'Views'] = placement_by_video_summary_new['ENGAGEMENTS']
placement_by_video_summary_new.loc[mask19 & mask20, 'Views'] = placement_by_video_summary_new['IMPRESSIONS']
placement_by_video_summary_new.loc[mask17 & mask21, 'Views'] = placement_by_video_summary_new['DPEENGAMENTS']
#print (placement_by_video_summary_new)
#exit()
placement_by_video_summary = placement_by_video_summary.drop(placement_by_video_summary_new.index).append(
placement_by_video_summary_new).sort_index()
placement_by_video_summary["Video Completion Rate"] = placement_by_video_summary["100_pc_video"] / \
    placement_by_video_summary["Views"]
placement_by_video_final = placement_by_video_summary.loc[:,
["Placement# Name", "PRODUCT", "VIDEONAME", "Views",
"25_pc_video", "50_pc_video", "75_pc_video", "100_pc_video",
"Video Completion Rate"]]
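The warning does not say which statement in a long block like this one emits it. One way to find out is to escalate FutureWarning to an exception. The traceback then points at the offending line. From the command line, `python -W error::FutureWarning your_script.py` does this. Inside code, the standard warnings module does the same. The sketch below uses a hypothetical `noisy_step()` that stands in for the pandas call, so it runs without pandas.

```python
import warnings


def noisy_step():
    # Hypothetical stand-in for the pandas call that emits the FutureWarning
    warnings.warn("Sorting because non-concatenation axis is not aligned.",
                  FutureWarning)
    return "done"


def first_future_warning(func):
    """Run func with FutureWarning turned into an exception.

    Returns the warning message if one was raised, else None. In a real
    script, letting the exception propagate shows the offending line.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        try:
            func()
        except FutureWarning as exc:
            return str(exc)
    return None


print(first_future_warning(noisy_step))
print(first_future_warning(lambda: 42))  # None
```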
tl;dr:
In pandas 0.23+, pass sort=True if you want the columns or index sorted after concat or append; to avoid sorting, pass sort=False. Passing either value explicitly silences the warning.
This behavior is new in pandas 0.23.0:
In a future version of pandas, pandas.concat() and DataFrame.append() will no longer sort the non-concatenation axis when it is not already aligned.
The current behavior is the same as before (sorting), but now a warning is issued when sort is not specified and the non-concatenation axis is not aligned.
More information from the very old linked GitHub issue, comment by smcinerney:
When concatenating DataFrames, the column names get sorted alphanumerically if there are any differences between them. If they are identical across DataFrames, they don't get sorted.
This sort is undocumented and unwanted. Certainly the default behavior should be no-sort.
After some time, the sort parameter was implemented in pandas.concat and DataFrame.append:
sort : boolean, default None
Sort non-concatenation axis if it is not already aligned when join is 'outer'. The current default of sorting is deprecated and will change to not-sorting in a future version of pandas.
Explicitly pass sort=True to silence the warning and sort. Explicitly pass sort=False to silence the warning and not sort.
This has no effect when join='inner', which already preserves the order of the non-concatenation axis.
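The join='inner' case from the docstring works like this. Only the shared columns survive, in the order they have in the first frame. The sketch assumes pandas 1.0 or later and leaves sort at its default. It uses frames shaped like the later examples in this answer.

```python
import pandas as pd

# Frames with partly different columns
df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8], "e": [5, 0]},
                   columns=["b", "a", "e"])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3], "c": [2, 8]},
                   columns=["c", "b", "a"])

# join='inner' keeps only the columns present in both frames,
# in the order they appear in df1
inner = pd.concat([df1, df2], join="inner")
print(list(inner.columns))  # ['b', 'a']
```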
So if both DataFrames have the same columns in the same order, there is no warning and no sorting:
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['a', 'b'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['a', 'b'])
print(pd.concat([df1, df2]))
   a  b
0  1  0
1  2  8
0  4  7
1  5  3
df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['b', 'a'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['b', 'a'])
print(pd.concat([df1, df2]))
   b  a
0  0  1
1  8  2
0  7  4
1  3  5
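Whether the sort flag matters can be predicted before concatenating. It matters only when the column Index objects differ in labels or in order. The helper `columns_aligned` below is hypothetical, not a pandas API. It is a rough approximation, based on the examples in this answer, of when the 0.23 warning appears. It relies on Index.equals, which compares both the elements and their order.

```python
import pandas as pd


def columns_aligned(frames):
    # Hypothetical helper: True when every frame has exactly the same column
    # labels in the same order, so concat has nothing to sort
    first = frames[0].columns
    return all(f.columns.equals(first) for f in frames[1:])


df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=["a", "b"])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=["a", "b"])
df3 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=["b", "a"])

print(columns_aligned([df1, df2]))  # True
print(columns_aligned([df1, df3]))  # False: same labels, different order
```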
But if the DataFrames have different columns, or the same columns in a different order, pandas emits the warning when no sort parameter is passed (that is, sort=None, the default):
df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['b', 'a'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['a', 'b'])
print(pd.concat([df1, df2]))
FutureWarning: Sorting because non-concatenation axis is not aligned.
   a  b
0  1  0
1  2  8
0  4  7
1  5  3
print (pd.concat([df1, df2], sort=True))
   a  b
0  1  0
1  2  8
0  4  7
1  5  3
print (pd.concat([df1, df2], sort=False))
   b  a
0  0  1
1  8  2
0  7  4
1  3  5
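Another way to take the sort question off the table is to align the frames explicitly before concatenating. The sketch reorders df2's columns to df1's order with DataFrame.reindex, so the axes match and the result keeps df1's order. This is an alternative technique, not something the answer itself proposes. It assumes both frames have the same set of columns.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=["b", "a"])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=["a", "b"])

# Put df2's columns in df1's order first; the axes are then aligned,
# so concat has nothing left to sort
aligned = pd.concat([df1, df2.reindex(columns=df1.columns)])
print(list(aligned.columns))  # ['b', 'a']
```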
If the DataFrames have different columns, the columns that exist in both are still matched correctly. In the example below, columns a and b of df1 line up with columns a and b of df2. Columns that exist in only one of the DataFrames are filled with missing values (NaN) for the rows coming from the other one.
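The NaN filling also explains why some integer columns appear as floats (5.0, 2.0, and so on) in the outputs below. A column missing from one frame needs NaN there, and NaN forces a float dtype. The sketch below shows this with the frames from the next example, assuming pandas 1.0 or later.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8], "e": [5, 0]},
                   columns=["b", "a", "e"])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3], "c": [2, 8], "d": [7, 0]},
                   columns=["c", "b", "a", "d"])

out = pd.concat([df1, df2], sort=False)

# Shared columns (a, b) keep their integer dtype; columns present in only one
# frame (c, d, e) get NaN for the other frame's rows and become float64
print(out.dtypes.to_dict())
print(out["e"].isna().tolist())  # [False, False, True, True]
```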
Finally, if you pass sort=True, the columns are sorted alphanumerically. If you pass sort=False, the columns of the second DataFrame that do not exist in the first are appended at the end, without sorting:
df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8], 'e': [5, 0]},
                   columns=['b', 'a', 'e'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3], 'c': [2, 8], 'd': [7, 0]},
                   columns=['c', 'b', 'a', 'd'])
print(pd.concat([df1, df2]))
FutureWarning: Sorting because non-concatenation axis is not aligned.
   a  b    c    d    e
0  1  0  NaN  NaN  5.0
1  2  8  NaN  NaN  0.0
0  4  7  2.0  7.0  NaN
1  5  3  8.0  0.0  NaN
print (pd.concat([df1, df2], sort=True))
   a  b    c    d    e
0  1  0  NaN  NaN  5.0
1  2  8  NaN  NaN  0.0
0  4  7  2.0  7.0  NaN
1  5  3  8.0  0.0  NaN
print (pd.concat([df1, df2], sort=False))
   b  a    e    c    d
0  0  1  5.0  NaN  NaN
1  8  2  0.0  NaN  NaN
0  7  4  NaN  2.0  7.0
1  3  5  NaN  8.0  0.0
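The column order in the two outputs above can be reproduced with a small pure-Python model. With sort=False, labels are kept in order of first appearance. With sort=True, the union of labels is sorted. `concat_column_order` is a hypothetical, simplified model of the unaligned, join='outer' case shown here. It tracks labels only, not data, and is not how pandas implements concat.

```python
def concat_column_order(column_lists, sort):
    """Model the column order of an unaligned outer concat.

    Simplified sketch: tracks labels only and assumes the labels are
    mutually comparable (e.g. all strings).
    """
    seen = []
    for cols in column_lists:
        for label in cols:
            if label not in seen:
                seen.append(label)  # keep order of first appearance
    return sorted(seen) if sort else seen


cols1 = ["b", "a", "e"]
cols2 = ["c", "b", "a", "d"]
print(concat_column_order([cols1, cols2], sort=True))   # ['a', 'b', 'c', 'd', 'e']
print(concat_column_order([cols1, cols2], sort=False))  # ['b', 'a', 'e', 'c', 'd']
```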
In your code:
placement_by_video_summary = (placement_by_video_summary.drop(placement_by_video_summary_new.index)
                              .append(placement_by_video_summary_new, sort=True)
                              .sort_index())
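DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. On current versions, the same drop, combine, sort_index pattern can be written with pd.concat. The sketch below uses small hypothetical stand-ins for the question's frames, with made-up data. The replacement rows carry an extra column (NOTE) so that the column axes are not aligned. Pass sort=False instead to keep the original column order.

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames
summary = pd.DataFrame({"PLACEMENT": ["p1", "p2", "p3"],
                        "Views": [10, 20, 30]})
# Replacement row for index 1, with an extra column the original lacks
summary_new = pd.DataFrame({"Views": [99], "PLACEMENT": ["p2"],
                            "NOTE": ["max"]}, index=[1])

# Drop the rows being replaced, concatenate the new ones with an explicit
# sort flag, then restore row order by index
summary = (
    pd.concat([summary.drop(summary_new.index), summary_new], sort=True)
    .sort_index()
)
print(summary)
```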