data - python append pandas

Agregue la hoja de Excel existente con un nuevo marco de datos usando pandas de Python (1)

Actualmente tengo este código. Funciona perfectamente

Recorre los archivos de Excel en una carpeta, elimina las primeras 2 filas, luego los guarda como archivos de Excel individuales, y también guarda los archivos en el ciclo como un archivo adjunto.

Actualmente, el archivo adjunto sobrescribe el archivo existente cada vez que ejecuto el código.

Necesito agregar los nuevos datos al final de la hoja de Excel ya existente (''master_data.xlsx)

dfList = [] path = ''C://Test//TestRawFile'' newpath = ''C://Path//To//New//Folder'' for fn in os.listdir(path): # Absolute file path file = os.path.join(path, fn) if os.path.isfile(file): # Import the excel file and call it xlsx_file xlsx_file = pd.ExcelFile(file) # View the excel files sheet names xlsx_file.sheet_names # Load the xlsx files Data sheet as a dataframe df = xlsx_file.parse(''Sheet1'',header= None) df_NoHeader = df[2:] data = df_NoHeader # Save individual dataframe data.to_excel(os.path.join(newpath, fn)) dfList.append(data) appended_data = pd.concat(dfList) appended_data.to_excel(os.path.join(newpath, ''master_data.xlsx''))

Pensé que sería una tarea simple, pero supongo que no. Creo que necesito traer el archivo master_data.xlsx como un marco de datos, luego hacer coincidir el índice con los nuevos datos agregados y guardarlo de nuevo. O tal vez hay una manera más fácil. Cualquier ayuda es apreciada.

Una función auxiliar para agregar DataFrame al archivo Excel existente :

def append_df_to_excel(filename, df, sheet_name=''Sheet1'', startrow=None, truncate_sheet=False, **to_excel_kwargs): """ Append a DataFrame [df] to existing Excel file [filename] into [sheet_name] Sheet. If [filename] doesn''t exist, then this function will create it. Parameters: filename : File path or existing ExcelWriter (Example: ''/path/to/file.xlsx'') df : dataframe to save to workbook sheet_name : Name of sheet which will contain DataFrame. (default: ''Sheet1'') startrow : upper left cell row to dump data frame. Per default (startrow=None) calculate the last row in the existing DF and write to the next row... truncate_sheet : truncate (remove and recreate) [sheet_name] before writing DataFrame to Excel file to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()` [can be dictionary] Returns: None """ from openpyxl import load_workbook import pandas as pd # ignore [engine] parameter if it was passed if ''engine'' in to_excel_kwargs: to_excel_kwargs.pop(''engine'') writer = pd.ExcelWriter(filename, engine=''openpyxl'') # Python 2.x: define [FileNotFoundError] exception if it doesn''t exist try: FileNotFoundError except NameError: FileNotFoundError = IOError try: # try to open an existing workbook writer.book = load_workbook(filename) # get the last row in the existing Excel sheet # if it was not specified explicitly if startrow is None and sheet_name in writer.book.sheetnames: startrow = writer.book[sheet_name].max_row # truncate sheet if truncate_sheet and sheet_name in writer.book.sheetnames: # index of [sheet_name] sheet idx = writer.book.sheetnames.index(sheet_name) # remove [sheet_name] writer.book.remove(writer.book.worksheets[idx]) # create an empty sheet [sheet_name] using old index writer.book.create_sheet(sheet_name, idx) # copy existing sheets writer.sheets = {ws.title:ws for ws in writer.book.worksheets} except FileNotFoundError: # file does not exist yet, we will create it pass if startrow is None: startrow = 0 # write out the new sheet df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs) # save the workbook writer.save()

Ejemplos de uso ...

Respuesta anterior: le permite escribir varios DataFrames en un nuevo archivo de Excel.

Puede usar el motor openpyxl junto con el parámetro startrow :

In [48]: writer = pd.ExcelWriter(''c:/temp/test.xlsx'', engine=''openpyxl'') In [49]: df.to_excel(writer, index=False) In [50]: df.to_excel(writer, startrow=len(df)+2, index=False) In [51]: writer.save()

c: /temp/test.xlsx:

PD: también puede especificar el header=None si no desea duplicar los nombres de columna ...

ACTUALIZACIÓN: es posible que también desee comprobar esta solución