open - write xlsx in python

How to convert xls to xlsx (10)

Simple solution

I needed a simple solution to convert some xls files to xlsx. There are many answers here, but they do some "magic" that I don't fully understand.

A simple solution was given by chfw, but it was not quite complete.

Install dependencies

Use pip to install:

    pip install pyexcel-cli pyexcel-xls pyexcel-xlsx

Run

All styling and macros will be lost, but the data stays intact.

For a single file:

    pyexcel transcode your-file-in.xls your-new-file-out.xlsx

For all the files in a folder, a one-liner:

    for file in *.xls; do echo "Transcoding $file"; pyexcel transcode "$file" "${file}x"; done
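The same batch conversion can be sketched in Python for shells without a `for` loop (Windows, for example). This is only a sketch: `xlsx_target`, `convert_folder` and the `convert` hook are hypothetical names, and the default converter assumes pyexcel, pyexcel-xls and pyexcel-xlsx are installed so that `pyexcel.save_book_as` (the call used in the pyexcel answer further down) is available.

```python
from pathlib import Path


def xlsx_target(xls_path):
    """Return the output path: same name with an 'x' appended (a.xls -> a.xlsx)."""
    xls_path = Path(xls_path)
    return xls_path.with_name(xls_path.name + "x")


def convert_folder(folder, convert=None):
    """Convert every *.xls file in `folder` and return the paths written.

    `convert` is a hypothetical hook called as convert(source, destination).
    When it is omitted, pyexcel.save_book_as does the work.
    """
    if convert is None:
        import pyexcel  # imported lazily so the helpers load without it

        def convert(src, dst):
            pyexcel.save_book_as(file_name=src, dest_file_name=dst)

    written = []
    for src in sorted(Path(folder).glob("*.xls")):
        dst = xlsx_target(src)
        print(f"Transcoding {src.name}")
        convert(str(src), str(dst))
        written.append(dst)
    return written
```

Calling `convert_folder(".")` would mirror the one-liner: each `name.xls` in the current folder gets a `name.xlsx` next to it.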

I have some *.xls files (Excel 2003) and I want to convert them to xlsx (Excel 2007).

I use the Python uno package. When I save the documents I can set the filter name to MS Excel 97, but there is no filter name like "MS Excel 2007".

Please help: how can I set the filter name to convert xls to xlsx?

CONVERT AN XLS FILE TO XLSX

Using Python 3.6 I just ran into the same problem and, after hours of struggle, I solved it by doing the following. You probably won't need all of the packages (I'll be as clear as possible).

Make sure you install the following packages before continuing:

    pip install pyexcel
    pip install pyexcel-xls
    pip install pyexcel-xlsx
    pip install pyexcel-cli

Step 1:

    import pyexcel

Step 2: load the file, which can be "example.xls", "example.xlsx" or "example.xlsm"

    sheet0 = pyexcel.get_sheet(file_name="your_file_path.xls", name_columns_by_row=0)

Step 3: build an array of the contents

    xlsarray = sheet0.to_array()

Step 4: inspect the variable's contents to check them

    xlsarray

Step 5: pass the array held in the variable (xlsarray) to a new sheet variable called (sheet1)

    sheet1 = pyexcel.Sheet(xlsarray)

Step 6: save the new sheet under a name ending in .xlsx (in my case I want xlsx)

    sheet1.save_as("test.xlsx")

Here is my solution, which does not take fonts, charts or images into account:

    $ pip install pyexcel pyexcel-xls pyexcel-xlsx

Then do this:

    import pyexcel as p

    p.save_book_as(file_name='your-file-in.xls', dest_file_name='your-new-file-out.xlsx')

If you don't need a program, you can install an additional package, pyexcel-cli:

    $ pip install pyexcel-cli
    $ pyexcel transcode your-file-in.xls your-new-file-out.xlsx

The transcoding procedure above uses xlrd and openpyxl.

I'm improving the performance of @Jackypengyu's method.

XLSX: work row by row, not cell by cell ( http://openpyxl.readthedocs.io/en/default/api/openpyxl.worksheet.worksheet.html#openpyxl.worksheet.worksheet.Worksheet.append )
XLS: read the whole row without the empty tail, see ragged_rows=True ( http://xlrd.readthedocs.io/en/latest/api.html#xlrd.sheet.Sheet.row_slice )

Merged cells are converted as well.
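Converting merged cells means translating between two index conventions: xlrd reports each merged range as a zero-based `(rlo, rhi, clo, chi)` tuple whose upper bounds are exclusive, while openpyxl's `merge_cells` takes 1-based, inclusive bounds. A minimal sketch of that translation, where `merged_range_kwargs` is a hypothetical helper name:

```python
def merged_range_kwargs(crange):
    """Map an xlrd merged-cell tuple to keyword arguments for openpyxl's merge_cells.

    xlrd: zero-based, upper bounds exclusive -> rows rlo..rhi-1, columns clo..chi-1
    openpyxl: one-based, bounds inclusive    -> rows rlo+1..rhi, columns clo+1..chi
    """
    rlo, rhi, clo, chi = crange
    return {
        "start_row": rlo + 1,
        "end_row": rhi,
        "start_column": clo + 1,
        "end_column": chi,
    }


# An xlrd range covering the first two rows and the first three columns
print(merged_range_kwargs((0, 2, 0, 3)))
```

With an openpyxl worksheet at hand, the result would be applied as `sheet_xlsx.merge_cells(**merged_range_kwargs(crange))`.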

Results

Converting the same 12 files in the same order:

    File   Original         Improved
    1      0:00:01.958159   0:00:00.774101
    2      0:00:02.115891   0:00:00.734749
    3      0:00:02.018643   0:00:00.741434
    4      0:00:02.057803   0:00:00.744491
    5      0:00:01.267079   0:00:00.320796
    6      0:00:01.308073   0:00:00.279045
    7      0:00:01.245989   0:00:00.315829
    8      0:00:01.289295   0:00:00.280769
    9      0:00:01.273805   0:00:00.316380
    10     0:00:01.276003   0:00:00.289196
    11     0:00:01.293834   0:00:00.347819
    12     0:00:01.261401   0:00:00.284242
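The post does not say how these timings were taken. One plausible way to get values in the same `h:mm:ss.ffffff` format is to wrap each conversion in a small timer that returns a `datetime.timedelta`. The helper name `timed` is hypothetical, and any conversion function from this thread could be passed in.

```python
import datetime
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed as a timedelta)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = datetime.timedelta(seconds=time.perf_counter() - start)
    return result, elapsed


# Stand-in workload; replace sum(...) with a real conversion call
total, elapsed = timed(sum, range(1_000_000))
print(elapsed)
```

Printing a `timedelta` produces the format shown in the results, so running the old and the new converter over the same file list gives directly comparable columns.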

Solution

    import datetime

    import xlrd
    from openpyxl.workbook import Workbook
    from openpyxl.utils.cell import get_column_letter


    def cvt_xls_to_xlsx(*args, **kw):
        """Open and convert XLS file to openpyxl.workbook.Workbook object

        @param args: args for xlrd.open_workbook
        @param kw: kwargs for xlrd.open_workbook
        @return: openpyxl.workbook.Workbook
        """
        book_xls = xlrd.open_workbook(*args, formatting_info=True, ragged_rows=True, **kw)
        book_xlsx = Workbook()

        sheet_names = book_xls.sheet_names()
        for sheet_index in range(len(sheet_names)):
            sheet_xls = book_xls.sheet_by_name(sheet_names[sheet_index])

            # reuse the default sheet for the first one, create the others
            if sheet_index == 0:
                sheet_xlsx = book_xlsx.active
                sheet_xlsx.title = sheet_names[sheet_index]
            else:
                sheet_xlsx = book_xlsx.create_sheet(title=sheet_names[sheet_index])

            # xlrd ranges are 0-based with exclusive upper bounds
            for crange in sheet_xls.merged_cells:
                rlo, rhi, clo, chi = crange
                sheet_xlsx.merge_cells(
                    start_row=rlo + 1, end_row=rhi,
                    start_column=clo + 1, end_column=chi,
                )

            def _get_xlrd_cell_value(cell):
                value = cell.value
                if cell.ctype == xlrd.XL_CELL_DATE:
                    # use the workbook's own date system (1900 or 1904)
                    value = datetime.datetime(*xlrd.xldate_as_tuple(value, book_xls.datemode))
                return value

            # append whole rows, without the empty tail
            for row in range(sheet_xls.nrows):
                sheet_xlsx.append((
                    _get_xlrd_cell_value(cell)
                    for cell in sheet_xls.row_slice(row, end_colx=sheet_xls.row_len(row))
                ))

            # carry over hidden rows; rows without a record have no map entry
            for rowx in range(sheet_xls.nrows):
                rowinfo = sheet_xls.rowinfo_map.get(rowx)
                if rowinfo is not None and rowinfo.hidden != 0:
                    print(sheet_names[sheet_index], rowx)
                    sheet_xlsx.row_dimensions[rowx + 1].hidden = True

            # carry over hidden columns
            for coly in range(sheet_xls.ncols):
                colinfo = sheet_xls.colinfo_map.get(coly)
                if colinfo is not None and colinfo.hidden != 0:
                    print(sheet_names[sheet_index], coly)
                    coly_letter = get_column_letter(coly + 1)
                    sheet_xlsx.column_dimensions[coly_letter].hidden = True

        return book_xlsx

I have had to do this before. The main idea is to use the xlrd module to open and parse an xls file, then write the contents to an xlsx file using the openpyxl module.

Here is my code. Attention! It cannot handle complex xls files; you should add your own parsing logic if you are going to use it.

    import xlrd
    from openpyxl.workbook import Workbook


    def open_xls_as_xlsx(filename):
        # first open using xlrd
        book = xlrd.open_workbook(filename)
        index = 0
        nrows, ncols = 0, 0
        # skip ahead to the first sheet that actually contains data
        while nrows * ncols == 0:
            sheet = book.sheet_by_index(index)
            nrows = sheet.nrows
            ncols = sheet.ncols
            index += 1

        # prepare a xlsx sheet
        book1 = Workbook()
        sheet1 = book1.active

        for row in range(nrows):
            for col in range(ncols):
                # xlrd indices are 0-based, openpyxl's are 1-based
                sheet1.cell(row=row + 1, column=col + 1).value = sheet.cell_value(row, col)

        return book1

I tried @Jhon Anderson's solution. It works well, but I got a "year is out of range" error when there are cells in a time format such as HH:mm:ss with no date. So I improved the algorithm again:

    import datetime

    import openpyxl
    import xlrd


    def xls_to_xlsx(*args, **kw):
        """Open and convert an XLS file to openpyxl.workbook.Workbook

        @param args: args for xlrd.open_workbook
        @param kw: kwargs for xlrd.open_workbook
        @return: openpyxl.workbook.Workbook object
        """
        book_xls = xlrd.open_workbook(*args, formatting_info=True, ragged_rows=True, **kw)
        book_xlsx = openpyxl.workbook.Workbook()

        sheet_names = book_xls.sheet_names()
        for sheet_index in range(len(sheet_names)):
            sheet_xls = book_xls.sheet_by_name(sheet_names[sheet_index])

            if sheet_index == 0:
                sheet_xlsx = book_xlsx.active
                sheet_xlsx.title = sheet_names[sheet_index]
            else:
                sheet_xlsx = book_xlsx.create_sheet(title=sheet_names[sheet_index])

            for crange in sheet_xls.merged_cells:
                rlo, rhi, clo, chi = crange
                sheet_xlsx.merge_cells(start_row=rlo + 1, end_row=rhi,
                                       start_column=clo + 1, end_column=chi)

            def _get_xlrd_cell_value(cell):
                value = cell.value
                if cell.ctype == xlrd.XL_CELL_DATE:
                    # use the workbook's own date system (1900 or 1904)
                    datetime_tup = xlrd.xldate_as_tuple(value, book_xls.datemode)
                    if datetime_tup[0:3] == (0, 0, 0):
                        # time format without date
                        value = datetime.time(*datetime_tup[3:])
                    else:
                        value = datetime.datetime(*datetime_tup)
                return value

            for row in range(sheet_xls.nrows):
                sheet_xlsx.append((
                    _get_xlrd_cell_value(cell)
                    for cell in sheet_xls.row_slice(row, end_colx=sheet_xls.row_len(row))
                ))

        return book_xlsx

Now it works perfectly!
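The time-only fix can be looked at in isolation, without xlrd. For a cell that holds only a time, the date tuple comes back as `(0, 0, 0, hour, minute, second)`, and `datetime.datetime` rejects year 0, which is where the "year is out of range" error comes from. A minimal sketch, with `date_tuple_to_value` as a hypothetical helper name:

```python
import datetime


def date_tuple_to_value(datetime_tup):
    """Turn a (year, month, day, hour, minute, second) tuple into a Python value.

    A (0, 0, 0) date part marks a time-only cell, so a datetime.time is
    returned; anything else becomes a full datetime.datetime.
    """
    if tuple(datetime_tup[:3]) == (0, 0, 0):
        return datetime.time(*datetime_tup[3:])
    return datetime.datetime(*datetime_tup)


print(date_tuple_to_value((0, 0, 0, 13, 30, 15)))    # time-only cell
print(date_tuple_to_value((2020, 1, 2, 3, 4, 5)))    # full date and time
```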

Ray's answer used to trim the first row and the last column of the data. Here is my modified solution (for Python 3):

    import xlrd
    from openpyxl.workbook import Workbook


    def open_xls_as_xlsx(filename):
        # first open using xlrd
        book = xlrd.open_workbook(filename)
        index = 0
        nrows, ncols = 0, 0
        while nrows * ncols == 0:
            sheet = book.sheet_by_index(index)
            nrows = sheet.nrows
            ncols = sheet.ncols
            index += 1

        # prepare a xlsx sheet
        book1 = Workbook()
        sheet1 = book1.active

        # bm: openpyxl cells are 1-based, so loop over 1..nrows and 1..ncols
        for row in range(1, nrows + 1):
            for col in range(1, ncols + 1):
                # bm added the -1's: xlrd cells are 0-based
                sheet1.cell(row=row, column=col).value = sheet.cell_value(row - 1, col - 1)

        return book1
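The +1/-1 bookkeeping exists only because xlrd counts cells from 0 and openpyxl counts them from 1. One way to avoid the manual offsets is to let `enumerate(..., start=1)` produce the 1-based coordinates. The sketch below works on plain lists of rows so it runs without either library, and `one_based_cells` is a hypothetical helper name.

```python
def one_based_cells(rows):
    """Yield (row, column, value) with 1-based coordinates for a list of rows."""
    for row_number, row in enumerate(rows, start=1):
        for column_number, value in enumerate(row, start=1):
            yield row_number, column_number, value


data = [["name", "qty"], ["apples", 3]]
for row_number, column_number, value in one_based_cells(data):
    # with openpyxl this would be:
    # sheet1.cell(row=row_number, column=column_number).value = value
    print(row_number, column_number, value)
```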

Ray's answer helped me a lot, but for those looking for a simple way to convert all the sheets of an xls to an xlsx, I made this Gist:

    import xlrd
    from openpyxl.workbook import Workbook as openpyxlWorkbook

    # content holds the raw contents of the file (bytes in Python 3),
    # for example the result of an http.request(url).
    # You can also use a filepath by calling "xlrd.open_workbook(filepath)".
    xlsBook = xlrd.open_workbook(file_contents=content)
    workbook = openpyxlWorkbook()

    for i in range(0, xlsBook.nsheets):
        xlsSheet = xlsBook.sheet_by_index(i)
        sheet = workbook.active if i == 0 else workbook.create_sheet()
        sheet.title = xlsSheet.name

        for row in range(0, xlsSheet.nrows):
            for col in range(0, xlsSheet.ncols):
                # xlrd indices are 0-based, openpyxl's are 1-based
                sheet.cell(row=row + 1, column=col + 1).value = xlsSheet.cell_value(row, col)

    # The new xlsx file is in "workbook", without iterators (iter_rows).
    # For iteration, use "for row in worksheet.rows:".
    # For range iteration, use "for row in worksheet["{}:{}".format(startCell, endCell)]:".

Both the xlrd and openpyxl libraries are needed (for Google App Engine, for example, you have to download xlrd into your project).

You need to have win32com installed on your machine. Here is my code:

    import win32com.client as win32

    fname = "full+path+to+xls_file"
    excel = win32.gencache.EnsureDispatch('Excel.Application')
    wb = excel.Workbooks.Open(fname)

    # FileFormat = 51 is for the .xlsx extension
    # FileFormat = 56 is for the .xls extension
    wb.SaveAs(fname + "x", FileFormat=51)
    wb.Close()
    excel.Application.Quit()
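The two things that are easy to get wrong here are the absolute path and the numeric FileFormat code. Both can be prepared with the standard library before Excel is touched. In this sketch, `save_as_arguments` and `XL_FILE_FORMATS` are hypothetical names. The codes 51 and 56 come from the answer above, and 52 (macro-enabled .xlsm) is added from Excel's XlFileFormat enumeration.

```python
import os

# Excel XlFileFormat codes keyed by file extension
XL_FILE_FORMATS = {
    ".xlsx": 51,  # xlOpenXMLWorkbook
    ".xlsm": 52,  # xlOpenXMLWorkbookMacroEnabled
    ".xls": 56,   # xlExcel8
}


def save_as_arguments(src_path, new_ext=".xlsx"):
    """Return (absolute destination path, FileFormat code) for Workbook.SaveAs."""
    root, _old_ext = os.path.splitext(os.path.abspath(src_path))
    return root + new_ext, XL_FILE_FORMATS[new_ext.lower()]


dst, file_format = save_as_arguments("report.xls")
print(dst, file_format)
```

The result would then be passed on as `wb.SaveAs(dst, FileFormat=file_format)`.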

I did not find any of the answers here 100% correct, so I am posting my code here:

    import xlrd
    from openpyxl.workbook import Workbook


    def cvt_xls_to_xlsx(src_file_path, dst_file_path):
        book_xls = xlrd.open_workbook(src_file_path)
        book_xlsx = Workbook()

        sheet_names = book_xls.sheet_names()
        for sheet_index in range(0, len(sheet_names)):
            sheet_xls = book_xls.sheet_by_name(sheet_names[sheet_index])

            # "active" is a property, not a method
            if sheet_index == 0:
                sheet_xlsx = book_xlsx.active
                sheet_xlsx.title = sheet_names[sheet_index]
            else:
                sheet_xlsx = book_xlsx.create_sheet(title=sheet_names[sheet_index])

            # copy cell by cell; openpyxl rows and columns start at 1
            for row in range(0, sheet_xls.nrows):
                for col in range(0, sheet_xls.ncols):
                    sheet_xlsx.cell(row=row + 1, column=col + 1).value = sheet_xls.cell_value(row, col)

        book_xlsx.save(dst_file_path)