XLRD/Python: Reading an Excel file into a dict with for-loops


I'm looking to read in an Excel workbook with 15 fields and about 2000 rows, and convert each row to a dictionary in Python. I then want to append each dictionary to a list. I'd like each field in the top row of the workbook to be a key within each dictionary, and the corresponding cell value to be the value within the dictionary. I've already looked at two existing examples, but I'd like to do something a bit different. The second of them will work, but I feel it would be more efficient to loop over the top row once to populate the dictionary keys and then iterate through each row to get the values. My Excel file contains data from discussion forums and looks something like this (obviously with more columns):

    id   thread_id   forum_id   post_time    votes   post_text
    4    100         3          1377000566   1       "here is some text"
    5    100         4          1289003444   0       "even more text here"

So, I'd like the fields id, thread_id, and so on to be the dictionary keys. I'd like my dictionaries to look like:

    {id: 4, thread_id: 100, forum_id: 3, post_time: 1377000566, votes: 1, post_text: "here is some text"}

Initially, I had code like this that iterates through the file, but my scope is wrong for some of the for-loops and I'm generating way too many dictionaries. Here's my initial code:

    import xlrd
    from xlrd import open_workbook, cellname

    book = open("forum.xlsx", "r")
    sheet = book.sheet_by_index(3)

    dict_list = []

    for row_index in range(sheet.nrows):
        for col_index in range(sheet.ncols):
            d = {}

            # My intuition for the below for-loop is to take each cell in the top row of the
            # Excel sheet and add it as a key to the dictionary, and then pass the value of
            # current index in the above loops as the value to the dictionary. This isn't
            # working.
            for i in sheet.row(0):
                d[str(i)] = sheet.cell(row_index, col_index).value
                dlist.append(d)

Any help would be greatly appreciated. Thanks in advance for reading.

This answer helped me a lot! I had been fiddling with a way to do this for about two hours. Then I found this short, elegant answer. Thanks!

I needed some way to convert xls to JSON using keys.

So I adapted the script above to dump the result as JSON, like so:

    from xlrd import open_workbook
    import simplejson as json
    #http://.com/questions/23568409/xlrd-python-reading-excel-file-into-dict-with-for-loops?lq=1

    book = open_workbook("makelijk-bomen-herkennen-schors.xls")
    sheet = book.sheet_by_index(0)

    # read header values into the list
    keys = [sheet.cell(0, col_index).value for col_index in range(sheet.ncols)]
    print("keys are", keys)

    dict_list = []
    for row_index in range(1, sheet.nrows):
        d = {keys[col_index]: sheet.cell(row_index, col_index).value
             for col_index in range(sheet.ncols)}
        dict_list.append(d)

    # print(dict_list)
    j = json.dumps(dict_list)

    # Write to file
    with open("data.json", "w") as f:
        f.write(j)
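If you want to check the JSON step without an .xls file at hand, here is a minimal sketch. It uses the standard-library json module instead of simplejson (my substitution, not what the script above imports), the sample rows are a trimmed copy of the question's example, and `rows_to_json` is a hypothetical helper name.

```python
import json


def rows_to_json(dict_list, indent=2):
    """Serialize a list of row dictionaries to a JSON string."""
    # ensure_ascii=False keeps non-ASCII cell text readable in the output
    return json.dumps(dict_list, indent=indent, ensure_ascii=False)


# Sample rows based on the question's example sheet (not read from a real file)
sample_rows = [
    {"id": 4, "thread_id": 100, "votes": 1, "post_text": "here is some text"},
    {"id": 5, "thread_id": 100, "votes": 0, "post_text": "even more text here"},
]

json_text = rows_to_json(sample_rows)

# Parsing the text again should give back the same list of dictionaries
round_trip = json.loads(json_text)
```

Loading the string back with `json.loads` is a quick way to confirm that nothing was lost on the way to the file.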

This script lets you transform Excel data into a list of dictionaries.

    import xlrd

    # on_demand=True loads each sheet only when it is first requested
    workbook = xlrd.open_workbook("forum.xls", on_demand=True)
    worksheet = workbook.sheet_by_index(0)

    # The first row is where we store the column names
    first_row = []
    for col in range(worksheet.ncols):
        first_row.append(worksheet.cell_value(0, col))

    # Transform the worksheet into a list of dictionaries
    data = []
    for row in range(1, worksheet.nrows):
        elm = {}
        for col in range(worksheet.ncols):
            elm[first_row[col]] = worksheet.cell_value(row, col)
        data.append(elm)

    print(data)

Try setting up your keys first by parsing only the first line across all columns, then write another function to parse the data, and call them in order.

    all_fields_list = []
    header_dict = {}

    def parse_data_headers(sheet):
        global header_dict
        for c in range(sheet.ncols):
            key = sheet.cell(1, c).value  # here 1 is the row number where your header is
            header_dict[c] = key  # store it somewhere, here I have chosen to store in a dict

    def parse_data(sheet):
        for r in range(2, sheet.nrows):
            row_dict = {}
            for c in range(sheet.ncols):
                value = sheet.cell(r, c).value
                row_dict[header_dict[c]] = value  # key each value by its column header
            all_fields_list.append(row_dict)
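Here is a sketch of what "call them in order" could look like. It is a variation rather than the exact functions above: the helpers return their results instead of filling globals, and I assume the header sits in the first row (index 0). `FakeSheet` and `FakeCell` are hypothetical stand-ins that mimic only the `nrows`, `ncols` and `cell(r, c).value` parts of an xlrd sheet, so the example runs without a workbook.

```python
from collections import namedtuple

# Hypothetical stand-in for an xlrd cell: only the .value attribute is mimicked
FakeCell = namedtuple("FakeCell", "value")


class FakeSheet:
    """Hypothetical stand-in for an xlrd sheet, backed by a list of lists."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def cell(self, r, c):
        return FakeCell(self._rows[r][c])


def parse_data_headers(sheet, header_row=0):
    """Return {column index: header text} for the header row."""
    return {c: sheet.cell(header_row, c).value for c in range(sheet.ncols)}


def parse_data(sheet, headers, first_data_row=1):
    """Return one dict per data row, keyed by header text."""
    rows = []
    for r in range(first_data_row, sheet.nrows):
        rows.append({headers[c]: sheet.cell(r, c).value
                     for c in range(sheet.ncols)})
    return rows


sheet = FakeSheet([
    ["id", "thread_id", "votes"],
    [4, 100, 1],
    [5, 100, 0],
])

# Step 1: headers first. Step 2: data rows, using those headers as keys.
headers = parse_data_headers(sheet)
all_rows = parse_data(sheet, headers)
```

With a real workbook you would pass the sheet returned by `sheet_by_index` in place of `FakeSheet`, and adjust `header_row` and `first_data_row` if the header is not on the first line.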

The idea is to first read the header into a list. Then iterate over the sheet rows (starting from the one after the header), create a new dictionary from the header keys and the corresponding cell values, and append it to the list of dictionaries:

    from xlrd import open_workbook

    # Note: xlrd 2.0 and later read only .xls files; for .xlsx use an older xlrd
    book = open_workbook("forum.xlsx")
    sheet = book.sheet_by_index(3)

    # read header values into the list
    keys = [sheet.cell(0, col_index).value for col_index in range(sheet.ncols)]

    dict_list = []
    for row_index in range(1, sheet.nrows):
        d = {keys[col_index]: sheet.cell(row_index, col_index).value
             for col_index in range(sheet.ncols)}
        dict_list.append(d)

    print(dict_list)

For a sheet containing:

    A   B   C   D
    1   2   3   4
    5   6   7   8

it prints:

    [{'A': 1.0, 'B': 2.0, 'C': 3.0, 'D': 4.0}, {'A': 5.0, 'B': 6.0, 'C': 7.0, 'D': 8.0}]

UPD (expanding the dictionary comprehension):

    d = {}
    for col_index in range(sheet.ncols):
        d[keys[col_index]] = sheet.cell(row_index, col_index).value
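Note that xlrd returns every numeric cell as a float, which is why the output above shows 1.0 rather than 1. If you want whole numbers back as int (as in the question's desired dictionaries), one option is a small post-processing step. This is just a sketch: `normalize_number` and `normalize_row` are made-up helper names, and they work on plain dictionaries, so no workbook is needed to try them.

```python
def normalize_number(value):
    """Turn floats with no fractional part into ints; leave everything else alone."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


def normalize_row(row):
    """Apply normalize_number to every value of one row dictionary."""
    return {key: normalize_number(value) for key, value in row.items()}


# Rows shaped like the ones xlrd produces: numbers arrive as floats
dict_list = [
    {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0},
    {"A": 5.0, "B": 6.5, "C": "text", "D": 8.0},
]

cleaned = [normalize_row(row) for row in dict_list]
```

Values such as 6.5 and text cells pass through unchanged, so the step should be safe to apply to every column. Whether a column like post_time ought to be an int is still your call.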

Try this one. The function below returns a generator that yields one dict per row, with the column headers as keys.

    from xlrd import open_workbook

    def parse_xlsx():
        workbook = open_workbook("excelsheet.xlsx")
        sheets = workbook.sheet_names()
        active_sheet = workbook.sheet_by_name(sheets[0])
        num_rows = active_sheet.nrows
        num_cols = active_sheet.ncols
        # lowercased header cells become the dictionary keys
        header = [active_sheet.cell_value(0, cell).lower() for cell in range(num_cols)]
        for row_idx in range(1, num_rows):
            row_cell = [active_sheet.cell_value(row_idx, col_idx) for col_idx in range(num_cols)]
            yield dict(zip(header, row_cell))

    for row in parse_xlsx():
        print(row)
        # {id: 4, thread_id: 100, forum_id: 3, post_time: 1377000566, votes: 1, post_text: "here is some text"}
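Because the function is a generator, each dict is built only when you ask for it, so you can stop early without converting the whole sheet. A minimal sketch of that idea: `iter_row_dicts` is a hypothetical variant that takes plain lists instead of an xlrd sheet so that it runs on its own.

```python
from itertools import islice


def iter_row_dicts(rows):
    """Yield one dict per data row; the first row supplies the lowercased keys."""
    rows = iter(rows)
    header_row = next(rows, None)
    if header_row is None:
        return  # empty input: nothing to yield
    header = [str(name).lower() for name in header_row]
    for row in rows:
        yield dict(zip(header, row))


# In-memory stand-in for a sheet: header row followed by data rows
table = [
    ["ID", "Thread_ID", "Votes"],
    [4, 100, 1],
    [5, 100, 0],
    [6, 101, 7],
]

# Take only the first two data rows; the third is never turned into a dict
first_two = list(islice(iter_row_dicts(table), 2))
```

If you need every row as a list anyway, `list(iter_row_dicts(table))` gives the same result as building the list by hand.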

    from xlrd import open_workbook

    dict_list = []
    book = open_workbook("forum.xlsx")
    sheet = book.sheet_by_index(3)

    # read first row for keys
    keys = sheet.row_values(0)

    # read the remaining rows for values
    values = [sheet.row_values(i) for i in range(1, sheet.nrows)]

    for value in values:
        dict_list.append(dict(zip(keys, value)))

    print(dict_list)