python - Create a nested dictionary from a flattened dictionary


Here is my take:

    def nest_dict(flat):
        result = {}
        for k, v in flat.items():
            _nest_dict_rec(k, v, result)
        return result

    def _nest_dict_rec(k, v, out):
        # split off the first key segment; `rest` is empty at the last level
        k, *rest = k.split('_', 1)
        if rest:
            _nest_dict_rec(rest[0], v, out.setdefault(k, {}))
        else:
            out[k] = v

    flat = {'X_a_one': 10,
            'X_a_two': 20,
            'X_b_one': 10,
            'X_b_two': 20,
            'Y_a_one': 10,
            'Y_a_two': 20,
            'Y_b_one': 10,
            'Y_b_two': 20}
    nested = {'X': {'a': {'one': 10, 'two': 20},
                    'b': {'one': 10, 'two': 20}},
              'Y': {'a': {'one': 10, 'two': 20},
                    'b': {'one': 10, 'two': 20}}}
    print(nest_dict(flat) == nested)
    # True
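The recursion above hinges on the starred unpacking `k, *rest = k.split('_', 1)`: `rest` is an empty list once no separator is left, which is what ends the descent. A small illustration of just that step follows; the helper name `split_head` is hypothetical and only exists for this demonstration.

```python
def split_head(key, sep="_"):
    """Split off the first key segment; `rest` is [] when no separator is left."""
    head, *rest = key.split(sep, 1)
    return head, rest

print(split_head("X_a_one"))  # ('X', ['a_one'])
print(split_head("one"))      # ('one', [])
```

Because `split` is called with a maximum of one split, `rest` holds at most one element, so `rest[0]` is the whole remaining key.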

I have a flattened dictionary that I want to convert into a nested one, of the form

    flat = {'X_a_one': 10,
            'X_a_two': 20,
            'X_b_one': 10,
            'X_b_two': 20,
            'Y_a_one': 10,
            'Y_a_two': 20,
            'Y_b_one': 10,
            'Y_b_two': 20}

I want to convert it to the form

    nested = {'X': {'a': {'one': 10, 'two': 20},
                    'b': {'one': 10, 'two': 20}},
              'Y': {'a': {'one': 10, 'two': 20},
                    'b': {'one': 10, 'two': 20}}}

The structure of the flat dictionary is such that there should be no problems with ambiguities. I want it to work for dictionaries of arbitrary depth, but performance is not really a concern. I have seen plenty of methods for flattening a nested dictionary, but basically none for nesting a flattened one. The values stored in the dictionary are either scalars or strings, never iterables.

So far I have something that can take the input

    test_dict = {'X_a_one': '10',
                 'X_b_one': '10',
                 'X_c_one': '10'}

to the output

    test_out = {'X': {'a_one': '10',
                      'b_one': '10',
                      'c_one': '10'}}

using the code

    def nest_once(inp_dict):
        out = {}
        if isinstance(inp_dict, dict):
            for key, val in inp_dict.items():
                if '_' in key:
                    head, tail = key.split('_', 1)
                    if head not in out.keys():
                        out[head] = {tail: val}
                    else:
                        out[head].update({tail: val})
                else:
                    out[key] = val
        return out

    test_out = nest_once(test_dict)

But I am having trouble working out how to turn this into something that recursively creates all levels of the dictionary.

Any help would be appreciated!

(As for why I want to do this: I have a file whose structure is equivalent to a nested dict, and I want to store the contents of this file in the attributes dictionary of a NetCDF file and retrieve them later. However, NetCDF only lets you store flat dictionaries as attributes, so I want to unflatten the dictionary I previously saved in the NetCDF file.)
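For context, the flattening half of that round trip might look like the sketch below. It is not taken from the thread: the name `flatten_dict` and its parameters are assumptions, and it relies on every key being a string that does not itself contain the separator.

```python
def flatten_dict(nested, sep="_", prefix=""):
    """Flatten a nested dict into a single level, joining keys with `sep`."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # descend, carrying the accumulated key path as the new prefix
            flat.update(flatten_dict(value, sep, full_key))
        else:
            flat[full_key] = value
    return flat

nested = {"X": {"a": {"one": 10, "two": 20}}, "Y": {"b": {"one": 10}}}
print(flatten_dict(nested))
# {'X_a_one': 10, 'X_a_two': 20, 'Y_b_one': 10}
```

Any unflattening function from the answers should turn this output back into the original nested dict, as long as the same separator is used in both directions.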

Here is one way using collections.defaultdict, borrowing heavily from this previous answer. There are 3 steps:

Create a nested defaultdict of defaultdict objects.
Iterate over the items of the flat input dictionary.
Build the defaultdict result according to the structure derived from splitting the keys by _, using getFromDict to walk the result dictionary.

Here is a complete example:

    from collections import defaultdict
    from functools import reduce
    from operator import getitem

    def getFromDict(dataDict, mapList):
        """Iterate nested dictionary"""
        return reduce(getitem, mapList, dataDict)

    # instantiate nested defaultdict of defaultdicts
    tree = lambda: defaultdict(tree)
    d = tree()

    # iterate input dictionary (`flat` is the dictionary from the question)
    for k, v in flat.items():
        *keys, final_key = k.split('_')
        getFromDict(d, keys)[final_key] = v

    # resulting structure of d:
    # {'X': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}},
    #  'Y': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}}}

As a final step, you can convert your defaultdict back to a regular dict, though generally this step is not necessary.

    def default_to_regular_dict(d):
        """Convert nested defaultdict to regular dict of dicts."""
        if isinstance(d, defaultdict):
            d = {k: default_to_regular_dict(v) for k, v in d.items()}
        return d

    # convert back to regular dict
    res = default_to_regular_dict(d)
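One situation where converting back to a regular dict can matter: a defaultdict inserts a key as soon as that key is read, so a mistyped lookup silently grows the structure. The short sketch below illustrates this behavior with made-up keys.

```python
from collections import defaultdict

tree = lambda: defaultdict(tree)
d = tree()
d["X"]["a"] = 10

# Merely reading a missing key inserts it into a defaultdict.
_ = d["typo"]
print("typo" in d)  # True

# A plain dict raises KeyError instead of growing silently.
plain = {"X": {"a": 10}}
try:
    plain["typo"]
except KeyError:
    print("KeyError")
```

If the nested result is only read with keys that are known to exist, this difference does not come up and the conversion can be skipped.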

The other answers are cleaner, but since you mentioned recursion, there are other options.

    def nest(d):
        _ = {}
        for k in d:
            i = k.find('_')
            if i == -1:
                _[k] = d[k]
                continue
            s, t = k[:i], k[i+1:]
            if s in _:
                _[s][t] = d[k]
            else:
                _[s] = {t: d[k]}
        return {k: (nest(_[k]) if type(_[k]) == type(d) else _[k]) for k in _}

Another non-recursive solution with no imports. It splits the logic between inserting a single key-value pair of the flat dict and mapping over all key-value pairs of the flat dict.

    def insert(dct, lst):
        """
        dct: a dict to be modified inplace.
        lst: list of elements representing a hierarchy of keys
        followed by a value.

        dct = {}
        lst = [1, 2, 3]

        resulting value of dct: {1: {2: 3}}
        """
        for x in lst[:-2]:
            dct[x] = dct = dct.get(x, dict())

        dct.update({lst[-2]: lst[-1]})

    def unflat(dct):
        # empty dict to store the result
        result = dict()

        # create an iterator of lists representing hierarchical indices followed by the value
        lsts = ([*k.split("_"), v] for k, v in dct.items())

        # insert each list into the result
        for lst in lsts:
            insert(result, lst)

        return result

    result = unflat(flat)
    # {'X': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}},
    #  'Y': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}}}
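The chained assignment `dct[x] = dct = dct.get(x, dict())` in `insert` is compact but easy to misread. Python evaluates the right-hand side once and then assigns to the targets from left to right, so the child dict is attached to the current level before `dct` is rebound to it. A step-by-step equivalent might look like this sketch; the name `insert_verbose` is hypothetical.

```python
def insert_verbose(dct, lst):
    """Same effect as the chained-assignment descent, written step by step."""
    *path, last_key, value = lst
    for key in path:
        child = dct.get(key, {})  # reuse an existing level or start a new one
        dct[key] = child          # attach it to the current level
        dct = child               # then descend into it
    dct[last_key] = value

result = {}
insert_verbose(result, ["X", "a", "one", 10])
insert_verbose(result, ["X", "a", "two", 20])
print(result)  # {'X': {'a': {'one': 10, 'two': 20}}}
```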

You can use itertools.groupby:

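    import itertools
    import json

    flat = {'Y_a_two': 20, 'Y_a_one': 10, 'X_b_two': 20, 'X_b_one': 10,
            'X_a_one': 10, 'X_a_two': 20, 'Y_b_two': 20, 'Y_b_one': 10}
    # each entry becomes [key_part_1, ..., key_part_n, value]
    _flat = [[*a.split('_'), b] for a, b in flat.items()]

    def create_dict(d):
        # group the entries by their first key part (groupby needs sorted input)
        _d = {a: list(b) for a, b in itertools.groupby(sorted(d, key=lambda x: x[0]), key=lambda x: x[0])}
        # recurse while key parts remain; an entry of length 2 is [key, value].
        # Testing len(b[0]) rather than len(b) keeps a lone deep key nested.
        return {a: create_dict([i[1:] for i in b]) if len(b[0]) > 2 else b[0][-1] for a, b in _d.items()}

    print(json.dumps(create_dict(_flat), indent=3))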

Output:

    {
       "Y": {
          "b": {
             "two": 20,
             "one": 10
          },
          "a": {
             "two": 20,
             "one": 10
          }
       },
       "X": {
          "b": {
             "two": 20,
             "one": 10
          },
          "a": {
             "two": 20,
             "one": 10
          }
       }
    }

    # `source` is the flat input dictionary
    output = {}

    for k, v in source.items():
        # always start at the root.
        current = output

        # This is the part you're struggling with.
        pieces = k.split('_')

        # iterate from the beginning until the second to last place
        for piece in pieces[:-1]:
            if piece not in current:
                # if a dict doesn't exist at an index, then create one
                current[piece] = {}

            # as you walk into the structure, update your current location
            current = current[piece]

        # The reason you're using the second to last is because the last place
        # represents the place you're actually storing the item
        current[pieces[-1]] = v
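The same loop can be wrapped in a reusable function. The sketch below adds two things that are not in the answers and should be read as assumptions: a configurable `sep` parameter, and a `ValueError` when keys are ambiguous (for example both `X` and `X_a` present). The conflict check relies on leaf values never being dicts, which matches the question's statement that values are scalars or strings.

```python
def unflatten(source, sep="_"):
    """Nest a flat dict by splitting keys on `sep`; raise on conflicting keys."""
    output = {}
    for key, value in source.items():
        *parents, leaf = key.split(sep)
        current = output
        for piece in parents:
            # create the intermediate level if needed, then descend into it
            current = current.setdefault(piece, {})
            if not isinstance(current, dict):
                raise ValueError(f"{key!r} conflicts with an existing leaf value")
        if isinstance(current.get(leaf), dict):
            raise ValueError(f"{key!r} conflicts with an existing nested level")
        current[leaf] = value
    return output

flat = {"X_a_one": 10, "X_a_two": 20, "Y_b_one": 10}
print(unflatten(flat))
# {'X': {'a': {'one': 10, 'two': 20}}, 'Y': {'b': {'one': 10}}}
```

With unambiguous input, as described in the question, the checks never fire and the result matches the plain loop.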