python - Storing a numpy sparse matrix in HDF5 (PyTables)


I have updated Pietro Battiston's excellent answer for Python 3.6 and PyTables 3.x, since some PyTables function names changed in the upgrade from 2.x.

    import numpy as np
    from scipy import sparse
    import tables


    def store_sparse_mat(M, name, filename='store.h5'):
        """
        Store a csr matrix in HDF5

        Parameters
        ----------
        M : scipy.sparse.csr_matrix
            sparse matrix to be stored

        name: str
            node prefix in HDF5 hierarchy

        filename: str
            HDF5 filename
        """
        assert isinstance(M, sparse.csr_matrix), 'M must be a csr matrix'
        with tables.open_file(filename, 'a') as f:
            for attribute in ('data', 'indices', 'indptr', 'shape'):
                full_name = f'{name}_{attribute}'

                # remove existing nodes
                try:
                    n = getattr(f.root, full_name)
                    n._f_remove()
                except AttributeError:
                    pass

                # add nodes
                arr = np.array(getattr(M, attribute))
                atom = tables.Atom.from_dtype(arr.dtype)
                ds = f.create_carray(f.root, full_name, atom, arr.shape)
                ds[:] = arr


    def load_sparse_mat(name, filename='store.h5'):
        """
        Load a csr matrix from HDF5

        Parameters
        ----------
        name: str
            node prefix in HDF5 hierarchy

        filename: str
            HDF5 filename

        Returns
        -------
        M : scipy.sparse.csr_matrix
            loaded sparse matrix
        """
        with tables.open_file(filename) as f:
            # get nodes
            attributes = []
            for attribute in ('data', 'indices', 'indptr', 'shape'):
                attributes.append(getattr(f.root, f'{name}_{attribute}').read())

        # construct sparse matrix
        M = sparse.csr_matrix(tuple(attributes[:3]), shape=attributes[3])
        return M

I am having trouble storing a numpy csr_matrix with PyTables. I get this error:

TypeError: objects of type ``csr_matrix`` are not supported in this context, sorry; supported objects are: NumPy array, record or scalar; homogeneous list or tuple, integer, float, complex or string

My code:

    f = tables.openFile(path, 'w')
    atom = tables.Atom.from_dtype(self.count_vector.dtype)
    ds = f.createCArray(f.root, 'count', atom, self.count_vector.shape)
    ds[:] = self.count_vector
    f.close()

Any ideas?

Thanks

DaveP's answer is almost right ... but it can cause problems with very sparse matrices: if the last column(s) or row(s) are empty, they are dropped. So, to be sure that everything works, the "shape" attribute must be stored too.
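To see the problem concretely, here is a small sketch (the matrix and the variable names are made up for illustration) that rebuilds a CSR matrix from its three arrays, once without and once with the shape. Without it, scipy has to infer the number of columns from the largest column index it finds, so trailing empty columns vanish.

```python
import numpy as np
from scipy import sparse

# A 3 x 5 matrix whose last two columns are empty.
dense = np.zeros((3, 5))
dense[0, 0] = 1.0
dense[2, 2] = 7.0
m = sparse.csr_matrix(dense)

# Rebuild from the three arrays only: the column count is inferred
# from the largest column index present, so columns 3 and 4 are lost.
no_shape = sparse.csr_matrix((m.data, m.indices, m.indptr))

# Rebuild with the stored shape: all five columns survive.
with_shape = sparse.csr_matrix((m.data, m.indices, m.indptr), shape=m.shape)

print(m.shape, no_shape.shape, with_shape.shape)
```

For CSR the row count survives either way, because it is encoded in the length of indptr; with CSC the roles swap and it is the trailing empty rows that get lost.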

This is the code I regularly use:

    import tables as tb
    from numpy import array
    from scipy import sparse

    def store_sparse_mat(m, name, store='store.h5'):
        msg = "This code only works for csr matrices"
        assert(m.__class__ == sparse.csr.csr_matrix), msg
        with tb.openFile(store, 'a') as f:
            for par in ('data', 'indices', 'indptr', 'shape'):
                full_name = '%s_%s' % (name, par)
                try:
                    n = getattr(f.root, full_name)
                    n._f_remove()
                except AttributeError:
                    pass

                arr = array(getattr(m, par))
                atom = tb.Atom.from_dtype(arr.dtype)
                ds = f.createCArray(f.root, full_name, atom, arr.shape)
                ds[:] = arr

    def load_sparse_mat(name, store='store.h5'):
        with tb.openFile(store) as f:
            pars = []
            for par in ('data', 'indices', 'indptr', 'shape'):
                pars.append(getattr(f.root, '%s_%s' % (name, par)).read())
        m = sparse.csr_matrix(tuple(pars[:3]), shape=pars[3])
        return m

It is trivial to adapt it to csc matrices.
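One possible way to do that adaptation, sketched here without the HDF5 part: CSC matrices expose the same data, indices, indptr and shape attributes, so only the type check and the constructor need to change. The names CONSTRUCTORS, to_parts and from_parts are hypothetical helpers for this illustration, not part of scipy or PyTables, and how the format name gets persisted alongside the arrays is left open.

```python
import numpy as np
from scipy import sparse

# Hypothetical lookup table: format name -> matching constructor.
CONSTRUCTORS = {'csr': sparse.csr_matrix, 'csc': sparse.csc_matrix}


def to_parts(m):
    """Split a csr or csc matrix into the pieces that would be stored."""
    if m.format not in CONSTRUCTORS:
        raise ValueError(f'unsupported sparse format: {m.format}')
    return m.format, m.data, m.indices, m.indptr, m.shape


def from_parts(fmt, data, indices, indptr, shape):
    """Rebuild the matrix with the constructor matching its format."""
    return CONSTRUCTORS[fmt]((data, indices, indptr), shape=shape)


# The last row is empty, which is the case where CSC needs the shape.
dense = np.array([[0, 1],
                  [2, 0],
                  [0, 0]])
m = sparse.csc_matrix(dense)
rebuilt = from_parts(*to_parts(m))
```

In the store/load functions this would amount to replacing the csr-only assertion with a check like the one in to_parts and picking the constructor by format when loading.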

A CSR matrix can be fully reconstructed from its data, indices and indptr attributes. These are just regular numpy arrays, so there should be no problem storing them as 3 separate arrays in PyTables, and then passing them back to the csr_matrix constructor. See the scipy docs.

Edit: Pietro's answer has pointed out that the shape member should also be stored.
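As a rough illustration of that reconstruction (the example matrix is made up), the sketch below pulls the arrays out of a CSR matrix, checks that they are plain numpy arrays, and passes them back to the constructor together with the shape.

```python
import numpy as np
from scipy import sparse

dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 6]])
m = sparse.csr_matrix(dense)

# The three defining arrays are ordinary numpy arrays.
data, indices, indptr = m.data, m.indices, m.indptr
all_ndarrays = all(isinstance(a, np.ndarray) for a in (data, indices, indptr))

# Row i's non-zero values are data[indptr[i]:indptr[i + 1]] and they sit
# in the columns listed in indices[indptr[i]:indptr[i + 1]].
row1_values = data[indptr[1]:indptr[2]]
row1_cols = indices[indptr[1]:indptr[2]]

# Pass the arrays (plus the shape) back to the constructor.
rebuilt = sparse.csr_matrix((data, indices, indptr), shape=m.shape)
round_trip_ok = np.array_equal(rebuilt.toarray(), dense)
```

Since each piece is a regular array, any array container PyTables supports should be able to hold it; the shape is small enough to store as one more array, as the code in the other answers does.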