Using Python, how do I untar purely in memory?


This should help:

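    import zipfile
    from pandas import read_table

    # Path to the zip archive
    zip_path = "/home/tom/Documents/REdata/AllListing1RES.zip"
    zip_file = zipfile.ZipFile(zip_path)

    # Open one member as a file-like object, without extracting it to disk
    items_file = zip_file.open('AllListing1RES.txt')

    # The member is tab-separated, so the separator is '\t' (not '/t')
    df = read_table(items_file, sep='\t', index_col=0)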

I'm working in an environment where I can't save anything to disk. I need to be able to fetch tar files and unpack them without writing them to disk. I tried the following, but it raises an error:

    # fetch.py
    from cStringIO import StringIO
    import tarfile

    import requests

    url = "http://example.com/data.tar.gz"
    response = requests.get(url)

    # ERROR is thrown here. Error shown below
    tar = tarfile.open(mode="r:gz", fileobj=StringIO(response.content))

    # This SHOULD break as tar.extract() saves to disk.
    # Can't tell because of error on previous line of code.
    data = tar.extract()

As noted in the code block above, I get the following traceback on the line marked as the error:

    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "./importers/bestbuy_fetcher.py", line 23, in download_bestbuy_batch
        tar = tarfile.open(mode= "r:gz", fileobj = StringIO(response.content))
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/tarfile.py", line 1662, in open
        return func(name, filemode, fileobj, **kwargs)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/tarfile.py", line 1711, in gzopen
        **kwargs)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/tarfile.py", line 1689, in taropen
        return cls(name, mode, fileobj, **kwargs)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/tarfile.py", line 1568, in __init__
        self.firstmember = self.next()
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/tarfile.py", line 2324, in next
        raise ReadError(str(e))
    ReadError: invalid header

You can try what we did when dealing with requests + tar: use the "|" (stream) mode to open the file. See http://docs.python.org/library/tarfile.html#tarfile.open for more information.

You can see the code at https://github.com/djeese/djeese-client/blob/master/djeese/commands/clonestatic.py#L53

Basically, you open the tar file using tarfile.open(mode='r|gz', fileobj=response.raw).

That worked beautifully for us, and I hope it does for you too.
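To make the stream-mode idea concrete, here is a minimal sketch, assuming Python 3 (io.BytesIO instead of cStringIO). The helper names read_tar_stream and build_sample_archive are hypothetical and not part of the linked project; the sample archive is built in memory only so the snippet runs without a network connection.

```python
import io
import tarfile


def read_tar_stream(fileobj):
    """Read every regular file from a gzip-compressed tar stream into a dict.

    Hypothetical helper: "r|gz" reads the archive strictly front to back,
    so fileobj only needs a read() method and never has to be seekable.
    """
    contents = {}
    with tarfile.open(mode="r|gz", fileobj=fileobj) as tar:
        for member in tar:
            # In stream mode each member must be read while it is current.
            if member.isfile():
                contents[member.name] = tar.extractfile(member).read()
    return contents


def build_sample_archive():
    """Build a tiny .tar.gz in memory (stand-in for a downloaded file)."""
    buf = io.BytesIO()
    with tarfile.open(mode="w:gz", fileobj=buf) as tar:
        data = b"hello from memory\n"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


files = read_tar_stream(build_sample_archive())
print(files)
```

With requests, the same helper could presumably be given the raw attribute of a response fetched with requests.get(url, stream=True). Nothing is written to disk either way, because extractfile() hands back a file-like object rather than saving the member.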

It turns out the problem was that the file "data.tar.gz" was not a tar file at all, just a gzip-compressed file. So I solved it with:

    # fetch.py
    from cStringIO import StringIO
    import gzip

    import requests

    # Called a 'tar' file but actually a gzip file. @#$%!!!
    url = "http://example.com/data.tar.gz"
    response = requests.get(url)
    results = gzip.GzipFile(fileobj=StringIO(response.content))
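Since a ".tar.gz" name evidently does not guarantee a tar archive, one option is to try tarfile first and fall back to plain gzip when it raises ReadError. The following is only a sketch under that assumption, written for Python 3; unpack_in_memory and the ("tar", ...) / ("gzip", ...) return shape are made up for illustration.

```python
import gzip
import io
import tarfile


def unpack_in_memory(payload):
    """Unpack gzip-compressed bytes without touching the disk.

    Hypothetical helper. Returns ("tar", {name: bytes}) when the payload is
    a gzip-compressed tar archive, or ("gzip", bytes) when it is a plain
    gzip file that merely carries a tar-like name.
    """
    try:
        with tarfile.open(mode="r:gz", fileobj=io.BytesIO(payload)) as tar:
            members = {
                m.name: tar.extractfile(m).read()
                for m in tar
                if m.isfile()
            }
        return ("tar", members)
    except tarfile.ReadError:
        # Valid gzip data, but what is inside is not a tar archive.
        return ("gzip", gzip.decompress(payload))


# A plain gzip payload, like the mislabeled file in the question.
plain = gzip.compress(b"id\tname\n1\twidget\n")
kind, data = unpack_in_memory(plain)
print(kind, data)
```

The fallback only handles the case described above; a payload that is not gzip data at all would still raise from gzip.decompress().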

Thanks to everyone who helped out!

I suspect the error is telling you that the format of the tar file is wrong. Try fetching the file with wget and untarring it on the command line.

The other question, about how to stop Python from writing the file's contents to disk, requires a closer look at the tarfile API. Instead of calling TarFile.extract(), I think you need getnames(), which returns the name of every member in the tar archive. You can then use extractfile to get the contents of that member:

     |  extractfile(self, member)
     |      Extract a member from the archive as a file object. `member' may be
     |      a filename or a TarInfo object. If `member' is a regular file, a
     |      file-like object is returned. If `member' is a link, a file-like
     |      object is constructed from the link's target. If `member' is none of
     |      the above, None is returned.
     |      The file-like object is read-only and provides the following
     |      methods: read(), readline(), readlines(), seek() and tell()

Here is an example:

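    import tarfile

    # Open the gzip-compressed tar archive (binary mode)
    tar = tarfile.open(mode="r:gz", fileobj=open('foo.tgz', 'rb'))

    # Iterate over every member
    for member in tar.getnames():
        # extractfile() returns None for directories and other non-file members
        extracted = tar.extractfile(member)
        if extracted is not None:
            # Print contents of every file
            print(extracted.read())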
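The same getnames() / extractfile() pattern should also work when the archive bytes are already in memory, for example from response.content. Below is a rough sketch, assuming Python 3; make_archive_bytes is a hypothetical helper that only builds test data. Because io.BytesIO is seekable, the "r:gz" mode allows members to be read by name in any order.

```python
import io
import tarfile


def make_archive_bytes(files):
    """Build a .tar.gz in memory from a {name: bytes} mapping (test data only)."""
    buf = io.BytesIO()
    with tarfile.open(mode="w:gz", fileobj=buf) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Stand-in for bytes that were downloaded rather than read from disk.
payload = make_archive_bytes({"a.txt": b"alpha", "b.txt": b"beta"})

with tarfile.open(mode="r:gz", fileobj=io.BytesIO(payload)) as tar:
    names = tar.getnames()
    # Random access by name: read the second member before the first.
    b_contents = tar.extractfile("b.txt").read()
    a_contents = tar.extractfile("a.txt").read()

print(names, a_contents, b_contents)
```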