python - How to read response headers with pycurl


How do I read the response headers returned from a PyCurl request?

This may or may not be an alternative for you:

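    from urllib.request import urlopen

    # The response's headers attribute holds the parsed response headers
    headers = urlopen('http://www.pythonchallenge.com').headers
    print(headers.items())  # list of (name, value) tuples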

There are several solutions (by default, the headers are discarded). Here is an example using the HEADERFUNCTION option, which lets you specify a function to handle them.

Other solutions are the WRITEHEADER option (not compatible with WRITEFUNCTION) or setting HEADER to True so that the headers are delivered together with the body.

    #!/usr/bin/python
    import pycurl


    class Storage:
        """Accumulates the chunks pycurl hands to a callback, numbering each one."""

        def __init__(self):
            self.contents = ''
            self.line = 0

        def store(self, buf):
            # Under Python 3 pycurl passes bytes; iso-8859-1 decoding never fails
            self.line = self.line + 1
            self.contents = "%s%i: %s" % (self.contents, self.line,
                                          buf.decode('iso-8859-1'))

        def __str__(self):
            return self.contents


    retrieved_body = Storage()
    retrieved_headers = Storage()

    c = pycurl.Curl()
    c.setopt(c.URL, 'http://www.demaziere.fr/eve/')
    c.setopt(c.WRITEFUNCTION, retrieved_body.store)      # called with body chunks
    c.setopt(c.HEADERFUNCTION, retrieved_headers.store)  # called once per header line
    c.perform()
    c.close()

    print(retrieved_headers)
    print(retrieved_body)
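The Storage class above keeps the headers as one numbered string. If a name-to-value mapping is more convenient, the raw lines handed to HEADERFUNCTION can be parsed afterwards. The following is only a minimal sketch: `parse_header_lines` and the `sample` data are hypothetical names made up for illustration, the lines are assumed to arrive as bytes (as they do under Python 3), and repeated headers such as Set-Cookie simply overwrite each other here.

```python
def parse_header_lines(raw_lines):
    """Build a dict of lowercased header names from raw header lines (bytes)."""
    headers = {}
    for raw in raw_lines:
        # HTTP headers are conventionally decoded as iso-8859-1.
        line = raw.decode('iso-8859-1')
        # The status line and the blank terminator carry no colon; skip them.
        if ':' not in line:
            continue
        name, value = line.split(':', 1)
        # Header names are case-insensitive, so normalise them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return headers


# Hypothetical sample of what a header callback might receive, one line per call.
sample = [
    b'HTTP/1.1 200 OK\r\n',
    b'Content-Type: text/html; charset=utf-8\r\n',
    b'Content-Length: 198515\r\n',
    b'\r\n',
]
parsed = parse_header_lines(sample)
print(parsed['content-type'])
```

To feed it from a real request, one option is to collect the lines in a list, for example `lines = []` followed by `c.setopt(c.HEADERFUNCTION, lines.append)`, and call `parse_header_lines(lines)` after `c.perform()`.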

Another alternative is to use human_curl: pip install human_curl

    In [1]: import human_curl as hurl

    In [2]: r = hurl.get("http://.com")

    In [3]: r.headers
    Out[3]:
    {'cache-control': 'public, max-age=45',
     'content-length': '198515',
     'content-type': 'text/html; charset=utf-8',
     'date': 'Thu, 01 Sep 2011 11:53:43 GMT',
     'expires': 'Thu, 01 Sep 2011 11:54:28 GMT',
     'last-modified': 'Thu, 01 Sep 2011 11:53:28 GMT',
     'vary': '*'}

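    import pycurl
    from io import BytesIO

    url = 'http://www.example.com/'  # placeholder; substitute the URL to query

    headers = BytesIO()  # pycurl passes bytes to the callback under Python 3
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.HEADER, 1)
    c.setopt(c.NOBODY, 1)  # header only, no body
    c.setopt(c.HEADERFUNCTION, headers.write)
    c.perform()
    c.close()

    print(headers.getvalue().decode('iso-8859-1'))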

Add any other curl setopt calls as needed or desired, such as FOLLOWLOCATION.
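One thing to keep in mind with FOLLOWLOCATION: the header callback receives the headers of every response in the redirect chain, not just the final one. If only the last response matters, the collector can reset itself whenever a new status line arrives. This is a rough sketch under that assumption; `LastResponseHeaders` and the sample chunks are hypothetical and only simulate what the callback might be given.

```python
class LastResponseHeaders:
    """Collects header lines, keeping only the most recent response's block."""

    def __init__(self):
        self.lines = []

    def feed(self, raw):
        line = raw.decode('iso-8859-1')
        # A status line marks the start of a new response: drop the old block.
        if line.startswith('HTTP/'):
            self.lines = []
        self.lines.append(line)


# Simulated callback input for a redirect followed by the final response.
collector = LastResponseHeaders()
for chunk in [
    b'HTTP/1.1 301 Moved Permanently\r\n',
    b'Location: http://www.example.com/new\r\n',
    b'\r\n',
    b'HTTP/1.1 200 OK\r\n',
    b'Content-Type: text/html\r\n',
    b'\r\n',
]:
    collector.feed(chunk)

print(''.join(collector.lines))
```

With a real handle this would be wired up as `c.setopt(c.FOLLOWLOCATION, 1)` and `c.setopt(c.HEADERFUNCTION, collector.feed)` before calling `c.perform()`.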