
How do I get rid of the BeautifulSoup UserWarning?

After installing BeautifulSoup, this warning appears every time I run my Python script from cmd.

D:/Application/python/lib/site-packages/beautifulsoup4-4.4.1-py3.4.egg/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. To get rid of this warning, change this: BeautifulSoup([your markup]) to this: BeautifulSoup([your markup], "html.parser")

I have no idea why it appears or how to fix it.

The documentation recommends installing and using lxml for speed.

BeautifulSoup(html, "lxml")

If you are using a version of Python 2 earlier than 2.7.3, or a version of Python 3 earlier than 3.2.2, it is essential that you install lxml or html5lib: Python's built-in HTML parser is not very good in those older versions.
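To check which side of those version thresholds an interpreter falls on, something like the following should work. It is only a sketch: the function name builtin_parser_is_reliable is made up for illustration and is not part of BeautifulSoup.

```python
import sys


def builtin_parser_is_reliable(version=sys.version_info[:3]):
    """Return True if `version` is at or past the thresholds quoted above.

    Hypothetical helper: the name is invented for this example.
    """
    if version[0] == 2:
        return version >= (2, 7, 3)
    return version >= (3, 2, 2)


if not builtin_parser_is_reliable():
    print("Install lxml or html5lib instead of relying on html.parser")
```

On any reasonably recent interpreter the check passes and nothing is printed.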

Installing the lxml parser

On Ubuntu (Debian)
apt-get install python-lxml
On Fedora (RHEL family)
dnf install python-lxml
Using pip
pip install lxml
For Python 3, the distribution packages are named python3-lxml.
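If you are not sure which parsers ended up installed, you can check from Python before building the soup. The sketch below uses only the standard library; pick_parser is a hypothetical helper name, and it relies on the fact that the parser names BeautifulSoup accepts ("lxml", "html5lib") match the names of the packages you import.

```python
from importlib.util import find_spec


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first installed parser from `preferred`, else "html.parser".

    Hypothetical helper: the name and the default order are assumptions.
    """
    for name in preferred:
        # find_spec returns None when a top-level package is not installed
        if find_spec(name) is not None:
            return name
    return "html.parser"  # Python's built-in parser is always available


parser = pick_parser()
print(parser)
```

You would then pass the result explicitly, as in BeautifulSoup(html, parser). Keep in mind that different parsers can build different trees from the same broken markup, which is exactly what the warning is about, so hard-coding one parser name is the safer choice when the code has to behave the same everywhere.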

The solution to your problem is clearly stated in the warning message. Code like the following does not specify which parser (XML, HTML, etc.) to use:

BeautifulSoup( ... )

To get rid of the warning, specify which parser you would like to use, like this:

BeautifulSoup( ..., "html.parser" )
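You can confirm the difference yourself by recording the warnings each call emits. This is a rough sketch, assuming bs4 is installed; the sample markup and the count_parser_warnings helper are made up for the example.

```python
import warnings

from bs4 import BeautifulSoup

html = "<p>Hello, <b>world</b></p>"  # sample markup, invented for this sketch


def count_parser_warnings(*args):
    """Build a soup with the given arguments and count the UserWarnings emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        BeautifulSoup(*args)
    return sum(issubclass(w.category, UserWarning) for w in caught)


implicit = count_parser_warnings(html)                 # parser left to bs4
explicit = count_parser_warnings(html, "html.parser")  # parser named
print(implicit, explicit)
```

The first count should be at least one and the second zero, since naming the parser is all the warning asks for.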

You can also install a third-party parser if you prefer.

To use the html5lib HTML parser, install it by running:

pip install html5lib

Then pass html5lib as the parser argument to the BeautifulSoup constructor:

import bs4
htmlDoc = bs4.BeautifulSoup(req1.text, 'html5lib')  # req1: an HTTP response fetched earlier
print(htmlDoc)