
Scrapy Bench/Benchmark command errors




You will need to install the cffi Python package, but first you need ffi itself, which on Ubuntu means the libffi-dev and libffi packages:

sudo aptitude install libffi-dev libffi

sudo pip install cffi

You will also need to install libssl-dev, because it is used by the cryptography Python package:

sudo aptitude install libssl-dev

After that, reinstall Scrapy with: sudo pip install scrapy --upgrade

If that does not solve the problem, install the latest Scrapy straight from the GitHub tarball:

sudo pip install https://github.com/scrapy/scrapy/tarball/master

It worked for me...

I installed Scrapy 0.22.2 and was able to run the DirBot example code without problems. However, when I run the bench command I get some errors and exceptions. Could there be an underlying problem with port 8998 not accepting connections?

C:\>scrapy bench
Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Python27\lib\site-packages\scrapy-0.22.2-py2.7.egg\scrapy\tests\mockserver.py", line 198, in <module>
    os.path.join(os.path.dirname(__file__), 'keys/cert.pem'),
  File "C:\Python27\lib\site-packages\twisted\internet\ssl.py", line 70, in __init__
    self.cacheContext()
  File "C:\Python27\lib\site-packages\twisted\internet\ssl.py", line 79, in cacheContext
    ctx.use_certificate_file(self.certificateFileName)
OpenSSL.SSL.Error: [('system library', 'fopen', 'No such process'), ('BIO routines', 'FILE_CTRL', 'system lib'), ('SSL routines', 'SSL_CTX_use_certificate_file', 'system lib')]
2014-04-07 14:30:39-0500 [scrapy] INFO: Scrapy 0.22.2 started (bot: scrapybot)
2014-04-07 14:30:39-0500 [scrapy] INFO: Optional features available: ssl, http11
2014-04-07 14:30:39-0500 [scrapy] INFO: Overridden settings: {'CLOSESPIDER_TIMEOUT': 10, 'LOG_LEVEL': 'INFO', 'LOGSTATS_INTERVAL': 1}
2014-04-07 14:30:40-0500 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-04-07 14:30:42-0500 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-04-07 14:30:42-0500 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-04-07 14:30:42-0500 [scrapy] INFO: Enabled item pipelines:
2014-04-07 14:30:42-0500 [follow] INFO: Spider opened
2014-04-07 14:30:42-0500 [follow] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-04-07 14:30:43-0500 [follow] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-04-07 14:30:44-0500 [follow] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-04-07 14:30:45-0500 [follow] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-04-07 14:30:45-0500 [follow] ERROR: Error downloading <GET http://localhost:8998/follow?total=100000&order=rand&maxlatency=0.0&show=20>: Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2014-04-07 14:30:45-0500 [follow] INFO: Closing spider (finished)
2014-04-07 14:30:45-0500 [follow] INFO: Dumping Scrapy stats:
    {'downloader/exception_count': 3,
     'downloader/exception_type_count/twisted.internet.error.ConnectionRefusedError': 3,
     'downloader/request_bytes': 783,
     'downloader/request_count': 3,
     'downloader/request_method_count/GET': 3,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2014, 4, 7, 19, 30, 45, 575000),
     'log_count/ERROR': 1,
     'log_count/INFO': 10,
     'scheduler/dequeued': 3,
     'scheduler/dequeued/memory': 3,
     'scheduler/enqueued': 3,
     'scheduler/enqueued/memory': 3,
     'start_time': datetime.datetime(2014, 4, 7, 19, 30, 42, 439000)}
2014-04-07 14:30:45-0500 [follow] INFO: Spider closed (finished)
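The first error in the traceback is the root cause: mockserver.py resolves the certificate path relative to its own directory, and OpenSSL cannot open that file, which usually means keys/cert.pem was not shipped with the installed egg. The later connection-refused errors on port 8998 follow from the mock server never starting. A minimal sketch of the lookup it performs (the module path below is a stand-in, not the real install location):

```python
import os

def cert_path_for(module_file, relative="keys/cert.pem"):
    # mockserver.py builds the path the same way:
    # os.path.join(os.path.dirname(__file__), 'keys/cert.pem')
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), relative)

# Stand-in for the real mockserver.py location inside the Scrapy egg:
path = cert_path_for("/tmp/mockserver.py")
print(path)  # -> /tmp/keys/cert.pem

# If this prints False for the real module, OpenSSL raises the
# SSL_CTX_use_certificate_file error shown in the traceback above.
print(os.path.isfile(path))
```

Checking whether that file actually exists next to the installed mockserver.py tells you whether a reinstall (or an install from the tarball) is needed.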