
Scrapy Python error: Missing scheme in request url




You need to add a scheme to the URL:

ftp://ftp.site.co.uk

The FTP URL syntax is defined as:

ftp://[<user>[:<password>]@]<host>[:<port>]/<url-path>
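To see how the pieces of that syntax map onto a concrete URL, here is a minimal sketch using Python 3's standard `urllib.parse`. The URL with embedded credentials and port 21 is a made-up example, not something from the question; it only illustrates which component is which, and what happens when the scheme is left out.

```python
from urllib.parse import urlsplit

# Hypothetical URL with every optional component of the FTP syntax filled in.
url = "ftp://test:test@ftp.site.co.uk:21/feed.xml"
parts = urlsplit(url)

print(parts.scheme)    # scheme
print(parts.username)  # <user>
print(parts.password)  # <password>
print(parts.hostname)  # <host>
print(parts.port)      # <port>
print(parts.path)      # /<url-path>

# Without the "ftp://" prefix the parser finds no scheme and no host:
# the whole string is treated as a relative path.
bare = urlsplit("ftp.site.co.uk/feed.xml")
print(repr(bare.scheme), repr(bare.netloc), repr(bare.path))
```

The second call shows why a URL without a scheme is ambiguous: nothing in the string marks `ftp.site.co.uk` as a host name.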

Basically, you do this:

yield Request('ftp://ftp.site.co.uk/feed.xml', ...)
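If the URLs come from configuration or user input, one way to guard against this is to normalize them before building the request. The sketch below is only an illustration: `ensure_scheme` is a hypothetical helper, not part of Scrapy, and the simple `"://"` check is an assumption that will not cover schemes such as `mailto:` or `data:`.

```python
def ensure_scheme(url, default_scheme="ftp"):
    """Return url unchanged if it already has a scheme, else prepend one.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    if "://" in url:
        return url
    return "{}://{}".format(default_scheme, url)


# The URL from the question, which triggers the error as written.
fixed = ensure_scheme("ftp.site.co.uk/feed.xml")
print(fixed)

# A URL that already carries a scheme is passed through untouched.
unchanged = ensure_scheme("ftp://ftp.site.co.uk/feed.xml")
print(unchanged)
```

The result of `ensure_scheme(...)` could then be passed as the first argument to `Request` in `start_requests`, keeping the same `meta` dictionary for the FTP credentials.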

Read more about schemes on Wikipedia: http://en.wikipedia.org/wiki/URI_scheme

I am trying to pull a file from a password-protected FTP server. This is the code I am using:

import scrapy
from scrapy.contrib.spiders import XMLFeedSpider
from scrapy.http import Request
from crawler.items import CrawlerItem


class SiteSpider(XMLFeedSpider):
    name = 'site'
    allowed_domains = ['ftp.site.co.uk']
    itertag = 'item'

    def start_requests(self):
        yield Request('ftp.site.co.uk/feed.xml',
            meta={'ftp_user': 'test', 'ftp_password': 'test'})

    def parse_node(self, response, selector):
        item = CrawlerItem()
        item['title'] = (selector.xpath('//title/text()').extract() or [''])[0]
        return item

This is the traceback I get:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 1192, in run
    self.mainLoop()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 1201, in mainLoop
    self.runUntilCurrent()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 824, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/reactor.py", line 41, in __call__
    return self._func(*self._a, **self._kw)
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 112, in _next_request
    request = next(slot.start_requests)
  File "/var/www/spider/crawler/spiders/site.py", line 13, in start_requests
    meta={'ftp_user': 'test', 'ftp_password': 'test'})
  File "/usr/local/lib/python2.7/dist-packages/scrapy/http/request/__init__.py", line 26, in __init__
    self._set_url(url)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/http/request/__init__.py", line 61, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
exceptions.ValueError: Missing scheme in request url: ftp.site.co.uk/feed.xml