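Running Scrapy on a terminal server
Question:
I want to run my crawler on the server side, but my Python is not that good...
My source works fine when I run it in my laptop's terminal, but it raises an error when I run it in the server's terminal.
Here is my source code: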
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from thehack.items import NowItem
import time

class MySpider(BaseSpider):
    name = "nowhere"
    allowed_domains = ["n0where.net"]
    start_urls = ["https://n0where.net/"]

    def parse(self, response):
        for article in response.css('.loop-panel'):
            item = NowItem()
            item['title'] = article.css('.article-title::text').extract_first()
            item['link'] = article.css('.loop-panel>a::attr(href)').extract_first()
            item['body'] = ''.join(article.css('.excerpt p::text').extract()).strip()
            # date is not used
            #item['date'] = article.css('[itemprop="datePublished"]::attr(content)').extract_first()
            yield item
            time.sleep(5)
The error says:
ERROR: Spider error processing <GET https://n0where.net/>
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 824, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/usr/lib/python2.7/dist-packages/twisted/internet/task.py", line 638, in _tick
taskObj._oneWorkUnit()
File "/usr/lib/python2.7/dist-packages/twisted/internet/task.py", line 484, in _oneWorkUnit
result = next(self._iterator)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 57, in <genexpr>
work = (callable(elem, *args, **named) for elem in iterable)
--- <exception caught here> ---
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 96, in iter_errback
yield next(it)
File "/usr/local/lib/python2.7/dist-packages/scrapy/contrib/spidermiddleware/offsite.py", line 26, in process_spider_output
for x in result:
File "/usr/local/lib/python2.7/dist-packages/scrapy/contrib/spidermiddleware/referer.py", line 22, in <genexpr>
return (_set_referer(r) for r in result or())
File "/usr/local/lib/python2.7/dist-packages/scrapy/contrib/spidermiddleware/urllength.py", line 33, in <genexpr>
return (r for r in result or() if _filter(r))
File "/usr/local/lib/python2.7/dist-packages/scrapy/contrib/spidermiddleware/depth.py", line 50, in <genexpr>
return (r for r in result or() if _filter(r))
File "/home/admin/nowhere/thehack/spiders/thehack_spider.py", line 14, in parse
item['title'] = article.css('.article-title::text').extract_first()
exceptions.AttributeError: 'SelectorList' object has no attribute 'extract_first'
Does anyone know how to fix this, mates? Thanks a lot :)
Answer
It seems your Scrapy version is outdated. The scrapy.Selector method .extract_first() was only added in Scrapy 1.1, so you will want to upgrade the scrapy package on your server.
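If upgrading is not possible right away, one workaround is a small helper that falls back to .extract() when .extract_first() is missing. This is only a sketch: the function name extract_first and its default parameter are made up for illustration and are not part of the asker's project. It assumes only that the object passed in has an .extract() method returning a list of strings, as the selector lists in the question's code do.

```python
def extract_first(selector_list, default=None):
    """Return the first extracted string, or `default` if nothing matched.

    Hypothetical compatibility helper: works with selector lists that
    lack .extract_first() as well as with ones that provide it.
    """
    # Newer Scrapy releases expose the method directly; use it when present.
    method = getattr(selector_list, "extract_first", None)
    if callable(method):
        value = method()
        return default if value is None else value
    # Older releases only offer .extract(), which returns a list of strings.
    results = selector_list.extract()
    return results[0] if results else default
```

In the spider, a line such as item['title'] = article.css('.article-title::text').extract_first() would then become item['title'] = extract_first(article.css('.article-title::text')). The helper uses nothing version-specific, so it should behave the same under the Python 2.7 shown in the traceback.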
Mate, I tried _sudo pip install --upgrade scrapy_ and it resulted in **Rolling back uninstall of lxml Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-WJUVpy/lxml/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-yn9nU9-record/install-record.txt --single-version-external-managed --compile" failed with error code 1 in /tmp/pip-build-WJUVpy/lxml/** Can you give me another suggestion, mate? – jethow
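@jethow What distribution is your server running? On Ubuntu you can try 'apt install python-scrapy'; version 1.1 should be in Ubuntu's repositories. – Granitosaurus
I'm using Ubuntu 14.04.4 MATE, and I tried it... It says _E: Unable to locate package Scrapy_ – jethow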