Scrapy - Calling a spider from another script

Problem description:

I created this class with a parse() method:

import scrapy

# Assumption: PitchforkItem lives in the project's items module (blogs/items.py)
from blogs.items import PitchforkItem


class PitchforkSpider(scrapy.Spider):
    name = "pitchfork_reissues"
    allowed_domains = ["pitchfork.com"]
    # Scrapy sends one request per URL listed here and passes each response to parse()
    start_urls = [
        "http://pitchfork.com/reviews/best/reissues/?page=1",
        "http://pitchfork.com/reviews/best/reissues/?page=2",
        "http://pitchfork.com/reviews/best/reissues/?page=3",
    ]

    def parse(self, response):
        items = []

        for sel in response.xpath('//div[@class="album-artist"]'):
            item = PitchforkItem()
            # './/' keeps the query relative to the current <div>;
            # a plain '//' would search the whole page on every iteration
            item['artist'] = sel.xpath('.//ul[@class="artist-list"]/li/text()').extract()
            item['reissue'] = sel.xpath('.//h2[@class="title"]/text()').extract()
            items.append(item)

        return items

From another script, I import the module that the class above belongs to:

from blogs.spiders.pitchfork_reissues_feed import *

and, after instantiating the class, I try to call its parse() method:

def reissues(): 

    pitchfork_reissues = PitchforkSpider() 
    albums = pitchfork_reissues.parse(response) 
    print (albums)

but I get the following error:

reissues = pitchfork_reissues.parse(response) 
NameError: global name 'response' is not defined

Apparently, the parse() method needs an instance of scrapy.http.Response. How can I create such an instance inside reissues() in the second script?

How do you use the PitchforkSpider class in your first script? – njzk2

@njzk2 What do you mean? Could you be more specific? –

You said "from another script". I assume you have another script that uses this class successfully? – njzk2

Answer

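from scrapy.http import HtmlResponse

# url is required; body must be bytes. HtmlResponse (unlike the base Response) supports .xpath()
response = HtmlResponse(url='http://pitchfork.com/reviews/best/reissues/?page=1', body=b'html here')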

Now, I don't think you will be able to crawl this way, because that is not how Scrapy works, but you can still create a response object.

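Please note my edit. Could you put your answer in the context of the code above? Otherwise it is useless. I just want to get the crawled items back into my script. –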
