Scrapy: passing extra data from a CSV file into parse

Problem description:

My Scrapy spider reads a CSV file and builds start_urls from the addresses in it, like this:

from csv import DictReader

with open('addresses.csv') as rows:
    start_urls = ['http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+') for row in DictReader(rows)]

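But the .csv file also contains email addresses and other information. How can I pass this extra information into parse so that it can be added to the new file?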

import scrapy
from csv import DictReader

with open('addresses.csv') as f:
    rows = list(DictReader(f))  # read once; a DictReader can only be iterated a single time

names = [row["Name"].replace(',', '') for row in rows]
emails = [row["Email"].replace(',', '') for row in rows]
start_urls = ['http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+') for row in rows]

def parse(self, response):
    yield {
        'name': ...,     # FROM CSV
        'email': ...,    # FROM CSV
        'address': ...,  # FROM SCRAPING
        'city': ...,     # FROM SCRAPING
    }

See the latest edit in my answer for details. – Umair

import scrapy
from csv import DictReader

class MySpider(scrapy.Spider):
    name = 'addresses'  # placeholder; Scrapy requires every spider to have a name

    def start_requests(self):
        # Read the CSV and issue one request per row
        with open('addresses.csv') as rows:
            for row in DictReader(rows):
                name = row["Name"].replace(',', '')
                email = row["Email"].replace(',', '')
                link = 'http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+')
                # meta carries the CSV values through to the callback
                yield scrapy.Request(url=link, callback=self.parse, method="GET", meta={'name': name, 'email': email})

    def parse(self, response):
        yield {
            'name': response.meta['name'],
            'email': response.meta['email'],
            'address': ...,  # FROM SCRAPING
            'city': ...,     # FROM SCRAPING
        }
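  • Open the CSV file.
  • Iterate over it inside the start_requests method.
  • Pass the values to the callback through the request's meta argument, which accepts a Python dictionary.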

Note: remember that start_requests is not a custom method; it is a method provided by Scrapy itself. See https://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.start_requests
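
The row-to-request mapping described in the steps above can also be tried without running a crawl. The following is only a minimal sketch using the standard library: iter_targets is a hypothetical helper name (not part of Scrapy), the sample row is made up, and the column names Name, Email and Address are the ones assumed in the question.

```python
import csv
import io


def iter_targets(csv_file, base_url="http://www.example.com/search/?where="):
    """Yield (url, meta) pairs, one per CSV row.

    Hypothetical helper for illustration; it is not a Scrapy API.
    """
    for row in csv.DictReader(csv_file):
        # Same cleaning as in the spider: drop commas, turn spaces into '+'.
        query = row["Address"].replace(",", "").replace(" ", "+")
        meta = {
            "name": row["Name"].replace(",", ""),
            "email": row["Email"].replace(",", ""),
        }
        yield base_url + query, meta


# Made-up sample data standing in for addresses.csv.
sample = io.StringIO(
    "Name,Email,Address\n"
    'Jane Doe,jane@example.com,"1 Main St, Springfield"\n'
)
targets = list(iter_targets(sample))
```

Inside start_requests, the same generator could then be consumed with something like `for url, meta in iter_targets(f): yield scrapy.Request(url, callback=self.parse, meta=meta)`, which keeps the CSV handling separate from the spider and easy to check on its own.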