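Scrapy CSV output repeats the field headers

Question:

I have a spider (below) that I'd like to run via a cron job every 10 days or so. However, every time I run it after the first run, it writes the field headers again instead of just appending the items under the existing headers in the CSV. How can I make it so that there is only one set of field headers at the top, with all the data underneath, no matter how many times I run it?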

import scrapy


class Wotd(scrapy.Item):
    word = scrapy.Field()
    definition = scrapy.Field()
    sentence = scrapy.Field()
    translation = scrapy.Field()


class WotdSpider(scrapy.Spider):
    name = 'wotd'
    # allowed_domains takes bare domain names, not URLs or paths
    allowed_domains = ['www.spanishdict.com']
    start_urls = ['http://www.spanishdict.com/wordoftheday/']
    custom_settings = {
        # specifies exported fields and their order
        'FEED_EXPORT_FIELDS': ['word', 'definition', 'sentence', 'translation']
    }

    # parse must be a method of the spider class
    def parse(self, response):
        jobs = response.xpath('//div[@class="sd-wotd-text"]')
        for job in jobs:
            item = Wotd()
            item['word'] = job.xpath('.//a[@class="sd-wotd-headword-link"]/text()').extract_first()
            item['definition'] = job.xpath('.//div[@class="sd-wotd-translation"]/text()').extract_first()
            item['sentence'] = job.xpath('.//div[@class="sd-wotd-example-source"]/text()').extract_first()
            item['translation'] = job.xpath('.//div[@class="sd-wotd-example-translation"]/text()').extract_first()
            yield item

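From what I've read in the Scrapy docs, it looks like I could hook into the CsvItemExporter class and set include_headers_line=False, but I don't know where to add the class in the project structure.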

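Answer

First of all, you didn't share your current project structure, so it's hard to tell you exactly where to put a concrete example.

Let's assume your project is named my_project. In the main project directory (the one containing settings.py), create a file exporters.py with this content: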

import scrapy.exporters


class NoHeaderCsvItemExporter(scrapy.exporters.CsvItemExporter):
    def __init__(self, file, join_multivalued=', ', **kwargs):
        # Always disable the header line; pass everything else through unchanged
        super(NoHeaderCsvItemExporter, self).__init__(
            file=file,
            include_headers_line=False,
            join_multivalued=join_multivalued,
            **kwargs
        )

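The class NoHeaderCsvItemExporter inherits from the standard CSV exporter and only specifies that we don't want the header line included in the output.

Next, you have to register the new exporter class for the CSV format, either in settings.py or in the spider's custom_settings. Following your current approach with the latter option, that would be: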

custom_settings = {
    'FEED_EXPORT_FIELDS': ['word', 'definition', 'sentence', 'translation'],
    'FEED_EXPORTERS': {
        'csv': 'my_project.exporters.NoHeaderCsvItemExporter',
    }
}

Be aware that with this class, no header line is written to the CSV at all, not even on the first export.
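If you would rather have the header written exactly once, the usual idea is to check whether the output file is still empty before writing it. Here is a minimal sketch of that idea using only the standard library csv module, outside of Scrapy. The function name append_rows and the file name wotd.csv are made up for illustration; only the field names come from the question.

```python
import csv
import os

# Field names taken from the question's FEED_EXPORT_FIELDS
FIELDS = ['word', 'definition', 'sentence', 'translation']


def append_rows(path, rows, fieldnames=FIELDS):
    """Append dict rows to a CSV file (hypothetical helper).

    The header is written only when the file is missing or empty,
    so repeated runs produce a single header line at the top.
    """
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)


if __name__ == '__main__':
    # Example usage with placeholder data
    row = {'word': 'hola', 'definition': 'hello',
           'sentence': 'Hola, mundo.', 'translation': 'Hello, world.'}
    append_rows('wotd.csv', [row])
```

The same "is the file empty?" check could presumably be moved into a custom exporter's __init__, but I have not tested that against your Scrapy version, so treat it as an assumption.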

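Thanks, this is exactly what I was looking for. I ran it once without the change so that the headers were written, then made the change, and it works like a charm. Thanks for your help! – GainesvilleJesus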
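For a CSV that already picked up repeated header lines from earlier runs, a one-off cleanup may help. The sketch below is standard library only; the function name drop_repeated_headers and the file names are hypothetical. It assumes the first row of the file is the header and that no real data row is identical to it.

```python
import csv


def drop_repeated_headers(src_path, dst_path):
    """Copy src_path to dst_path, keeping the first row as the header
    and skipping any later row that is identical to it.

    Returns the number of repeated header rows that were skipped.
    """
    removed = 0
    with open(src_path, newline='', encoding='utf-8') as src, \
            open(dst_path, 'w', newline='', encoding='utf-8') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input: nothing to copy
        writer.writerow(header)
        for row in reader:
            if row == header:
                removed += 1
                continue
            writer.writerow(row)
    return removed


if __name__ == '__main__':
    import os
    # Example usage; wotd.csv is a placeholder file name
    if os.path.exists('wotd.csv'):
        print(drop_repeated_headers('wotd.csv', 'wotd_clean.csv'))
```

Writing to a separate file rather than editing in place keeps the original around until you have checked the result.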
