scrapy + pymysql.err.IntegrityError: (1062, "Duplicate entry '1' for key 'PRIMARY'")

Problem description:

Python: 3.6

Ubuntu: 5.4.0-6ubuntu1~16.04.4

Using Scrapy as the crawling framework and pymysql to save the scraped data into MySQL on a virtual machine: the scraping itself worked fine, but the insert failed with the following error:

Traceback (most recent call last):
  File "e:\anaconda3\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "E:\Scrapy\Jianshu\Jianshu\pipelines.py", line 36, in process_item
    self.update(item)
  File "E:\Scrapy\Jianshu\Jianshu\pipelines.py", line 31, in update
    self.cursor.execute(update_time)
  File "e:\anaconda3\lib\site-packages\pymysql\cursors.py", line 170, in execute
    result = self._query(query)
  File "e:\anaconda3\lib\site-packages\pymysql\cursors.py", line 328, in _query
    conn.query(q)
  File "e:\anaconda3\lib\site-packages\pymysql\connections.py", line 516, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "e:\anaconda3\lib\site-packages\pymysql\connections.py", line 727, in _read_query_result
    result.read()
  File "e:\anaconda3\lib\site-packages\pymysql\connections.py", line 1066, in read
    first_packet = self.connection._read_packet()
  File "e:\anaconda3\lib\site-packages\pymysql\connections.py", line 683, in _read_packet
    packet.check_error()
  File "e:\anaconda3\lib\site-packages\pymysql\protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "e:\anaconda3\lib\site-packages\pymysql\err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.IntegrityError: (1062, "Duplicate entry '1' for key 'PRIMARY'")

Table definition:

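The original screenshot of the table definition did not survive. Judging from the seven-value INSERT in the pipeline below, the table was presumably created roughly as follows; the column names and types here are assumptions, not the original DDL:

import pymysql

conn = pymysql.connect(host="192.168.0.124", user="root",
                       password="123456", database="a")
with conn.cursor() as cursor:
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS XXXX_save (
            id INT NOT NULL PRIMARY KEY,  -- no AUTO_INCREMENT: this is what makes error 1062 possible
            title VARCHAR(255),
            author VARCHAR(255),
            author_img VARCHAR(512),
            artical_id VARCHAR(64),
            pub_time VARCHAR(64),
            content TEXT
        )
    """)
conn.commit()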

Code:

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import pymysql


class XXXXPipeline(object):
    def __init__(self):
        # Connect to the MySQL instance running on the virtual machine.
        self.conn = pymysql.connect(host="192.168.0.124", user="root",
                                    password="123456", database="a")
        self.cursor = self.conn.cursor()
        self._sql = None
        self.num = 1  # hand-rolled primary key; restarts at 1 on every run

    @property
    def get_sql(self):
        # Lazily build the INSERT template; values are spliced in with
        # str.format instead of being passed as query parameters.
        if not self._sql:
            self._sql = """insert into XXXX_save values({},'{}','{}','{}','{}','{}','{}')"""
        return self._sql

    def update(self, item):
        print("item['content']:")
        print(repr(item['content']))
        # Strip single quotes from the content so they do not break the
        # hand-built SQL string.
        update_time = self.get_sql.format(self.num, item['title'],
                                          item['author'], item['author_img'],
                                          item['artical_id'],
                                          item['pub_time'], item['content'].replace("'", ""))
        print("update_time:", update_time)
        self.cursor.execute(update_time)
        self.conn.commit()
        self.num += 1

    def process_item(self, item, spider):
        self.update(item)
        return item
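A more robust variant of update() would let pymysql handle the quoting by passing the values as query parameters instead of splicing them in with str.format; this removes both the quote-stripping hack and the SQL-injection risk. A minimal sketch, assuming the same table and item fields as above:

    def update(self, item):
        # %s placeholders are bound safely by pymysql; no manual escaping needed.
        sql = "insert into XXXX_save values (%s, %s, %s, %s, %s, %s, %s)"
        self.cursor.execute(sql, (self.num, item['title'], item['author'],
                                  item['author_img'], item['artical_id'],
                                  item['pub_time'], item['content']))
        self.conn.commit()
        self.num += 1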

Cause analysis:

The cause: running the spider inserts rows into MySQL on the virtual machine, and when you stop the program and run it again, MySQL still holds the rows from the previous crawl. The id column is defined as the primary key but was never set to AUTO_INCREMENT; instead, the value is supplied from the crawler side (the pipeline's self.num counter), starting from 1 on every run (a flaw, to be corrected later). The second run therefore tries to insert id 1 again, and MySQL rejects it with error 1062. Until the id handling is fixed, the table has to be emptied before each run of the spider, as sketched below.
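A minimal sketch of both remedies, reusing conn from the table sketch above (table and column names remain assumptions). Either empty the table before each run:

with conn.cursor() as cursor:
    # Workaround: clear out last run's rows so id can start at 1 again.
    cursor.execute("TRUNCATE TABLE XXXX_save")

...or, better, let MySQL assign the id itself:

with conn.cursor() as cursor:
    # Proper fix: make id AUTO_INCREMENT; then insert NULL (or omit the
    # column) for id and drop the self.num counter from the pipeline.
    cursor.execute("ALTER TABLE XXXX_save MODIFY id INT NOT NULL AUTO_INCREMENT")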


Then run the spider file again.

It runs successfully, and the data has been inserted into the database.
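To double-check from Python rather than the mysql client (same assumed table name and conn as above):

with conn.cursor() as cursor:
    cursor.execute("SELECT COUNT(*) FROM XXXX_save")
    print("rows saved:", cursor.fetchone()[0])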
