Duplicate entries in database, python sqlite3

Problem description:

I am reading the news.ycombinator RSS feed with Python and storing the entries in a database with sqlite3. For example, a sample entry (ID, title, URL) fed into the database looks like this:

('3814508', 'Github is making me feel stupid(er)', 'http://www.serpentine.com/blog/2012/04/08/github-is-making-me-feel-stupider/')

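where ID is the ID the site uses for the item, as it appears in the URL of its comment thread. The tuple above is built by extracting the ids, titles and URLs separately and then zipping them together. Now I want to fill a database with entries like this, without duplicates: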

import sqlite3 as lite 

con = lite.connect('/path/to/rss.db') 
con.text_factory = str 
cur=con.cursor() 
# --- Extract ids, links, urls ---- 
zipped = zip(ids, titles, targets) 
cur.execute("SELECT Id FROM Posts") 
existing_ids = cur.fetchall() 
for i in range(0, len(zipped)):
    if ids[i] not in existing_ids:
        cur.executemany("INSERT INTO Posts VALUES(?, ?, ?)", zipped)
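One further detail in this loop, not raised in the thread: cur.fetchall() returns each row as a tuple, so a plain string ID is never found in existing_ids. A minimal sketch (Python 3, an in-memory database, the sample entry above) that shows the difference:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Posts(Id TEXT, Title TEXT, Target TEXT)")
cur.execute(
    "INSERT INTO Posts VALUES(?, ?, ?)",
    ("3814508", "Github is making me feel stupid(er)",
     "http://www.serpentine.com/blog/2012/04/08/github-is-making-me-feel-stupider/"),
)

cur.execute("SELECT Id FROM Posts")
existing_ids = cur.fetchall()
print(existing_ids)               # [('3814508',)]
print("3814508" in existing_ids)  # False: each row is a 1-tuple, not a string

# Flattening the rows into a set of strings makes the test behave as intended.
existing_set = {row[0] for row in existing_ids}
print("3814508" in existing_set)  # True
```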

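The problem is that there are thirty feed items at a time. Printing the list shows the expected behaviour: 30 samples. However, when I write to the database, the table "Posts" ends up with a considerably larger number of entries, with the same 35 items repeated over and over: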

sqlite> SELECT Count(*) FROM Posts; 
930

The block of data is repeated more than 31 times. The table schema is CREATE TABLE Posts(Id TEXT, Title TEXT, Target TEXT);
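A minimal reproduction of how the row count grows, assuming a hypothetical three-item feed in place of the real 30 entries and an in-memory database. The loop keeps the logic from the question: the membership test never succeeds, and executemany receives the whole zipped list, so every iteration inserts every entry:

```python
import sqlite3

# Hypothetical three-item feed standing in for the 30 parsed RSS entries.
ids = ["1", "2", "3"]
titles = ["a", "b", "c"]
targets = ["http://example.com/1", "http://example.com/2", "http://example.com/3"]
zipped = list(zip(ids, titles, targets))  # list() because zip() is lazy on Python 3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Posts(Id TEXT, Title TEXT, Target TEXT)")

def run_question_loop():
    """Same logic as the loop in the question."""
    cur.execute("SELECT Id FROM Posts")
    existing_ids = cur.fetchall()
    for i in range(len(zipped)):
        if ids[i] not in existing_ids:  # always True: '1' != ('1',)
            # Inserts ALL entries, once per loop iteration.
            cur.executemany("INSERT INTO Posts VALUES(?, ?, ?)", zipped)

def count_rows():
    return cur.execute("SELECT COUNT(*) FROM Posts").fetchone()[0]

run_question_loop()
after_first_run = count_rows()   # 9 = 3 iterations x 3 rows
run_question_loop()
after_second_run = count_rows()  # 18
```

With 30 entries, one run would add 30 × 30 = 900 rows. The exact total of 930 depends on how many times the script had already run, which the question does not state.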

Answer

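It looks like, as soon as you find one missing entry, you insert the entire list of entries. Perhaps you meant to iterate over each zipped tuple and check individually whether that tuple already exists?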
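A sketch of what the answer suggests: check each zipped tuple on its own and insert only the new ones. The helper name insert_new_posts is hypothetical, and the stored IDs are collected into a set so the membership test compares strings with strings:

```python
import sqlite3

def insert_new_posts(con, entries):
    """Insert only the (id, title, url) tuples whose id is not stored yet."""
    cur = con.cursor()
    # Flatten the 1-tuple rows into a set of id strings.
    existing_ids = {row[0] for row in cur.execute("SELECT Id FROM Posts")}
    new_rows = []
    for entry in entries:
        post_id = entry[0]
        if post_id not in existing_ids:
            new_rows.append(entry)
            existing_ids.add(post_id)  # also skips repeats within one feed
    cur.executemany("INSERT INTO Posts VALUES(?, ?, ?)", new_rows)
    con.commit()
    return len(new_rows)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Posts(Id TEXT, Title TEXT, Target TEXT)")
entries = [("3814508", "Github is making me feel stupid(er)",
            "http://www.serpentine.com/blog/2012/04/08/github-is-making-me-feel-stupider/")]

first = insert_new_posts(con, entries)   # 1 row inserted
second = insert_new_posts(con, entries)  # 0 rows: already present
total = con.execute("SELECT COUNT(*) FROM Posts").fetchone()[0]
```

A common alternative, not from the thread, is to declare the column as Id TEXT PRIMARY KEY (or UNIQUE) and use INSERT OR IGNORE, so that SQLite itself skips rows whose ID already exists.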

