Loop breaks when executing a function

I have the following code:

    def scrapeFacebookPageFeedStatus(access_token):
        query = "SELECT page_id FROM falken"
        result_list = c.execute(query)
        for single_row in result_list:
            str_single_row = str(single_row)
            str_norm_single_row = str_normalize(str_single_row)
            print(str_norm_single_row)

When I run the code above, it prints every single_row value from result_list.
But when I pass single_row to a function, like this:

    def scrapeFacebookPageFeedStatus(access_token):
        query = "SELECT page_id FROM falken"
        result_list = c.execute(query)
        for single_row in result_list:
            str_single_row = str(single_row)
            str_norm_single_row = str_normalize(str_single_row)
            print(str_norm_single_row)
            statuses = getFacebookPageFeedData(str_norm_single_row, access_token, 100)
            for status in statuses['data']:
                query = "INSERT INTO falken_posts VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
                c.execute(query,(processFacebookPageFeedStatus(status, access_token)))
                conn.commit()

it only passes the first single_row value to the function, and then the loop stops.

The getFacebookPageFeedData function

    def getFacebookPageFeedData(page_id, access_token, num_statuses):
        base = "https://graph.facebook.com/v2.6"
        node = "/%s/posts" % page_id
        fields = "/?fields=message,link,created_time,type,name,id," + \
                 "comments.limit(0).summary(true),shares,reactions" + \
                 ".limit(0).summary(true)"
        parameters = "&limit=%s&access_token=%s" % (num_statuses, access_token)
        url = base + node + fields + parameters
        # retrieve data
        data = json.loads(request_until_succeed(url))
        return data

It retrieves the page's post data from the Facebook Graph API.

The processFacebookPageFeedStatus function

    def processFacebookPageFeedStatus(status, access_token):
        status_id = status['id']
        status_message = '' if 'message' not in status.keys() else \
            unicode_normalize(status['message'])
        link_name = '' if 'name' not in status.keys() else \
            unicode_normalize(status['name'])
        status_type = status['type']
        status_link = '' if 'link' not in status.keys() else \
            unicode_normalize(status['link'])
        status_published = datetime.datetime.strptime(
            status['created_time'],'%Y-%m-%dT%H:%M:%S+0000')
        status_published = status_published + \
            datetime.timedelta(hours=-5) # EST
        status_published = status_published.strftime(
            '%Y-%m-%d %H:%M:%S')
        num_reactions = 0 if 'reactions' not in status else \
            status['reactions']['summary']['total_count']
        num_comments = 0 if 'comments' not in status else \
            status['comments']['summary']['total_count']
        num_shares = 0 if 'shares' not in status else status['shares']['count']
        reactions = getReactionsForStatus(status_id, access_token) if \
            status_published > '2016-02-24 00:00:00' else {}
        num_likes = 0 if 'like' not in reactions else \
            reactions['like']['summary']['total_count']
        num_likes = num_reactions if status_published < '2016-02-24 00:00:00' \
            else num_likes

It takes the data it needs from the status dictionary and stores it in variables to be inserted into the database.

sqlite's cursor.execute() returns the cursor itself. So after this line:

    result_list = c.execute(query)

result_list is actually an alias for c.
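Now you start iterating over c:

    for single_row in result_list:
        # code here

and then call c.execute() again:

    query = "INSERT INTO falken_posts VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
    c.execute(query,(processFacebookPageFeedStatus(status, access_token)))

which discards c's previous result set and replaces it with the result set of this new query. Since this query doesn't select anything, c becomes an empty iterator, and your loop stops there.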
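
To see this behaviour in isolation, here is a minimal sketch using an in-memory database. The table names pages and posts and the three sample ids are made up for illustration; they only stand in for falken and falken_posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Hypothetical stand-ins for the falken / falken_posts tables.
c.execute("CREATE TABLE pages (page_id INTEGER)")
c.execute("CREATE TABLE posts (page_id INTEGER)")
c.executemany("INSERT INTO pages VALUES (?)", [(1,), (2,), (3,)])
conn.commit()

result_list = c.execute("SELECT page_id FROM pages ORDER BY page_id")
same_object = result_list is c  # execute() hands back the cursor itself

seen = []
for (page_id,) in result_list:
    seen.append(page_id)
    # Re-using the cursor we are iterating over throws away the
    # pending SELECT rows, so the loop body runs only once.
    c.execute("INSERT INTO posts VALUES (?)", (page_id,))
conn.commit()

print(same_object, seen)
```

Three rows were selected, but seen ends up holding only the first page id.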

The cure is simple: use another cursor for the insert queries so you don't overwrite c's result set:

    # create a second cursor for insert statements
    writer = conn.cursor()

    # no need to recreate this same string anew for each iteration,
    # we can as well define it here once for all
    insert_query = "INSERT INTO falken_posts VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"

    # no need for result_list - just iterate over `c`
    c.execute(query)
    for single_row in c:
        # code here
        writer.execute(insert_query,(processFacebookPageFeedStatus(status, access_token)))

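
The same idea as a self-contained sketch you can run as-is. Again, pages and posts and the sample ids are hypothetical placeholders, and the insert is reduced to a single column so that no Facebook call is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
reader = conn.cursor()   # used only for the SELECT
writer = conn.cursor()   # used only for the INSERTs

# Hypothetical stand-ins for the falken / falken_posts tables.
reader.execute("CREATE TABLE pages (page_id INTEGER)")
reader.execute("CREATE TABLE posts (page_id INTEGER)")
reader.executemany("INSERT INTO pages VALUES (?)", [(1,), (2,), (3,)])
conn.commit()

insert_query = "INSERT INTO posts VALUES (?)"

visited = []
reader.execute("SELECT page_id FROM pages ORDER BY page_id")
for (page_id,) in reader:
    visited.append(page_id)
    # The writer cursor has its own state, so the reader's
    # result set is left untouched.
    writer.execute(insert_query, (page_id,))
conn.commit()

stored = [row[0] for row in
          conn.execute("SELECT page_id FROM posts ORDER BY page_id")]
print(visited, stored)
```

With separate cursors the loop visits every selected row, and each one is written to the second table.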
As a side note, if performance is a concern, you may also want to commit once after the whole loop instead of after each insert statement.
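
A rough sketch of the commit-once variant, assuming the default transaction handling of Python's sqlite3 module. The two-column posts table and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (status_id TEXT, message TEXT)")
writer = conn.cursor()

# Hypothetical rows standing in for the processed statuses.
rows = [("1_1", "first"), ("1_2", "second"), ("1_3", "third")]

for row in rows:
    writer.execute("INSERT INTO posts VALUES (?, ?)", row)

# All inserts are still part of one open transaction here.
pending = conn.in_transaction

# A single commit after the loop writes them out together.
conn.commit()
committed = not conn.in_transaction

count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
print(pending, committed, count)
```

The trade-off is that if the script fails midway, none of the uncommitted inserts from that run are saved.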
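
This works. Thank you very much. – jeremybcenteno
@JeremyCenteno Glad I could help. Please don't forget to accept the answer then. –
What is "c"? –
Please fix your code indentation. – languitar
@brunodesthuilliers c is a sqlite cursor. – jeremybcenteno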