How to detect whether the current for-loop item is the last one when the items come from a generator (yield)?

Problem description:

I'm working with a huge PostgreSQL database, for which I created a "fetch" function:

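def fetch(cursor, batch_size=1000):
    """An iterator that uses fetchmany to keep memory usage down"""
    while True:
        # Pull the next batch of at most batch_size rows from the cursor
        records = cursor.fetchmany(batch_size)
        if not records:
            break  # an empty batch means the result set is exhausted
        # Hand the rows out one at a time so callers can iterate lazily
        yield from records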
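To see how `fetch` batches its work, here is a minimal sketch that runs it against a hypothetical in-memory `FakeCursor` standing in for a psycopg2 cursor (it implements only `fetchmany`), and repeats `fetch` so the block runs on its own. One caveat worth knowing: with psycopg2's default (client-side) cursor, `execute()` already transfers the whole result set to the client process, so `fetchmany` alone does not keep that transfer small; streaming rows from the server requires a named (server-side) cursor, created with something like `connection.cursor(name="rows_stream")` (the name is arbitrary).

```python
class FakeCursor:
    """Hypothetical stand-in for a DB-API cursor: only fetchmany() is implemented."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0
        self.batch_sizes = []  # how many rows each fetchmany() call returned

    def fetchmany(self, size):
        batch = self._rows[self._pos:self._pos + size]
        self._pos += len(batch)
        self.batch_sizes.append(len(batch))
        return batch


def fetch(cursor, batch_size=1000):
    """An iterator that uses fetchmany to keep memory usage down"""
    while True:
        records = cursor.fetchmany(batch_size)
        if not records:
            break
        yield from records


cursor = FakeCursor(range(7))
rows = list(fetch(cursor, batch_size=3))
print(rows)                # [0, 1, 2, 3, 4, 5, 6]
print(cursor.batch_sizes)  # [3, 3, 1, 0]
```

The trailing 0 is the empty batch that ends the loop; at no point does the generator hold more than one batch.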

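I do some processing on every item, but I've run into a problem: because that processing compares items with each other, in some cases the last item gets skipped. Unless the comparison happens to trigger on the last item, nothing is done with it.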

import psycopg2

connection = psycopg2.connect(<url>)
cursor = connection.cursor()

cursor.execute(<some query>)

temp_today = None  # no row seen yet

for row in fetch(cursor):
    item = extract_variables(row)
    date = item['datetime']
    today = date.date()
    if temp_today is None:
        # do something with the first row
        temp_today = date
    # -----------------------------------------
    # I feel like I am missing a statement here,
    # something like:
    # if row == rows[-1]:
    #     do something with the last row...
    # -----------------------------------------
    elif temp_today.date() == today:
        # do something with every row where
        # the date is the same
        pass
    else:
        # do something with every row where
        # the dates aren't the same
        pass

How can I handle the last item when I'm using yield?

Using yield is very important for me, because I'm dealing with a very large dataset and would run out of memory without it.
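A common way to know whether an item is the last one without materializing the whole sequence is one-item lookahead: hold the current item back until the next one arrives or the iterator runs out. The helper below is a generic sketch; the name `mark_last` is hypothetical and not part of psycopg2 or the standard library. It wraps any iterable, including `fetch(cursor)`, and keeps only one extra item in memory.

```python
def mark_last(iterable):
    """Yield (item, is_last) pairs, buffering a single item of lookahead."""
    iterator = iter(iterable)
    try:
        current = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for upcoming in iterator:
        # Another item follows `current`, so `current` is not the last one
        yield current, False
        current = upcoming
    # The iterator is exhausted, so `current` is the final item
    yield current, True


for row, is_last in mark_last(["a", "b", "c"]):
    print(row, is_last)
# a False
# b False
# c True
```

In the question's loop you would write `for row, is_last in mark_last(fetch(cursor)):` and test `is_last` before the date comparisons, so the last row is handled even when it is also the first.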


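It should be possible to get the number of rows in the result set from the cursor, right? Then you can compare a counter (enumerate) against that number. –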


"...because I'm doing some comparisons between items": you could do that in the database (using window functions, or with a self-join). – wildplasser
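The window-function route from this comment can be sketched as follows. To keep it runnable without a PostgreSQL server, it uses Python's built-in `sqlite3` module (SQLite 3.25 and later support `LAG`/`LEAD`); the table `readings` and its columns are hypothetical. PostgreSQL offers the same `LAG(...) OVER (ORDER BY ...)` and `LEAD(...) OVER (...)` functions, where you would typically write `ts::date` instead of SQLite's `date(ts)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, value INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2024-01-01 08:00", 1),
        ("2024-01-01 17:00", 2),
        ("2024-01-02 09:00", 3),
    ],
)

# The database computes, for each row, the previous row's date and whether
# any row follows it, so the Python loop never needs to look ahead.
query = """
    SELECT ts,
           value,
           LAG(date(ts)) OVER (ORDER BY ts) AS prev_day,
           LEAD(ts) OVER (ORDER BY ts) IS NULL AS is_last
    FROM readings
    ORDER BY ts
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('2024-01-01 08:00', 1, None, 0)
# ('2024-01-01 17:00', 2, '2024-01-01', 0)
# ('2024-01-02 09:00', 3, '2024-01-01', 1)
```

SQLite returns the boolean `is_last` as 0/1; PostgreSQL would return a real boolean.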

Thanks to @Peter Smit, from the comments I used the following solution:

import psycopg2

connection = psycopg2.connect(<url>)
cursor = connection.cursor()

cursor.execute(<some query>)

temp_today = None
parsed_count = 0
cursor_count = cursor.rowcount  # number of rows the SELECT returned

for row in fetch(cursor):
    parsed_count += 1  # rows seen so far, including this one
    item = extract_variables(row)
    date = item['datetime']
    today = date.date()
    if temp_today is None:
        # do something with the first row
        temp_today = date
    elif parsed_count == cursor_count:
        # do something with the last row
        pass
    elif temp_today.date() == today:
        # do something with every row where
        # the date is the same
        pass
    else:
        # do something with every row where
        # the dates aren't the same
        pass

Alternatively, you can define another generator that iterates over the items and also returns the previous one (if there is one):

def pair(sequence):
    """Yield (item, previous) pairs; previous is None for the first item."""
    previous = None
    for item in sequence:
        yield (item, previous)
        previous = item

for item, previous_item in pair(mygenerator(args)):
    if previous_item is None:
        # process item: the first one returned
        pass
    else:
        # you can compare item and previous_item
        pass
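Since the comparison in the question is really "is this row on the same date as the previous one?", the standard library's `itertools.groupby` is another option worth naming: it consumes the iterator lazily and starts a new group whenever the key changes, so the last group simply ends when the input runs out. In this sketch, plain `datetime` objects stand in for the question's rows; with real rows the key would be something like `lambda row: extract_variables(row)['datetime'].date()`, where `extract_variables` is the asker's own helper.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical timestamps; in practice these would come from fetch(cursor),
# and the query must ORDER BY the datetime column, because groupby only
# merges consecutive items with equal keys.
timestamps = iter([
    datetime(2024, 1, 1, 8, 0),
    datetime(2024, 1, 1, 17, 0),
    datetime(2024, 1, 2, 9, 0),
])

days = []
for day, group in groupby(timestamps, key=lambda ts: ts.date()):
    rows_for_day = list(group)  # only this one day's rows are held in memory
    days.append((day, len(rows_for_day)))
    print(day, len(rows_for_day))
# 2024-01-01 2
# 2024-01-02 1
```

With this shape the last day needs no special case: code at the end of the loop body runs once per day, including the final one.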