Subtotal duplicate list values until no duplicates remain

Problem description:

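I have a list of numbers in Python that may contain duplicates. I need to subtotal the duplicate values, then later unpack the duplicates to get back to the original list, keeping track of the values used in each subtotal. My problem is that the first round of subtotaling can create new duplicates that must be subtotaled in turn. For example, the list [10, 10, 20, 50, 50, 75] should be reduced to [40, 100, 75], because subtotaling the duplicate 10s produces a new duplicate 20 that also needs to be subtotaled.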

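I tried the code below, which builds a dictionary of the duplicates and tracks the number of occurrences of each, but that approach does not work in this case.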

import collections 

def compress_dups(values):
    compressed_indices = []
    for val in set(values):
        indices = [i for i, x in enumerate(values) if x == val]
        compressed_indices.append(indices)
    return compressed_indices

compress_dict = collections.OrderedDict() 
initial_list = [10, 10, 20, 50, 50, 75] 
compressed_list = [] 
g = compress_dups(initial_list) 

print(initial_list) 

for item in g: 
    compressed_list.append(len(item)*initial_list[min(item)]) 
    compress_dict[(len(item)*initial_list[min(item)])] = len(item) 

print(sorted(compressed_list)) #this is the subtotaled list I'll work with 

for k, v in reversed(compress_dict.items()):
    del compressed_list[compressed_list.index(k)]
    for _ in range(v):  # range instead of the Python 2-only xrange
        compressed_list.append(k // v)  # integer division keeps the values as ints

print(sorted(compressed_list)) # this is the list after it's unpacked 

Desired output:

[10, 10, 20, 50, 50, 75] 
[40, 75, 100] 
[10, 10, 20, 50, 50, 75] 

Why are you doing 'min(item)'?

What should the output be for '[10,20,20,10]'?

Here is a simple function I came up with to do your task:

def count(lst):
    counter = []
    for e in sorted(lst):
        if e in counter:
            # e is already present: merge the pair into a single doubled value
            counter.remove(e)
            counter.append(e * 2)
        else:
            counter.append(e)
    return counter  # or return sorted(counter) if you want it to be sorted

initial_list = [10, 10, 20, 50, 50, 75] 
print(count(initial_list))  # prints [40, 100, 75], or [40, 75, 100] if it's sorted

second_list = [5, 5, 10, 20, 40, 80] 
print(count(second_list)) # prints [160] 

third_list = [100, 50, 25, 25, 78] 
print(count(third_list)) # prints [78, 200] 

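Explanation: the function creates a new list, then iterates over the sorted initial_list and checks whether each value is already in the new list. If it is, the value is removed from the new list and twice the value is appended. If not, the value is simply appended. Finally, it returns the new list.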
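One case worth checking against the explanation above: each incoming value is looked up only once, so an input such as [10, 10, 10, 10] appears to come out as [20, 20] rather than [40]. A minimal sketch of a variant that keeps doubling until the value is no longer present (count_cascading is a hypothetical name used here for illustration, not part of the answer above):

```python
def count_cascading(lst):
    """Merge equal values pairwise, repeating until the merged value is unique."""
    result = []
    for e in sorted(lst):
        # Keep merging while the current value already exists in result,
        # so a freshly doubled value can merge again with an earlier one.
        while e in result:
            result.remove(e)
            e *= 2
        result.append(e)
    return result


print(count_cascading([10, 10, 20, 50, 50, 75]))  # [40, 100, 75]
print(count_cascading([10, 10, 10, 10]))          # [40]
```

Like the original function, this merges values two at a time, so three equal values leave one unmerged value behind.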

Verbose, but it works:

def getNextLevel(a):
    b = []
    visited = []
    found = False
    for i in a:
        if i not in visited:
            c = a.count(i)
            if c > 1:
                found = True  # at least one value was subtotaled in this pass
            b.append(i * c)
            visited.append(i)
    return [b, found]

if __name__ == '__main__':
    a = [10, 10, 20, 50, 50, 75]
    m = {}  # level number -> list at that level
    level = 1
    m[0] = a
    while True:
        [b, f] = getNextLevel(a)
        if f:
            m[level] = b
            level += 1
        else:
            break
        a = b

    # print each level on the way down (subtotaling)
    for i in range(level):
        print(m[i])

    # print the levels again on the way back up (unpacking)
    for i in range(level - 2, -1, -1):
        print(m[i])

Output:

[10, 10, 20, 50, 50, 75] 
[20, 20, 100, 75] 
[40, 100, 75] 
[20, 20, 100, 75] 
[10, 10, 20, 50, 50, 75] 
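The levels above record each intermediate list. If the goal is to know which original values went into each final subtotal, one possible approach is to carry the source values along with every subtotal. The sketch below is only an illustration under that assumption; subtotal_with_parts and unpack are hypothetical names, and it relies on dictionaries preserving insertion order (Python 3.7+):

```python
from collections import defaultdict


def subtotal_with_parts(values):
    """Return a list of (subtotal, original_values) pairs with unique subtotals."""
    # Each entry pairs a subtotal with the original values summed into it.
    entries = [(v, [v]) for v in values]
    while True:
        groups = defaultdict(list)
        for total, parts in entries:
            groups[total].extend(parts)
        if len(groups) == len(entries):
            return entries  # every subtotal is unique, nothing left to merge
        # Entries that shared a subtotal are merged; their parts are pooled.
        entries = [(sum(parts), parts) for parts in groups.values()]


def unpack(entries):
    """Flatten the tracked parts back into the original values."""
    return sorted(v for _, parts in entries for v in parts)


entries = subtotal_with_parts([10, 10, 20, 50, 50, 75])
print(entries)          # [(40, [10, 10, 20]), (100, [50, 50]), (75, [75])]
print(unpack(entries))  # [10, 10, 20, 50, 50, 75]
```

Storing the parts costs extra memory, but unpacking becomes a simple flatten with no division or per-level bookkeeping.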

You can use a function like this:

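def sum_dup(l):
    l = sorted(l)  # work on a sorted copy so equal values are adjacent
    for i in range(len(l) - 1):
        if l[i] == l[i+1]:
            l[i] += l[i+1]  # merge the pair into one subtotal
            l[i+1] = 0      # mark the consumed slot with 0
            l.sort()        # zeros move to the front, new duplicates become adjacent
    # drop the 0 placeholders (this assumes the input itself contains no zeros)
    return sorted(set(l) - {0})

l = [10, 10, 20, 50, 50, 75]
print(sum_dup(l))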

This prints:

[40, 75, 100] 

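"I need to subtotal the duplicate values, then unpack the duplicates to get back to the original list, keeping track of the values used in each subtotal."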

In Python 3, it is possible to yield both results:

# Python 3 
import collections as ct 

def compress(initial, saved=None):
    """Yield a compressed list of summed repeated values and a Counter, else compress again."""
    c = ct.Counter(initial)
    if saved is None:
        saved = c  # store starting Counter
    if len(initial) == len(set(initial)):
        yield initial
        yield saved
    else:
        compressed = sorted(k*v for k, v in c.items())
        yield from compress(compressed, saved=saved)

lst = [10, 10, 20, 50, 50, 75] 
tuple(compress(lst)) 
# ([40, 75, 100], Counter({10: 2, 20: 1, 50: 2, 75: 1})) 

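Here we get both the compressed list and the starting Counter. Note: the name "compress" here is unrelated to itertools.compress. Now we can recover the original list by iterating over the Counter: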

clst, counter = tuple(compress(lst)) 
rlst = sorted(counter.elements())       # sorted(k for k, v in counter.items() for _ in range(v)) 

print("Original list :", lst) 
print("Counter  :", counter) 
print("Compressed list:", clst) 
print("Recovered list :", rlst) 
# Original list : [10, 10, 20, 50, 50, 75] 
# Counter  : Counter({10: 2, 50: 2, 75: 1, 20: 1}) 
# Compressed list: [40, 75, 100] 
# Recovered list : [10, 10, 20, 50, 50, 75] 

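Summary: this example uses recursion, yield from, and a stored starting Counter to recover the original list. It works for values that repeat any number of times, not just pairs. Although it is about 10x slower, its behavior parallels @abccd's answer in tests, but the two differ when an element repeats more than twice: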

lst = [10, 10, 10, 20, 50, 50, 75] 
tuple(compress(lst))[0] 
# [20, 30, 75, 100] 
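To make that difference concrete, the sketch below runs both approaches on the same input. It restates count (pairwise doubling, from the earlier answer) and compress (multiply each value by its count) so the snippet is self-contained; the comparison itself is an added illustration, not part of either answer.

```python
import collections as ct


def count(lst):
    """Pairwise doubling: merge a value with one earlier equal value."""
    counter = []
    for e in sorted(lst):
        if e in counter:
            counter.remove(e)
            counter.append(e * 2)
        else:
            counter.append(e)
    return counter


def compress(initial, saved=None):
    """Multiply each value by its count, recursing until all values are unique."""
    c = ct.Counter(initial)
    if saved is None:
        saved = c
    if len(initial) == len(set(initial)):
        yield initial
        yield saved
    else:
        yield from compress(sorted(k * v for k, v in c.items()), saved=saved)


lst = [10, 10, 10, 20, 50, 50, 75]
print(count(lst))               # [10, 40, 100, 75]
print(tuple(compress(lst))[0])  # [20, 30, 75, 100]
```

With three 10s, count merges two of them into 20 (which then merges with the existing 20) and leaves one 10 over, whereas compress turns all three into a single 30.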

This answer is useful, and it also works for scenarios where a number has more than one duplicate. Thanks! – David