Python列表过滤性能
我试图做Python列表过滤性能
In [21]: l1 = range(1,1000000)
In [22]: l2 = range(100,90000)
In [23]: l1.append(101)
In [24]: print(set([x for x in l1 if l1.count(x) - l2.count(x) == 1]))
在我的Python壳这需要年龄。一般来说,我的目标是从第二个减去一个列表,同时照顾重复。
e.g
[1,2,2,3] - [2,3] = [1,2]
我会为任何提示如何得到这个最大500毫秒做了常规的单核设备上非常高兴。
我了解到,合并模式是速度更快(读http://openbookproject.net/thinkcs/python/english3e/list_algorithms.html#alice-in-wonderland-again):
6 def substract_list(origin, substract):
7 """
8 example: db tells us that user bought shares 1, 2, 2, 3, 4 and also sold
9 1, 2, 2. we need to know that he now owns 3, 4 for 10m item
10
11 learned from here:
12 http://openbookproject.net/thinkcs/python/english3e/list_algorithms.html#alice-in-wonderland-again
13
14 lists need to be sorted and can contain duplicates
15 """
16 result = []
17 xi = 0 # @origin
18 yi = 0 # @subscract
19
20 len_substract = len(substract)
21 len_origin = len(origin)
22
23 while True:
24 # reached end of substract, append rest of origin, return
25 if yi >= len_substract:
26 result.extend(origin[xi:])
27 return result
28
29 # reached end of origin, return
30 if xi >= len_origin:
31 return result
32
33 # step throught the values
34 if origin[xi] == substract[yi]:
35 # equal values, next one pls
36 yi += 1
37 xi += 1
38 elif origin[xi] > substract[yi]:
39 yi += 1
40 else:
41 result.append(origin[xi])
42 xi += 1
计数器在我的机器上做到了:3-4s,合并模式:0.3s
非保序使用collections.Counter
:
from collections import Counter
a = Counter([1, 2, 2, 3])
b = Counter([2, 3])
res = list(a - b)
# [1, 2]
此操作,因为一个Counter
的-
方法移除从输出其中b
存在的计数大于在a
计数等于或大于任何元件。
订购使用OrderedCounter
保持,然后手动生成列表,例如:
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):
pass
a = OrderedCounter([3, 2, 2, 1])
b = Counter([2, 3])
res = [k for k, v in a.items() if v - b[k] > 0]
# [2, 1]
最后,如果原来的范围包含非唯一值,以及你想要的元素重复的次数是减法,那么后遗留下来的:
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):
pass
a = OrderedCounter([3, 3, 2, 2, 2, 1])
b = Counter([2, 3])
res = [k for k, v in a.items() for _ in range(v - b[k])]
# [3, 2, 2, 1]
您击败了我:) – DeepSpace
我不关心订单。所以我验证了你的第一个方法,并在2015年macbook pro上使用git 1.6s渲染时间。 thx为快速和完美的帮助! – patroqueeet
我想我会用Counter
,然后将其转换减去两个计数器来排序列表的结果:
from collections import Counter
l1 = [1, 2, 2, 3]
l2 = [2, 3]
counter_l1 = Counter(l1)
counter_l2 = Counter(l2)
res = counter_l1 - counter_l2
print(sorted(res))
>> [1, 2]
第一计数的元素:
from collections import defaultdict
count1 = defaultdict(int)
count2 = defaultdict(int)
for x in l1:
count1[x] += 1
for x in l2:
count2[x] += 1
print([x for x, count in count1.iteritems() if count - count2[x] == 1])
(不要忘记l1
转换为一组的最后一行)
上面的代码需要625ms我的机器上。 (不打印结果到stdout)
请注意:元素可能会在l1中出现多次。或者,我们可以迭代count1键。其实,这是一个更好的主意,我会编辑我的答案。 –
当你拥有'count1'和'count2'后,你不再需要担心原来的列表...... –
是的,这是正确的。原始列表仅用于维护订单。 –
执行时间:
import time
from functools import wraps
from collections import defaultdict
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):pass
def tefn(fn):
t = time.time()
set(fn())
return t - time.time()
def efn(fn):
t = time.time()
fn()
return t - time.time()
def runTime(tp=0, count=10):
def dec(fn):
@wraps(fn)
def wrap():
print(fn)
res = tefn if tp else efn
return sum(res(fn) for _ in range(count))/count
return wrap
return dec
@runTime(tp=1)
def fnList():
return [x for x in l1 if l1.count(x) - l2.count(x) == 1]
@runTime(tp=1)
def fnIter():
return iter(x for x in l1 if l1.count(x) - l2.count(x) == 1)
@runTime(tp=1)
def fnYield():
for x in l1:
if l1.count(x) - l2.count(x) == 1:
yield x
@runTime()
def fnCounter():
a = Counter(l1)
b = Counter(l2)
return list(a - b)
@runTime()
def fnOrderedCounter():
a = OrderedCounter(l1)
b = Counter(l2)
return [k for k, v in a.items() if v - b[k] > 0]
@runTime()
def fnDefaultdict():
count1 = defaultdict(int)
count2 = defaultdict(int)
for x in l1: count1[x] += 1
for x in l2: count2[x] += 1
return [x for x, count in count1.items() if count - count2[x] == 1]
if __name__ == '__main__':
l1 = range(1, 1000000)
l2 = range(100, 90000)
g = globals()
result = list((fn.__name__, fn()) for fn in (g[f] for f in g if f.startswith('fn')))
result.sort(key=lambda r: r[1], reverse=True)
for e, r in enumerate(result):
print(e, r)
OUT:
<function fnList at 0x02F17540>
<function fnIter at 0x034C8DF8>
<function fnCounter at 0x034C8F18>
<function fnDefaultdict at 0x034CB078>
<function fnYield at 0x034C8E88>
<function fnOrderedCounter at 0x034C8FA8>
0 ('fnYield', -0.8542306900024415)
1 ('fnList', -0.8605266094207764)
2 ('fnIter', -0.8655695915222168)
3 ('fnDefaultdict', -1.054802918434143)
4 ('fnCounter', -1.3413111925125123)
5 ('fnOrderedCounter', -5.433168196678162)
亲爱的,我不确定你的代码。介意添加一些词语来解释你的方法和结果? – patroqueeet
你关心结果列表中的元素排序吗? – soon
检查:http://*.com/questions/4211209/remove-all-the-elements-that-occur-in-one-list-from-another – pathoren
你是否只想保留一次显示的元素l1比l2?如果是这样,那么在l1中为每个事件保留这个元素? – Moberg