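Function does not return the list correctly
Problem description:
I wrote code that adds the numbers in two different text files. For very large data (2-3 GB), I got a MemoryError. So I am writing new code that uses a few functions to avoid loading the whole data set into memory.
This code opens an input file 'd.txt' and reads the numbers that follow certain lines of the larger data set, shown below: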
SCALAR
ND 3
ST 0
TS 1000
1.0
1.0
1.0
SCALAR
ND 3
ST 0
TS 2000
3.3
3.4
3.5
SCALAR
ND 3
ST 0
TS 3000
1.7
1.8
1.9
and adds to them the numbers read from the smaller text file 'e.txt', shown below:
SCALAR
ND 3
ST 0
TS 0
10.0
10.0
10.0
The result is written to a text file 'output.txt' like this:
SCALAR
ND 3
ST 0
TS 1000
11.0
11.0
11.0
SCALAR
ND 3
ST 0
TS 2000
13.3
13.4
13.5
SCALAR
ND 3
ST 0
TS 3000
11.7
11.8
11.9
The code I wrote:
def add_list_same(list1, list2):
    """
    list2 has the same size as list1
    """
    c = [a+b for a, b in zip(list1, list2)]
    print(c)
    return c

def list_numbers_after_ts(n, f):
    result = []
    for line in f:
        if line.startswith('TS'):
            for node in range(n):
                result.append(float(next(f)))
    return result

def writing_TS(f1):
    TS = []
    ND = []
    for line1 in f1:
        if line1.startswith('ND'):
            ND = float(line1.split()[-1])
        if line1.startswith('TS'):
            x = float(line1.split()[-1])
            TS.append(x)
    return TS, ND

with open('d.txt') as depth_dat_file, \
     open('e.txt') as elev_file, \
     open('output.txt', 'w') as out:
    m = writing_TS(depth_dat_file)
    print('number of TS', m[1])
    for j in range(0,int(m[1])-1):
        i = m[1]*j
        out.write('SCALAR\nND {0:2f}\nST 0\nTS {0:2f}\n'.format(m[1], m[0][j]))
        list1 = list_numbers_after_ts(int(m[1]), depth_dat_file)
        list2 = list_numbers_after_ts(int(m[1]), elev_file)
        Eh = add_list_same(list1, list2)
        out.writelines(["%.2f\n" % item for item in Eh])
The resulting output.txt looks like this:
SCALAR
ND 3.000000
ST 0
TS 3.000000
SCALAR
ND 3.000000
ST 0
TS 3.000000
SCALAR
ND 3.000000
ST 0
TS 3.000000
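Adding the lists does not work, although each function works when I test it on its own. I cannot find the error. I have changed a lot, but it still does not work. Any suggestions? I would really appreciate any help you can provide!
Answer
You can use grouper to read the file in groups of a fixed number of lines. If the order of the lines within each group stays the same, the following code should work.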
from itertools import zip_longest

# Split by group iterator
# See http://*.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks
def grouper(iterable, n, padvalue=None):
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

add_numbers = []
with open("e.txt") as f:
    # Read data by 7 lines
    for lines in grouper(f, 7):
        # Suppress first SCALAR line
        for line in lines[1:]:
            # add last number in every line to array (6 elements)
            add_numbers.append(float(line.split()[-1].strip()))

# template for every group
template = 'SCALAR\nND {:.2f}\nST {:.2f}\nTS {:.2f}\n{:.2f}\n{:.2f}\n{:.2f}\n'

with open("d.txt") as f, open('output.txt', 'w') as out:
    # As before
    for lines in grouper(f, 7):
        data_numbers = []
        for line in lines[1:]:
            data_numbers.append(float(line.split()[-1].strip()))
        # in result_numbers sum elements of two arrays by pair (6 elements)
        result_numbers = [x + y for x, y in zip(data_numbers, add_numbers)]
        # * unpack result_numbers as 6 arguments of function format
        out.write(template.format(*result_numbers))
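As for why the code in the question writes only the headers: a likely cause is that writing_TS reads d.txt to the end, so the later calls to list_numbers_after_ts find nothing left to read and return an empty list. The format string '{0:2f}' also refers to argument 0 both times (and 2 is a width, not a precision), which would explain the line TS 3.000000. The sketch below reproduces both effects on an in-memory sample. io.StringIO and the sample text stand in for the real file and are assumptions made for illustration only.

```python
import io

def list_numbers_after_ts(n, f):
    # Same logic as the function in the question
    result = []
    for line in f:
        if line.startswith('TS'):
            for _ in range(n):
                result.append(float(next(f)))
    return result

# In-memory stand-in for one block of d.txt (assumption for illustration)
sample = io.StringIO("SCALAR\nND 3\nST 0\nTS 1000\n1.0\n1.0\n1.0\n")

# First pass: scan the TS headers, as writing_TS does.
# This consumes the file object up to its end.
ts_values = [float(line.split()[-1]) for line in sample
             if line.startswith('TS')]

# Second pass without rewinding: the iterator is already exhausted.
numbers_without_rewind = list_numbers_after_ts(3, sample)

# Rewinding with seek(0) makes the data readable again.
sample.seek(0)
numbers_after_rewind = list_numbers_after_ts(3, sample)

# '{0:2f}' picks argument 0 both times; '{0}' and '{1}' pick each argument.
header_wrong = 'ND {0:2f} TS {0:2f}'.format(3.0, 1000.0)
header_fixed = 'ND {0:d} TS {1:.2f}'.format(3, 1000.0)

print(numbers_without_rewind, numbers_after_rewind)
print(header_wrong)
print(header_fixed)
```

Note that list_numbers_after_ts does not stop after the first TS block, so even after a rewind it would collect the numbers of every block in the file, not just the next one.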
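Answer
I had to change a few small things in the code, and now it works, but only for very small input files, because a lot of variables are loaded into memory. Can you tell me how I can make this work with yield?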
from itertools import zip_longest

def grouper(iterable, n, padvalue=None):
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

def writing_ND(f1):
    # Return the node count from the first ND line
    for line1 in f1:
        if line1.startswith('ND'):
            ND = float(line1.split()[-1])
            return ND

def writing_TS(f):
    # Collect every TS value into the module-level list TS
    for line2 in f:
        if line2.startswith('TS'):
            x = float(line2.split()[-1])
            TS.append(x)
    return TS

TS = []
ND = []
x = 0.0
n = 0
add_numbers = []
with open("e.txt") as f, open("d.txt") as f1,\
     open('output.txt', 'w') as out:
    ND = writing_ND(f)
    TS = writing_TS(f1)
    # Each group has 4 header lines followed by ND numbers
    n = int(ND)+4
    f.seek(0)
    # writing_TS read d.txt to the end, so rewind it as well
    f1.seek(0)
    for lines in grouper(f, int(n)):
        for item in lines[4:]:
            add_numbers.append(float(item))
    i = 0
    for l in grouper(f1, n):
        data_numbers = []
        for line in l[4:]:
            data_numbers.append(float(line.split()[-1].strip()))
        result_numbers = [x + y for x, y in zip(data_numbers, add_numbers)]
        del data_numbers
        out.write('SCALAR\nND %d\nST 0\nTS %0.2f\n' % (ND, TS[i]))
        i += 1
        for item in result_numbers:
            out.write('%s\n' % item)
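One possible way to use yield here, as a minimal sketch rather than a solution tested on 2-3 GB files: a generator that yields one SCALAR block at a time, so that only the current block and the numbers from e.txt are held in memory. The names read_blocks and add_files are hypothetical. The sketch assumes that e.txt contains a single block, that every block starts with a SCALAR line, and that one decimal place is wanted, as in the expected output.txt. io.StringIO stands in for the real files.

```python
import io

def read_blocks(f):
    """Yield (header_lines, numbers) for one SCALAR block at a time."""
    header, numbers = [], []
    for line in f:
        line = line.strip()
        if not line:
            continue
        if line == 'SCALAR':
            # A new block starts: hand over the previous one first
            if header:
                yield header, numbers
            header, numbers = [line], []
        elif line.split()[0] in ('ND', 'ST', 'TS'):
            header.append(line)
        else:
            numbers.append(float(line))
    # Hand over the last block of the file
    if header:
        yield header, numbers

def add_files(data_file, add_file, out):
    # Assumption: the smaller file holds one block; keep only its numbers
    _, add_numbers = next(read_blocks(add_file))
    for header, numbers in read_blocks(data_file):
        # Headers of the large file are copied unchanged
        out.write('\n'.join(header) + '\n')
        for x, y in zip(numbers, add_numbers):
            out.write('%.1f\n' % (x + y))

# In-memory stand-ins for d.txt and e.txt (assumption for illustration)
d_txt = io.StringIO("SCALAR\nND 3\nST 0\nTS 1000\n1.0\n1.0\n1.0\n"
                    "SCALAR\nND 3\nST 0\nTS 2000\n3.3\n3.4\n3.5\n")
e_txt = io.StringIO("SCALAR\nND 3\nST 0\nTS 0\n10.0\n10.0\n10.0\n")
out = io.StringIO()
add_files(d_txt, e_txt, out)
print(out.getvalue(), end='')
```

With real files, the three StringIO objects would be replaced by open("d.txt"), open("e.txt") and open("output.txt", "w") inside a with statement. Because read_blocks is a generator, d.txt is read only once and the TS values no longer need to be collected in a separate pass.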