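Generating a list of numbers
Question:
Hi, I want to generate a list of numbers from 1000000 to 2000000, but the problem is that I get a MemoryError. I was using random and everything worked fine, except that I got duplicated numbers, and I can't have repeated numbers, so I switched to xrange.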
data = []
total = 2000000
def resource_file(info):
    with open(info, "r") as data_file:
        reader = csv_reader(data_file, delimiter=",")
        for row in reader:
            try:
                for i in xrange(1000000,total):
                    new_row = [row[0], row[1], i]
                    data.append(new_row)
            except IndexError as error:
                print(error)
    with open(work_dir + "new_data.csv", "w") as new_data:
        writer = csv_writer(new_data, delimiter=",")
        for new_row in data:
            writer.writerow(new_row)
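Answer
Repeating each row with an extra column ranging over 1M..2M
The problem is that you first store all of the generated rows in memory. Python does not have a very memory-efficient object model to begin with, and on top of that, one million entries per input row is a lot.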
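To get a feeling for the scale, the memory needed by the in-memory list can be estimated roughly. The sketch below is only an approximation: it assumes CPython (object sizes differ between interpreters and versions), and the helper name estimate_bytes is made up for illustration.

```python
import struct
import sys

def estimate_bytes(n_input_rows, copies_per_row=1000000):
    """Rough lower bound for a list of [a, b, i] rows (CPython assumed)."""
    sample = ["a", "b", 1000000]
    per_row = sys.getsizeof(sample)       # the 3-element inner list
    per_row += sys.getsizeof(sample[2])   # the distinct int in the third column
    per_row += struct.calcsize("P")       # one pointer slot in the outer list
    # The two strings are shared by all copies of a row, so they are not counted.
    return n_input_rows * copies_per_row * per_row

print(estimate_bytes(1) / 1e6, "MB per input row (rough)")
```

Even a modest input file therefore multiplies into a very large list, which is consistent with the MemoryError in the question.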
I suggest not saving the data in a list at all, but writing each row to the file immediately:
total = 2000000
def resource_file(info):
    with open(info, "r") as data_file:
        reader = csv_reader(data_file, delimiter=",")
        with open(work_dir + "new_data.csv", "w") as new_data:
            writer = csv_writer(new_data, delimiter=",")
            for row in reader:
                rowa, rowb = row[0:2]
                for data in xrange(1000000,total):
                    writer.writerow([rowa,rowb,data])
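For reference, a self-contained variant of the streaming approach might look like the following. This is a sketch under a few assumptions: it targets Python 3 (so range replaces xrange), it uses the standard csv module directly instead of the csv_reader and csv_writer aliases, and the function name expand_rows, its parameters, and the decision to skip rows with fewer than two columns are illustrative choices, not part of the original code.

```python
import csv

def expand_rows(in_path, out_path, start=1000000, stop=2000000):
    """Write the first two columns of every input row once per number in [start, stop)."""
    with open(in_path, "r", newline="") as data_file, \
         open(out_path, "w", newline="") as new_data:
        reader = csv.reader(data_file, delimiter=",")
        writer = csv.writer(new_data, delimiter=",")
        for row in reader:
            if len(row) < 2:
                continue  # assumption: rows with fewer than two columns are skipped
            rowa, rowb = row[0:2]
            # range is lazy in Python 3, so only one output row exists at a time.
            for number in range(start, stop):
                writer.writerow([rowa, rowb, number])
```

Memory use stays roughly constant no matter how large the input file is, because each output row is written and discarded before the next one is built.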
Taking lines 1M-2M of the file
In case you instead want to take lines 1M to 2M of the original file, you can write:
from itertools import islice
total = 2000000
def resource_file(info):
    with open(info, "r") as data_file:
        reader = csv_reader(data_file, delimiter=",")
        with open(work_dir + "new_data.csv", "w") as new_data:
            writer = csv_writer(new_data, delimiter=",")
            for row in islice(reader,1000000,total):
                writer.writerow(row)
Or you can simplify it, as @JonClemens says, with:
from itertools import islice
total = 2000000
def resource_file(info):
    with open(info, "r") as data_file:
        reader = csv_reader(data_file, delimiter=",")
        with open(work_dir + "new_data.csv", "w") as new_data:
            writer = csv_writer(new_data, delimiter=",")
            writer.writerows(islice(reader,1000000,total))
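To see what islice selects, here is a small sketch of the same idea on in-memory data. It assumes Python 3 and uses io.StringIO in place of real files; the helper name copy_row_slice and the sample data are invented for the example.

```python
import csv
import io
from itertools import islice

def copy_row_slice(src, dst, start, stop):
    """Copy CSV rows with 0-based index in [start, stop) from src to dst."""
    reader = csv.reader(src, delimiter=",")
    writer = csv.writer(dst, delimiter=",")
    # islice consumes the reader lazily; rows before `start` are read and dropped.
    writer.writerows(islice(reader, start, stop))

source = io.StringIO("a,0\nb,1\nc,2\nd,3\ne,4\n", newline="")
target = io.StringIO(newline="")
copy_row_slice(source, target, 1, 3)
print(target.getvalue())  # the rows "b,1" and "c,2"
```

Note that the indices are 0-based and the stop index is exclusive, so islice(reader, 1000000, total) skips the first 1000000 rows and yields at most the next 1000000.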
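You are trying to store everything in memory before writing anything out. You can use much less memory by processing one row at a time instead of trying to hold the whole file in memory. –
Are you sure you want to create 1000000 times more elements than the input CSV file contains? What is the desired result? Can you give a small example of the CSV file and of what you expect the resulting CSV file to look like? – trincot
I want it for row number 2 – Mike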