Spawning multiple processes to write different files in Python

The idea is to use N processes to write N files. The data to be written to the files comes from a dictionary that stores lists as values, and it looks like this:
dic = {'file1': ['data11.txt', 'data12.txt', ..., 'data1M.txt'],
       'file2': ['data21.txt', 'data22.txt', ..., 'data2M.txt'],
       ...
       'fileN': ['dataN1.txt', 'dataN2.txt', ..., 'dataNM.txt']}
So file1 is the concatenation of data11 + data12 + ... + data1M, and so on.
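For instance, a minimal sequential sketch of the intended result (the file names and contents here are made up for illustration; only the header of the first part is kept):

```python
import os
import tempfile

# Two small input files that share the same header line.
tmp = tempfile.mkdtemp()
paths = []
for i, body in enumerate(["data11\n", "data12\n"], start=1):
    path = os.path.join(tmp, "part%d.txt" % i)
    with open(path, "w") as fh:
        fh.write("header\n")  # every part carries the same header
        fh.write(body)
    paths.append(path)

# Merge: keep the header from the first file only.
merged = os.path.join(tmp, "file1_merged.txt")
with open(merged, "w") as out:
    with open(paths[0]) as fh:
        out.write(fh.read())      # first file: header + data
    for path in paths[1:]:
        with open(path) as fh:
            next(fh)              # skip the duplicated header
            out.write(fh.read())

print(open(merged).read())
# header
# data11
# data12
```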
So my code looks like this:
jobs = []
for d in dic:
    outfile = str(d) + "_merged.txt"
    with open(outfile, 'w') as out:
        p = multiprocessing.Process(target=merger.merger, args=(dic[d], name, out))
        jobs.append(p)
        p.start()
        out.close()
and merger.py looks like this:
def merger(files, name, outfile):
    time.sleep(2)
    sys.stdout.write("Merging %n...\n" % name)
    # the reason for this step is that all the different files have a header
    # but I only need the header from the first file.
    with open(files[0], 'r') as infile:
        for line in infile:
            print "writing to outfile: ", name, line
            outfile.write(line)
    for f in files[1:]:
        with open(f, 'r') as infile:
            next(infile)  # skip first line
            for line in infile:
                outfile.write(line)
    sys.stdout.write("Done with: %s\n" % name)
I can see the file that should be written in the folder, but it is empty. No header, nothing. I put the prints in there to check that everything was working, but nothing gets printed.

Help!
The problem is that you don't close the file in the child, so the internally buffered data is lost. You could move the file open into the child, or wrap the whole thing in a try/finally block to make sure the file gets closed. A potential advantage of opening it in the parent is that you can handle file errors there. I'm not saying that's compelling, just an option.
def merger(files, name, outfile):
    try:
        time.sleep(2)
        sys.stdout.write("Merging %s...\n" % name)
        # the reason for this step is that all the different files have a header
        # but I only need the header from the first file.
        with open(files[0], 'r') as infile:
            for line in infile:
                print "writing to outfile: ", name, line
                outfile.write(line)
        for f in files[1:]:
            with open(f, 'r') as infile:
                next(infile)  # skip first line
                for line in infile:
                    outfile.write(line)
        sys.stdout.write("Done with: %s\n" % name)
    finally:
        outfile.close()
UPDATE
There has been some confusion about what happens to the parent/child file descriptors and to files in the child. If a file is still open when the program exits, the underlying C library does not flush its data to disk. The theory is that a properly functioning program closes things before exit. Here is an example where the child loses data because it does not close the file.
import multiprocessing as mp
import os
import time

if os.path.exists('mytestfile.txt'):
    os.remove('mytestfile.txt')

def worker(f, do_close=False):
    time.sleep(2)
    print('writing')
    f.write("this is data")
    if do_close:
        print("closing")
        f.close()

print('without close')
f = open('mytestfile.txt', 'w')
p = mp.Process(target=worker, args=(f, False))
p.start()
f.close()
p.join()
print('file data:', open('mytestfile.txt').read())

print('with close')
os.remove('mytestfile.txt')
f = open('mytestfile.txt', 'w')
p = mp.Process(target=worker, args=(f, True))
p.start()
f.close()
p.join()
print('file data:', open('mytestfile.txt').read())
I ran it on Linux and I get:
without close
writing
file data:
with close
writing
closing
file data: this is data
Here's what I get on Windows (python 2 and 3): http://pastebin.com/kwTAaT5t -- tl;dr: errors. – Blorgbeard

Not unexpected. Windows tries to reopen the file, but it can't be shared. Not wrong... just different. – tdelaney

You call `out.close()` immediately after `p.start()`. I doubt the merge task has time to execute before the file is closed. – Blorgbeard

@Blorgbeard Good point, but still nothing... – Pavlos

This is a Linux-like OS, right? – tdelaney