Iterating over specific CSV rows in Python outputs a blank file
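I'm trying to format a set of really hairy CSVs I was sent so that I can turn them into a nice Postgres table for querying and analysis. To do this, I first clean them with csv.writer, removing the blank lines and the double quotes that wrap each entry. Here's what my code looks like: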
import os
import csv
import glob
from itertools import islice
files = glob.glob('/Users/foo/bar/*.csv')
# Loop through all of the csv's
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)
    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:
        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close()
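It works perfectly fine and does exactly what I want it to do. The output CSV looks good. Next, I try to chop off the completely unnecessary junk rows at both the beginning and the end of the newly cleaned CSV files (omitting the first 8 and the last 2). For a reason I really can't determine, the CSV output from this part of the code (indented the same as the earlier "with" block) is completely empty: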
    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        reader2 = csv.reader(inp2)
        row_count = sum(1 for row in reader2)
        last_line_index = row_count - 3
        for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
        out2.close()
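I know that because I'm using "with", the close() at the end of each block is redundant; I tried it as an approach after looking here. I also tried putting the second "with" block into a different file and running that after the first "with" block had run, but still to no avail. Your help is greatly appreciated!
Also, here's the whole file: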
import os
import csv
import glob
from itertools import islice
files = glob.glob('/Users/foo/bar/*.csv')
# Loop through all of the csv's
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)
    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:
        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close()
    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        reader2 = csv.reader(inp2)
        row_count = sum(1 for row in reader2)
        last_line_index = row_count - 3
        for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
        out2.close()
Thanks!
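The guilty party is
row_count = sum(1 for row in reader2)
which reads all of the data from reader2. A csv.reader is a one-pass iterator, so when you then try for row in islice(reader2, 7, last_line_index) there is nothing left to read and you get no rows.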
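To make the exhaustion concrete, here is a minimal sketch written for Python 3. The in-memory `sample` data is a hypothetical stand-in for one of the cleaned files and is not taken from the question.

```python
import csv
import io
from itertools import islice

# Hypothetical in-memory CSV standing in for one of the cleaned files.
sample = io.StringIO("a,1\nb,2\nc,3\nd,4\n")
reader = csv.reader(sample)

# Counting the rows walks the reader all the way to the end...
row_count = sum(1 for _ in reader)

# ...so a second pass over the same reader has nothing left to yield.
leftover = list(islice(reader, 1, 3))

print(row_count)  # 4
print(leftover)   # []
```

The second pass yields no rows, which is why the output file in the question ends up empty.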
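Also, you may be seeing a lot of blank rows because you are opening the file as binary. In Python 3 the csv module expects a text-mode file opened with newline='' (binary mode, 'rb', is the Python 2 convention), so instead do
with open('file.csv', newline='') as inf:
    rd = csv.reader(inf)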
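As a rough illustration of that advice, the cleaning step could look like this under Python 3. The helper name `copy_nonblank_rows` and the temporary file names are assumptions made for this example; they do not appear in the question.

```python
import csv
import os
import tempfile

def copy_nonblank_rows(src_path, dst_path):
    """Copy src_path to dst_path, dropping empty rows (hypothetical helper)."""
    # newline='' lets the csv module handle line endings itself.
    with open(src_path, newline='') as inp, open(dst_path, 'w', newline='') as out:
        writer = csv.writer(out)
        for row in csv.reader(inp):
            if row:  # csv.reader yields [] for a blank line
                writer.writerow(row)

# Small demonstration using temporary files.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'in.csv')
dst = os.path.join(tmp_dir, 'out.csv')
with open(src, 'w', newline='') as f:
    f.write('a,1\r\n\r\nb,2\r\n')

copy_nonblank_rows(src, dst)

with open(dst, newline='') as f:
    cleaned = f.read()
```

Opening the output with newline='' as well keeps the writer's own line terminator from being translated a second time, which is a common source of extra blank lines on Windows.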
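You can quickly fix the code like this (I've put a comment on the problem line; as @Hugh Bothwell said, you had already read all of the data from the variable reader2):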
import os
import csv
import glob
from itertools import islice
files = glob.glob('/Users/foo/bar/*.csv')
# Loop through all of the csv's
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)
    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:
        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close()
    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        # Count the rows with a separate, throwaway reader instead of reader2
        row_count = sum(1 for row in csv.reader(inp2))
        # The counting pass consumed the file handle too, so rewind it
        inp2.seek(0)
        reader2 = csv.reader(inp2)
        last_line_index = row_count - 3
        for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
        out2.close()
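An alternative that avoids counting first is to read the rows into a list once and slice it. This is only a sketch for Python 3, assuming each file fits comfortably in memory. The helper `trim_rows` and its parameters `skip_head` and `skip_tail` are hypothetical names, and the right counts depend on how many junk rows your files actually have.

```python
import csv
import io

def trim_rows(rows, skip_head, skip_tail):
    """Return rows without the first skip_head and the last skip_tail entries."""
    rows = list(rows)              # read everything exactly once
    end = len(rows) - skip_tail    # index one past the last row to keep
    return rows[skip_head:end]

# Hypothetical five-row input: drop the first two rows and the last one.
sample = io.StringIO("r0\nr1\nr2\nr3\nr4\n")
kept = trim_rows(csv.reader(sample), skip_head=2, skip_tail=1)

print(kept)  # [['r2'], ['r3']]
```

The kept rows can then be written out with csv.writer's writerows method in the second "with" block.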
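I appreciate it, man! Your solution was also perfect; Hugh's just reached my inbox faster :) – yungblud
That was indeed the problem! Speedy! You're too fast!..... –
Aha! I didn't know that reading was a one-time deal! Thanks so much for the quick response! – yungblud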