Iterating in Python outputs a blank file

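Problem description:

Python newbie here. I'm trying to format a set of really messy CSVs I was sent so that I can turn them into a nice Postgres table for querying and analysis. To do this, I first clean them up, using csv.writer to remove the blank rows and the double quotes wrapping each entry. Here is what my code looks like: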

import os 
import csv 
import glob 
from itertools import islice 

files = glob.glob('/Users/foo/bar/*.csv') 

# Loop through all of the csv's 
for file in files: 
    # Get the filename from the path 
    outfile = os.path.basename(file) 

    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:
        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close()

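It works perfectly fine and does exactly what I want it to do. The output CSV looks great. Next, I try to chop off the completely unnecessary garbage rows at the beginning and the end of the newly cleaned CSV file (omitting the first 8 rows and the last 2). For a reason I really can't work out, the CSV output from this part of the code (indented the same as the earlier "with" block) is completely empty: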

with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
    writer2 = csv.writer(out2)
    reader2 = csv.reader(inp2)
    row_count = sum(1 for row in reader2)
    last_line_index = row_count - 3
    for row in islice(reader2, 7, last_line_index):
        writer2.writerow(row)
    out2.close()

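I know that because I'm using "with", the close() at the end of each block is redundant. I tried it as one approach after looking here. I also tried putting the second "with" block into a different file and running it after the first "with" block had run, but still to no avail. Thanks very much for your help!

Also, here is the whole file: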

import os 
import csv 
import glob 
from itertools import islice 

files = glob.glob('/Users/foo/bar/*.csv') 

# Loop through all of the csv's 
for file in files: 
    # Get the filename from the path 
    outfile = os.path.basename(file) 

    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:
        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close()

    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        reader2 = csv.reader(inp2)
        row_count = sum(1 for row in reader2)
        last_line_index = row_count - 3
        for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
        out2.close()

Thanks!

Answer

The guilty party is

row_count = sum(1 for row in reader2)

This reads all of the data from reader2, so the reader is exhausted. When you then try for row in islice(reader2, 7, last_line_index), there is no data left to receive.
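To see the effect in isolation, here is a minimal sketch (Python 3 assumed; the in-memory sample data is made up for illustration and is not from the question) showing that a csv.reader can only be consumed once, and that rewinding the underlying file restores the rows:

```python
import csv
import io
from itertools import islice

# Hypothetical sample data standing in for one of the cleaned CSV files.
sample = io.StringIO("a,1\nb,2\nc,3\nd,4\n")

reader = csv.reader(sample)
row_count = sum(1 for _ in reader)      # consumes every row in the reader
leftover = list(islice(reader, 1, 3))   # the reader is already exhausted

# Rewinding the underlying file and building a fresh reader restores the data.
sample.seek(0)
fresh = list(islice(csv.reader(sample), 1, 3))

print(row_count, leftover, fresh)
```

Here leftover comes back empty even though the file has four rows, which is the same thing that leaves the output file blank in the question.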

Also, you have probably been reading a lot of blank rows because you are opening the file as binary; instead (on Python 3), do

with open('file.csv', newline='') as inf: 
    rd = csv.reader(inf)
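Applied to the cleaning step from the question, that could look roughly like the sketch below. It assumes Python 3; clean_csv and its parameter names are hypothetical and only for illustration.

```python
import csv

def clean_csv(src_path, dst_path):
    """Copy src_path to dst_path, dropping empty rows (hypothetical helper)."""
    # newline='' lets the csv module handle line endings itself, on both
    # the reading side and the writing side.
    with open(src_path, newline='') as inf, \
         open(dst_path, 'w', newline='') as outf:
        writer = csv.writer(outf)
        for row in csv.reader(inf):
            if row:  # a blank line parses as an empty list
                writer.writerow(row)
```

Opening the output with newline='' as well keeps csv.writer from producing doubled line endings on Windows.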

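That is indeed the problem! So fast! You were too quick for me! .....

Aha! I didn't know that reading was a one-time deal! Thanks so much for the quick response! – yungblud

Answer

You can quickly fix your code like this (I have commented the problem line; as @Hugh Bothwell said, you had already read all the data from the variable reader2):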

import os 
import csv 
import glob 
from itertools import islice 

files = glob.glob('/Users/foo/bar/*.csv') 

# Loop through all of the csv's 
for file in files: 
    # Get the filename from the path 
    outfile = os.path.basename(file) 

    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:
        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)

    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        # Count the rows with a separate, throwaway reader so reader2 is not consumed
        row_count = sum(1 for row in csv.reader(inp2))
        # Counting moved the file position to the end, so rewind before reading again
        inp2.seek(0)
        reader2 = csv.reader(inp2)
        last_line_index = row_count - 3
        for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
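If the files are small enough to fit in memory, another option is to avoid counting altogether: read the rows into a list once and slice it. The following is only a sketch under that assumption, written for Python 3; trim_rows and its parameters are hypothetical names, and the defaults mirror the 7 and row_count - 3 indices used above.

```python
import csv

def trim_rows(src_path, dst_path, skip_head=7, skip_tail=3):
    """Write src_path to dst_path without its leading and trailing rows."""
    # Read everything once, so there is no second pass over the reader.
    with open(src_path, newline='') as inf:
        rows = list(csv.reader(inf))

    # Same window as islice(reader, skip_head, row_count - skip_tail);
    # max() guards against files shorter than the rows being skipped.
    stop = max(len(rows) - skip_tail, skip_head)
    kept = rows[skip_head:stop]

    with open(dst_path, 'w', newline='') as outf:
        csv.writer(outf).writerows(kept)
    return len(kept)
```

For very large files, the count-then-rewind approach is the one that keeps memory use flat, at the cost of reading each file twice.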

I appreciate it, man! Your solution is perfect too; Hugh's just landed in my inbox a bit quicker :) – yungblud
