Read a file line by line, but in reverse (last line first, then the second-to-last, etc.)

Question:

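I want to remove trailing blank lines from a file, if there are any. Currently I do this by reading the file into memory, removing the blank lines there, and overwriting the file. The file is large, though (30,000+ lines, and the lines are long), so this takes 2-3 seconds.

So I want to read the file line by line, but backwards, until I reach the first non-blank line. That is, I start with the last line, then the second-to-last, and so on. Then I would just truncate the file instead of overwriting it.

What's the best way to read it in reverse? Right now I'm thinking of reading 64k blocks, then looping through the string character by character until I have a line, and when I run out of the 64k, reading another 64k and prepending it, and so on.

I assume there is no standard function or library for reading in reverse order?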

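How many blank lines do you expect? Thousands? Each one is probably just a single newline character, so I think even 64k bytes might be overkill. – Blckknght 2014-09-22 08:51:46

It might be, but it's still a very drastic optimization compared with reading everything into memory. – sashoalm 2014-09-22 08:53:09

There's no built-in function to do this, but I had to write a class for it. I'll see if I can get permission to post it. – 2014-09-22 08:59:19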

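Answer

This is a modified version of some code I found elsewhere (probably here on Stack Overflow, actually...). I've extracted the two key methods that handle reading backwards.

The reversed_blocks iterator reads backwards through the file in blocks of whatever size you like, and the reversed_lines iterator splits the blocks into lines, saving the first line of each block. If the next block read ends with a newline, the saved line is returned as a complete line; if not, the saved partial line is appended to the last line of the new block, completing the line that was split across the block boundary.

All of the state is maintained by Python's iterator machinery, so we don't have to store state anywhere ourselves. This also means you can read several files backwards at once if you need to, because the state is bound to each iterator.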

import os

# Both generators were extracted from a class, hence the `self` parameter.
# Open the file in binary mode ('rb') so that byte offsets are exact; lines
# are then yielded as bytes, without their line endings.

def reversed_lines(self, file):
    "Generate the lines of file in reverse order."
    newline_chars = (b'\r', b'\n')
    tail = None                   # first line of the block read before this one
    later_starts_with_lf = False  # did that block begin with '\n'?
    for block in self.reversed_blocks(file):
        if not block:
            continue

        # First split the whole block into lines and reverse the list.
        reversed_lines = block.splitlines()
        reversed_lines.reverse()

        if tail is not None:
            # If the last char of the block is not a newline, then the last line
            # crosses a block boundary, and the tail (partial line from the
            # block read before this one) should be added to it.
            if block[-1:] not in newline_chars:
                reversed_lines[0] = reversed_lines[0] + tail

            # Otherwise, the block ended on a line boundary, and the tail is a
            # complete (possibly empty) line itself. The one exception is a
            # "\r\n" pair split across the two blocks: the empty tail is then
            # only an artifact of the split and is dropped.
            elif not (block[-1:] == b'\r' and later_starts_with_lf):
                reversed_lines.insert(0, tail)

        later_starts_with_lf = block[:1] == b'\n'

        # Within the current block, we can't tell if the first line is complete
        # or not, so we extract it and save it for the next go-round with a new
        # block. We yield instead of returning so all the internal state of this
        # iteration is preserved (how many lines returned, current tail, etc.).
        tail = reversed_lines.pop()

        for reversed_line in reversed_lines:
            yield reversed_line

    # We're out of blocks now; if there's a tail left over from the last block
    # we read, it's the very first line in the file. Yield that and we're done.
    if tail is not None:
        yield tail

def reversed_blocks(self, file, blocksize=4096):
    "Generate blocks of file's contents in reverse order."

    # Jump to the end of the file, and save the file offset.
    file.seek(0, os.SEEK_END)
    here = file.tell()

    # When the file offset reaches zero, we've read the whole file.
    while 0 < here:
        # Compute how far back we can step; either there's at least one
        # full block left, or we've gotten close enough to the start that
        # we'll read the rest of the file.
        delta = min(blocksize, here)

        # Back up to there and read the block; we yield it so that the
        # variable containing the file offset is retained.
        file.seek(here - delta, os.SEEK_SET)
        yield file.read(delta)

        # Move the pointer back by the amount we just handed out. If we've
        # read the last block, "here" will now be zero.
        here -= delta

reversed_lines is an iterator, so you run it in a loop:

for line in self.reversed_lines(fh): 
    do_something_with_the_line(line)

The comments may be superfluous, but they were useful to me while I was working out how iterators do their job.

Answer
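To illustrate the point that the state lives in each iterator, here is a small sketch that walks two files backwards in lockstep. It uses a standalone copy of reversed_blocks (without `self`); the helper `make_temp_file` and the sample data are hypothetical and exist only for the demonstration.

```python
import os
import tempfile

def reversed_blocks(file, blocksize=4096):
    """Yield blocks of a binary file, from the end towards the start."""
    file.seek(0, os.SEEK_END)
    here = file.tell()
    while here > 0:
        delta = min(blocksize, here)
        file.seek(here - delta, os.SEEK_SET)
        yield file.read(delta)
        here -= delta

def make_temp_file(data):
    """Hypothetical helper: write data to a temporary file, return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path

path_a = make_temp_file(b"abcdef")
path_b = make_temp_file(b"123456")

# Each generator keeps its own offset, so the two can be advanced alternately.
with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
    pairs = list(zip(reversed_blocks(fa, 2), reversed_blocks(fb, 2)))

print(pairs)  # [(b'ef', b'56'), (b'cd', b'34'), (b'ab', b'12')]

os.remove(path_a)
os.remove(path_b)
```

Neither generator needs to know about the other; zip simply asks each one for its next block in turn.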

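import os

# Open for reading and writing in binary mode, so that truncate() is allowed.
with open(filename, 'r+b') as f:
    size = os.stat(filename).st_size
    # Seek to the last 4096 bytes (or to the start, for a smaller file).
    f.seek(max(size - 4096, 0))
    block = f.read(4096)
    # Find amount to truncate
    f.truncate(...)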

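By the way, you can use 'f.seek(-4096, 2)'. – sashoalm 2014-09-22 09:02:07

So you do know how to read the file from the end? Or did I misunderstand your question? You can easily work out how much to truncate by computing '4096 - len(block.rstrip())'. – filmor 2014-09-22 10:45:46

This gives you the blocks in reverse, but not the lines. See my iterator-based version in the other answer for a nice trick that keeps track of the block and line offsets, so you don't have to maintain them yourself. – 2014-09-22 18:14:36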
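Combining the read-from-the-end idea in this answer with the rstrip suggestion from the comments, a minimal sketch of the whole operation might look like the following. The function name `strip_trailing_blank_lines` and its `blocksize` parameter are made up for illustration. It assumes that a "blank" line is an empty one (only '\r' and '\n' characters are stripped) and that the line ending of the last non-blank line should be kept.

```python
import os

def strip_trailing_blank_lines(path, blocksize=4096):
    """Truncate trailing empty lines in place; return the number of bytes removed."""
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()

        # `end` is the offset just past the last byte we intend to keep.
        end = size
        while end > 0:
            start = max(end - blocksize, 0)
            f.seek(start)
            block = f.read(end - start)
            stripped = block.rstrip(b"\r\n")
            if stripped:
                # Found a byte that is not a line terminator; stop here.
                end = start + len(stripped)
                break
            # The whole block was line terminators; keep scanning backwards.
            end = start

        # Keep the terminator that ends the last non-blank line, if any.
        if end > 0:
            f.seek(end)
            following = f.read(2)
            if following.startswith(b"\r\n"):
                end += 2
            elif following:
                end += 1

        f.truncate(end)
        return size - end
```

Only the tail of the file is read, and the file is shortened with truncate() rather than rewritten, so the cost depends on the number of trailing blank lines and not on the file size.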
