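Python csv reader skews tell()

Problem description:

I want to find out how far (as a percentage) I am through a csv file while reading it. I know how to do this with tell() on a file object. However, when I read that file object with csv.reader and loop over the rows of my reader object, tell() always reports the end of the file, no matter where I am in the loop. How can I find out where I am?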

Current code:

import csv
import os

with open(FILE_PERSON, 'rb') as csvfile:
    spamreader = csv.reader(csvfile)
    justtesting = csvfile.tell()
    size = os.fstat(csvfile.fileno()).st_size
    for row in spamreader:
        pos = csvfile.tell()
        print pos, "of", size, "|", justtesting

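I threw "justtesting" in there just to prove that tell() does return 0 until I start my for loop.

This returns the same result for every row in my csv file: 579 of 579 | 0

What am I doing wrong?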

Answer

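The csv reader reads through a buffer, so the file pointer jumps forward in large blocks instead of following your file line by line. The buffering comes from iterating over the file, which is what csv.reader does. In Python 2, a file object's next() method uses a hidden read-ahead buffer, so tell() no longer reflects the row you are on.

Also note that one CSV row is not necessarily one line of the file. Newlines can be embedded inside quoted values, and in that case the reader consumes several lines to build a single row.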
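
To see why counting lines is not the same as counting rows, here is a small Python 3 sketch. The sample data is made up.

```python
import csv
import io

# Made-up sample: the second record's quoted field contains a newline.
data = 'name,comment\nalice,"first line\nsecond line"\nbob,ok\n'

physical_lines = data.count("\n")           # 4 line breaks in the text
rows = list(csv.reader(io.StringIO(data)))  # but only 3 CSV rows
```

Here `rows[1]` is `["alice", "first line\nsecond line"]`. Any progress figure based on a line count would be off for this file.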

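If you have to give a progress report, then you need to count the lines up front. This only works if the input CSV file does not embed newlines in column values: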

with open(FILE_PERSON, 'rb') as csvfile:
    linecount = sum(1 for _ in csvfile)
    csvfile.seek(0)
    spamreader = csv.reader(csvfile)
    for line, row in enumerate(spamreader, 1):
        print '{} of {}'.format(line, linecount)
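
For readers on Python 3, here is a roughly equivalent sketch. It is written as a generator over an already-open text file, which should be opened with newline='' as the csv documentation recommends. The name rows_with_progress is made up. The same caveat applies: the count is of physical lines, so it overstates the row total if any quoted field contains a newline.

```python
import csv


def rows_with_progress(csvfile):
    """Yield (row_number, line_count, row) for each row of an open CSV text file."""
    line_count = sum(1 for _ in csvfile)  # first pass: count physical lines
    csvfile.seek(0)                       # rewind before the real CSV pass
    for row_number, row in enumerate(csv.reader(csvfile), 1):
        yield row_number, line_count, row


# Usage (hypothetical file name):
# with open("people.csv", newline="") as f:
#     for n, total, row in rows_with_progress(f):
#         print(f"{n} of {total}")
```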

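There are other ways to count the lines (see How to get line count cheaply in Python?). However, since you will be reading the file anyway to process it as CSV, you may as well reuse the file you already have open. I'm not sure whether opening the file as a memory map and then reading it again as a normal file would be any better.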
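
For comparison, one common cheap approach is to count newline bytes in fixed-size binary chunks. This is a Python 3 sketch, not necessarily what the linked question recommends. The function name and the 1 MiB default chunk size are arbitrary choices. Like the line count above, it counts physical lines, not CSV rows.

```python
def count_newlines(binfile, chunk_size=1 << 20):
    """Count newline bytes in a file opened in binary mode, one chunk at a time.

    A final line without a trailing newline is not counted.
    """
    total = 0
    # iter(callable, sentinel) keeps calling read() until it returns b"" at EOF
    for chunk in iter(lambda: binfile.read(chunk_size), b""):
        total += chunk.count(b"\n")
    binfile.seek(0)  # rewind so the caller can parse the file afterwards
    return total
```

With a real file this would be used as `with open(path, "rb") as f: total = count_newlines(f)`. Only one chunk is held in memory at a time.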

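OK, that makes sense. So is there any efficient way to determine how far along I am? I know I could do rows = list(spamreader) and then for rownum, row in enumerate(rows):, but I've read that this is inefficient. Any other tricks? – 2013-02-14 16:41:20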

@BlairConnolly: I've added one for you. It *will* read the whole file first to count the lines. – 2013-02-14 16:42:08

You don't need to convert to 'list' before calling 'enumerate'. You can do 'for rownum, row in enumerate(spamreader)', which works just as well. – katrielalex 2013-02-14 16:42:31
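
Here is a small Python 3 sketch of that suggestion, with made-up sample data. enumerate() wraps the reader lazily, so no list of rows is ever built. The reader's line_num attribute counts the lines pulled from the source so far. After the first row it shows that only one line has been read.

```python
import csv
import io

reader = csv.reader(io.StringIO("a,b\n1,2\n3,4\n"))
numbered = enumerate(reader, 1)  # wraps the reader lazily; nothing parsed yet
first = next(numbered)           # parses only the first record
```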

Answer

The csv.reader documentation says:

... csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called ...
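
One way to see this in action is to feed csv.reader a generator that records how much it has handed over. This is a Python 3 sketch; the helper name tracked and the data are made up. It shows the reader asking for one line at a time instead of reading the whole source up front.

```python
import csv


def tracked(lines, progress):
    """Pass lines through unchanged, adding each line's length to progress["chars"].

    Hypothetical helper: because csv.reader pulls one string per next() call,
    the counter shows how much input has been consumed so far.
    """
    for line in lines:
        progress["chars"] += len(line)
        yield line


source = ["x,y\n", "1,2\n", "3,4\n"]  # any iterable of strings will do
progress = {"chars": 0}
seen = []
for row in csv.reader(tracked(source, progress)):
    seen.append((row, progress["chars"]))
```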

So here is a small change to the OP's original code:

import csv
import os

filename = "tar.data"

with open(filename, 'rb') as csvfile:
    spamreader = csv.reader(csvfile)
    justtesting = csvfile.tell()
    size = os.fstat(csvfile.fileno()).st_size
    for row in spamreader:
        pos = csvfile.tell()
        print pos, "of", size, "|", justtesting
###############################################
def generator(csvfile):
    # readline seems to be the key
    while True:
        line = csvfile.readline()
        if not line:
            break
        yield line
###############################################
print
with open(filename, 'rb', 0) as csvfile:
    spamreader = csv.reader(generator(csvfile))
    justtesting = csvfile.tell()
    size = os.fstat(csvfile.fileno()).st_size
    for row in spamreader:
        pos = csvfile.tell()
        print pos, "of", size, "-", justtesting

Running this against my test data gives the output below, which shows that the two approaches produce different results.

224 of 224 | 0 
224 of 224 | 0 
224 of 224 | 0 
224 of 224 | 0 
224 of 224 | 0 
224 of 224 | 0 
224 of 224 | 0 
224 of 224 | 0 
224 of 224 | 0 
224 of 224 | 0 
224 of 224 | 0 
224 of 224 | 0 
224 of 224 | 0 
224 of 224 | 0 

16 of 224 - 0 
32 of 224 - 0 
48 of 224 - 0 
64 of 224 - 0 
80 of 224 - 0 
96 of 224 - 0 
112 of 224 - 0 
128 of 224 - 0 
144 of 224 - 0 
160 of 224 - 0 
176 of 224 - 0 
192 of 224 - 0 
208 of 224 - 0 
224 of 224 - 0

I set buffering to zero in the second open() call, but that made no difference. What matters is the readline() in the generator.
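
The same idea carries over to Python 3, where the problem is even more visible. Calling tell() on a text file while iterating over it raises OSError ("telling position disabled by next() call"). The sketch below rests on three assumptions:

* The file is opened in binary mode, so tell() is a plain byte offset.
* The encoding is UTF-8 or another ASCII-compatible encoding, so splitting on newline bytes is safe.
* rows_with_offsets is a made-up name.

iter(binfile.readline, b"") plays the role of the generator above.

```python
import csv


def rows_with_offsets(binfile, encoding="utf-8"):
    """Yield (row, byte_offset) for each CSV row of a file opened in binary mode.

    readline() is called only when csv.reader asks for the next line, so
    binfile.tell() equals the number of bytes consumed after each row.
    """
    lines = (raw.decode(encoding) for raw in iter(binfile.readline, b""))
    for row in csv.reader(lines):
        yield row, binfile.tell()


# Usage (hypothetical file name):
# import os
# with open("people.csv", "rb") as f:
#     size = os.fstat(f.fileno()).st_size
#     for row, pos in rows_with_offsets(f):
#         print(f"{pos} of {size} ({100 * pos / size:.0f}%)")
```

Because the offset is taken after the reader has assembled a full row, a quoted field spanning several lines still yields a single, correct position.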
