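Python: Reading and writing CSV files

Problem description:

I am trying to read data from a CSV file (A), extract data, and write it to a different CSV file (B). In the new file B, I want two columns: column 1 lists the names found in column 1 of file A, and column 2 lists how many times each name occurs in column 1 of file A. For example, suppose file A looks like this, without the ':' (the values are lined up in two columns):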

Animal: Gender 
Rabbit: Male 
Dog: Male 
Rabbit: Female 
Cat: Male 
Cat: Male 
Dog: Female 
Dog: Male 
Turtle: Male

I want the output in file B to look like this (again, really in separate columns, without the ':'):

Animal: Count 
Cat: 2 
Dog: 3 
Rabbit: 2 
Turtle: 1

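This is the first time I have done something like this. Below is what I have so far, but I have not managed to get the data written to file B or to get the "count" done correctly. Can anyone help me with this?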

import csv 
ReadData=csv.reader(open('C:\Users\..\FileA.csv','rb'), delimiter=',') 

def column(ReadData, i): 
    return [row[i] for row in ReadData] 

for line in ReadData: 
    WriteData=csv.writer(open('C:\Users\..\FileB.csv','wb'), 
         delimiter=' ', quotechar=':', quoting=csv.QUOTE_ALL) 
    print column(ReadData,1)

Thank you very much for your help!

This [link](http://*.com/editing-help) explains how to edit/post with markup – Levon 2012-07-25 22:30:15

Thanks for the quick reply! I have been checking that link, but I am having trouble with the space padding... I am probably missing something... – owl 2012-07-25 22:31:41

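For code, just (1) paste it, (2) highlight/select the code block, then (3) hit Control-K. That shifts it to the right (by 4 columns, I think) and makes it display correctly as code. – Levon 2012-07-25 22:32:42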

Answer

I will answer the counting part of your question; perhaps you can combine it with the csv part of your question.

l = [ 
    ('Animal','Gender'), 
    ('Rabbit','Male'), 
    ('Dog','Male'), 
    ('Rabbit','Female'), 
    ('Cat','Male'), 
    ('Cat','Male'), 
    ('Dog','Female'), 
    ('Dog','Male'), 
    ('Turtle','Male') 
    ] 

d = {} 
for k,v in l: 
    if not k in d: 
        d[k] = 1 
    else: 
        d[k] += 1 

for k in d: 
    print "%s: %d" % (k,d[k])

I did not filter out the header row, so the output of this code is:

Turtle: 1 
Cat: 2 
Rabbit: 2 
Animal: 1 
Dog: 3

Edit:

You can replace this:

if not k in d: 
    d[k] = 1 
else: 
    d[k] += 1

with this:

d[k] = d.setdefault(k,0) + 1

You should use a [defaultdict](http://docs.python.org/library/collections.html#defaultdict-examples). – BrtH 2012-07-25 22:51:48

I would recommend using 'collections.defaultdict(int)'; failing that, at least make use of 'dict.setdefault'... – 2012-07-25 22:52:00

@Jon, yes, I updated the post to show the use of setdefault. – ChipJust 2012-07-25 23:01:02
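
The comments above recommend collections.defaultdict(int) over the if/else and setdefault variants. As a rough sketch of what that could look like (written for Python 3, unlike the Python 2 code in this thread; the names rows and counts are illustrative only), this version also skips the header row:

```python
from collections import defaultdict

# Same sample data as in the answer, header row included.
rows = [
    ('Animal', 'Gender'),
    ('Rabbit', 'Male'),
    ('Dog', 'Male'),
    ('Rabbit', 'Female'),
    ('Cat', 'Male'),
    ('Cat', 'Male'),
    ('Dog', 'Female'),
    ('Dog', 'Male'),
    ('Turtle', 'Male'),
]

counts = defaultdict(int)          # a missing key starts at int() == 0
for animal, _gender in rows[1:]:   # rows[1:] skips the header row
    counts[animal] += 1

# Sort the keys so the output order does not depend on dict ordering.
for animal in sorted(counts):
    print("%s: %d" % (animal, counts[animal]))
```

Because the default value is created on first access, no membership test is needed before incrementing.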

Answer

To do the counting in Python >= 2.7, see this example for collections.Counter. For collections.defaultdict, see here.
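
The linked examples are not reproduced in this copy of the thread, so here is a minimal sketch of the collections.Counter approach (Python 3 syntax; the list animals simply repeats the first column of the sample data and is not read from a file):

```python
from collections import Counter

# First column of the sample data, header row already removed.
animals = ['Rabbit', 'Dog', 'Rabbit', 'Cat', 'Cat', 'Dog', 'Dog', 'Turtle']

counts = Counter(animals)   # tallies every element of the iterable

# most_common() yields (value, count) pairs, highest count first.
for animal, count in counts.most_common():
    print("%s: %d" % (animal, count))
```

Counter is a dict subclass, so counts['Dog'] works as usual, and a name that never occurred simply reports 0 instead of raising KeyError.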

In your call to csv.writer, quotechar=':' is probably a mistake: it would make WriteData.writerow(['Hello World', 12345]) emit ":Hello World: :12345:", as if the colon were a quotation mark.
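
To see the effect of quotechar=':' without touching any files, the writer can be pointed at an in-memory buffer. This is only an illustrative sketch in Python 3 (io.StringIO stands in for the real output file), comparing the settings from the question with the csv defaults:

```python
import csv
import io

# Settings from the question: space delimiter, colon as the quote character.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=' ', quotechar=':', quoting=csv.QUOTE_ALL)
writer.writerow(['Hello World', 12345])
colon_quoted = buffer.getvalue()

# Default settings: comma delimiter, '"' quote character, minimal quoting.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['Hello World', 12345])
default_output = buffer.getvalue()

print(repr(colon_quoted))    # ':Hello World: :12345:\r\n'
print(repr(default_output))  # 'Hello World,12345\r\n'
```

With QUOTE_ALL every field is wrapped in the quote character, which is why the colons surround both values rather than separating them.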

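Also note that your function column(ReadData, i) consumes ReadData; subsequent calls on ReadData will probably return an empty list (untested). This is not a problem for your code (at least not yet).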
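
The consumption of the reader can be checked with a small in-memory example. The sketch below uses Python 3 and a three-line sample in place of the real file; it shows that a second pass over the same csv.reader yields nothing, and that reading the rows into a list once avoids the issue:

```python
import csv
import io

def column(reader, i):
    # Iterating over the reader pulls every remaining row out of it.
    return [row[i] for row in reader]

sample = "Animal,Gender\nRabbit,Male\nDog,Male\n"

reader = csv.reader(io.StringIO(sample))
first = column(reader, 0)    # reads all three rows
second = column(reader, 0)   # the reader is exhausted, so this is empty

# Materialize the rows once if more than one column is needed.
rows = list(csv.reader(io.StringIO(sample)))
animals = [row[0] for row in rows]
genders = [row[1] for row in rows]

print(first, second)
```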

Here is a solution without the csv module (after all, these files do not look much like CSV):

import collections 

inputfile = open("A") 

counts = collections.Counter() 

for line in inputfile: 
    animal = line.split(':')[0] 
    counts[animal] += 1 

for animal, count in counts.iteritems(): 
    print '%s: %s' % (animal, count)

Better written as 'animals = (line.split(':')[0] for line in inputfile); counts = collections.Counter(animals)' – 2012-07-25 22:54:32

@Jon: Yes, that's right. – tiwo 2012-07-25 23:00:10

Thank you very much for all the pointers! I will give it a try! – owl 2012-07-25 23:03:49
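
The answers so far cover the counting; the reading and writing can be combined with it roughly as follows. This is a sketch under several assumptions, not a definitive implementation: it targets Python 3, it assumes file A is comma-delimited with a header row, the helper name write_counts is made up for illustration, and the demo writes throwaway files to a temporary directory rather than the paths from the question.

```python
import csv
import os
import tempfile
from collections import Counter

def write_counts(src_path, dst_path):
    """Count the values in column 1 of src_path and write them to dst_path."""
    with open(src_path, newline='') as src:
        reader = csv.reader(src)
        next(reader, None)                      # skip the header row, if any
        counts = Counter(row[0] for row in reader if row)
    with open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        writer.writerow(['Animal', 'Count'])
        for animal in sorted(counts):           # alphabetical, as in the question
            writer.writerow([animal, counts[animal]])
    return counts

# Demo: build a stand-in for file A from the sample data in the question.
workdir = tempfile.mkdtemp()
file_a = os.path.join(workdir, 'FileA.csv')
file_b = os.path.join(workdir, 'FileB.csv')
with open(file_a, 'w', newline='') as f:
    f.write("Animal,Gender\nRabbit,Male\nDog,Male\nRabbit,Female\nCat,Male\n"
            "Cat,Male\nDog,Female\nDog,Male\nTurtle,Male\n")

write_counts(file_a, file_b)

with open(file_b, newline='') as f:
    result = list(csv.reader(f))
print(result)
```

Opening the files in a with block closes them when the block ends, and opening the output once, outside any loop, avoids truncating it on every iteration. In Python 3 the csv module expects files opened in text mode with newline='' rather than the 'rb'/'wb' modes used with Python 2.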

Answer

Take a look at the itertools module and its groupby function. For example:

from itertools import groupby 

animals = [ 
    ('Rabbit', 'Male'), 
    ('Dog', 'Male'), 
    ('Rabbit', 'Female'), 
    ('Cat', 'Male'), 
    ('Cat', 'Male'), 
    ('Dog', 'Female'), 
    ('Dog', 'Male'), 
    ('Turtle', 'Male') 
    ] 

def get_group_key(animal_data): 
    return animal_data[0] 

animals = sorted(animals, key=get_group_key) 
animal_groups = groupby(animals, get_group_key) 

grouped_animals = [] 
for animal_type in animal_groups: 
    grouped_animals.append((animal_type[0], len(list(animal_type[1])))) 

print grouped_animals 

>>> [('Cat', 2), ('Dog', 3), ('Rabbit', 2), ('Turtle', 1)]

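If the animals of one kind are not all consecutive, this produces incorrect results (see "Rabbit" in the results above). Note that 'sum(1 for _ in iterable)' is a way to get the length of an iterator without materializing a list or other sequence – 2012-07-25 22:59:38

Thanks for your help! I will try all the suggestions one by one. – owl 2012-07-25 23:03:03

@Jon Yes, I had missed sorting the data. Good point about not materializing the list. – 2012-07-25 23:04:25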
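
The two points raised in these comments can be illustrated together. In the sketch below (Python 3; the helper name run_lengths is hypothetical), groupby is applied to the unsorted and the sorted animal names, and each group is measured with sum(1 for _ in group) instead of len(list(group)):

```python
from itertools import groupby

animals = ['Rabbit', 'Dog', 'Rabbit', 'Cat', 'Cat', 'Dog', 'Dog', 'Turtle']

def run_lengths(items):
    # groupby only merges *adjacent* equal items; sum(1 for _ in group)
    # counts a group without building an intermediate list.
    return [(key, sum(1 for _ in group)) for key, group in groupby(items)]

unsorted_result = run_lengths(animals)
sorted_result = run_lengths(sorted(animals))

print(unsorted_result)
# [('Rabbit', 1), ('Dog', 1), ('Rabbit', 1), ('Cat', 2), ('Dog', 2), ('Turtle', 1)]
print(sorted_result)
# [('Cat', 2), ('Dog', 3), ('Rabbit', 2), ('Turtle', 1)]
```

Without the sort, "Rabbit" and "Dog" each show up as several separate groups, which is the incorrect result the first comment describes.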

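Answer

Depending on the size and complexity of the data... you may want to consider using pandas; information is at http://pandas.pydata.org/ and it is available on PyPI.

Note that this may be overkill, but I thought I would throw it into the mix.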

from pandas import DataFrame 

# rows is processed from string in the OP 
rows = [['Rabbit', ' Male'], ['Dog', ' Male'], ['Rabbit', ' Female'], ['Cat', ' Male'], ['Cat', ' Male'], ['Dog', ' Female'], ['Dog', ' Male'], ['Turtle', ' Male']] 

df = DataFrame(rows, columns=['animal', 'gender'])  # only DataFrame was imported, not the pandas module 

>>> df.groupby('animal').agg(len) 
        gender 
animal 
Cat          2 
Dog          3 
Rabbit       2 
Turtle       1 

>>> df.groupby(['animal', 'gender']).agg(len) 
animal  gender 
Cat     Male      2 
Dog     Female    1 
        Male      2 
Rabbit  Female    1 
        Male      1 
Turtle  Male      1

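Thanks for sharing! Do you know whether there is a way to get around writing out the combinations in "rows" in your code? My actual data has hundreds of "animals" with 16 columns... – owl 2012-07-25 23:18:54

@owl Just assign the result to a variable... 'pandas' is built on 'numpy' arrays, so if you are familiar with those, you can already do numerical computation efficiently... there is a bit of a learning curve, but it is worth it... – 2012-07-25 23:24:32

Thanks for introducing this! I am working through all the answers and have not got to yours yet, but I will try it! – owl 2012-07-25 23:35:23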
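
On the question of avoiding a hand-typed rows list: one possibility, offered here as an assumption rather than as what the answerer had in mind, is to let pandas parse the CSV itself with read_csv. The sketch below uses Python 3 and feeds the sample data through io.StringIO in place of the real file; the column names come from the sample header.

```python
import io
import pandas as pd

# Stand-in for file A; with a real file, pass its path to read_csv instead.
csv_text = ("Animal,Gender\nRabbit,Male\nDog,Male\nRabbit,Female\nCat,Male\n"
            "Cat,Male\nDog,Female\nDog,Male\nTurtle,Male\n")

df = pd.read_csv(io.StringIO(csv_text))   # the header row becomes column names

counts = df.groupby('Animal').size()                   # rows per animal
counts_by_gender = df.groupby(['Animal', 'Gender']).size()

# Turn the counts into a two-column table and render it as CSV text.
table = counts.rename('Count').reset_index()
csv_out = table.to_csv(index=False)

print(counts)
print(csv_out)
```

Passing a file path to to_csv instead of leaving it out would write the table to disk, which corresponds to file B in the question.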
