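Python's csv DictReader returns "None" for field values; any ideas?

Question:

I'm a noob coder having trouble parsing a CSV file with Python's csv module. The problem is that my output says the field values in every row are None, except for the first field.

Below is the first row of the ugly CSV file I'm trying to parse (the remaining rows follow the same format):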

0,213726,NORTH FORK SLATE CREEK,CAMPGROUND,North Fork Slate Creek Campground | Idaho |  Public Lands Information Center | Recreation Search, http://www.publiclands.org/explore/site.php?plicstate=ID&id=2268,NA,NA,NA,NA,(208)839-2211,"Nez Perce National Forest Operating Days: 305<br>Total Capacity: 25<br> 

5 campsites at the confluence of Slate Creek and its North Fork. A number of trails form loops in the area. These are open to most traffic, including trail bikes.","From Slate Creek, go 8 miles east on Forest Road 354.",NA,http://www.publiclands.org/explore/reg_nat_forest.php?region=7&forest_name=Nez%20Perce%20National%20Forest,NA,NA,NA,45.6,-116.1,NA,N,0,1103,2058 

Here is the code I wrote to parse the CSV file (it doesn't work right):

import csv 

#READER SETTINGS 
f_path = '/Users/foo' 
f_handler = open(f_path, 'rU').read().replace('\n',' ') 
my_fieldnames = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 
'col8', 'col9', 'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 
'col16', 'col17', 'col18', 'col19', 'col20', 'col21', 'col22', 'col23', 
'col24','col25'] 
f_reader = csv.DictReader(f_handler, fieldnames=my_fieldnames, delimiter=',', dialect=csv.excel) 

#NOW I TRY TO PARSE THE CSV FILE 
i = 0 
for row in f_reader: 
    print "my first row was %s" % row 
    i = i + 1 
    if i > 0: 
     break 

Here is the output. It says all the fields except the first are blank, and I don't know why! Any suggestions would be greatly appreciated.

my first row was {'col14': None, 'col15': None, 'col16': None, 
'col17': None, 'col10': None, 'col11': None, 'col12': None, 
'col13': None, 'col18': None, 'col19': None, 'col2': None, 'col8': None, 
'col9': None, 'col6': None, 'col7': None, 'col4': None, 'col5': None, 
'col3': None, 'col1': '0', 'col25': None, 'col24': None, 
'col21': None, 'col20': None, 'col23': None, 'col22': None} 

Try this:

#!/usr/bin/env python

import csv

my_fieldnames = ['col' + str(i) for i in range(1, 26)]

with open('input.csv', 'rb') as csvfile:
    my_reader = csv.DictReader(csvfile, fieldnames=my_fieldnames,
                               delimiter=',', dialect=csv.excel,
                               quoting=csv.QUOTE_NONE)

    for row in my_reader:
        for k, v in row.iteritems():
            print k, v

Output for the first line of the input (remember, dictionaries are unordered):

col14 None 
col15 None 
col16 None 
col17 None 
col10 NA 
col11 (208)839-2211 
col12 "Nez Perce National Forest Operating Days: 305<br>Total Capacity: 25<br> 
col13 None 
col18 None 
col19 None 
col8 NA 
col9 NA 
col6 http://www.publiclands.org/explore/site.php?plicstate=ID&id=2268 
col7 NA 
col4 CAMPGROUND 
col5 North Fork Slate Creek Campground | Idaho |  Public Lands Information Center | Recreation Search 
col2 213726 
col3 NORTH FORK SLATE CREEK 
col1 0 
col25 None 
col24 None 
col21 None 
col20 None 
col23 None 
col22 None 
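
In the output above, col12 starts with a literal quote character and every field after it is None. A plausible reading is that quoting=csv.QUOTE_NONE makes the reader treat quotes as ordinary characters, so the line break inside the quoted description ends the record early. The sketch below compares the two quoting modes on a short, made-up record (not the real file) and is written for Python 3, whereas the snippets in this thread are Python 2.

```python
import csv
import io

# Made-up record with a comma and line breaks inside a quoted field,
# loosely modelled on the description column in the question.
sample = '0,NA,"Capacity: 25<br>\n\n5 campsites, near trails.","8 miles east",45.6\n'

# Default quoting: the quoted field, including its line breaks, stays intact,
# so the whole sample is parsed as a single record.
default_rows = list(csv.reader(io.StringIO(sample)))

# QUOTE_NONE: quotes are ordinary characters, so the record ends at the
# first line break and the remainder is parsed as separate rows.
no_quote_rows = list(csv.reader(io.StringIO(sample), quoting=csv.QUOTE_NONE))

print(len(default_rows), default_rows[0])
print(len(no_quote_rows), no_quote_rows[0])
```

If that reading is right, leaving the quoting at its default may be enough to recover all 25 columns from the original file.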

When you do this:

f_handler = open(f_path, 'rU').read().replace('\n',' ') 

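you are removing all the newlines, which is how the csv.excel dialect detects new rows. Since the data then consists of a single line, only one row can ever be returned.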
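
A related detail may explain the exact output in the question: after .read().replace(...), f_handler is a plain string rather than a file object. The csv readers iterate over whatever object they are given, and iterating over a string yields one character at a time, so the first "row" is just '0'. Here is a minimal sketch of that effect, written for Python 3 (the snippets in this thread are Python 2); the three-column sample text and field names are made up for illustration.

```python
import csv
import io

# Shortened, made-up sample standing in for the real file contents.
text = "0,213726,NORTH FORK SLATE CREEK\n"
fieldnames = ["col1", "col2", "col3"]

# Passing the string itself: the reader iterates over it character by
# character, so the first "line" it sees is just "0".
first_from_string = next(csv.DictReader(text, fieldnames=fieldnames))

# Passing a file-like object: the reader iterates over real lines.
first_from_file = next(csv.DictReader(io.StringIO(text), fieldnames=fieldnames))

print(first_from_string)
print(first_from_file)
```

The first dictionary has a value only for col1, which matches the symptom described in the question; the second has all three fields filled in.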

Also, you are doing:

if i > 0: 
    break 

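which terminates your for loop after the first iteration.

As for why they are blank: the default value for missing fields is None (see http://docs.python.org/3.2/library/csv.html), so the keys are probably not matching up. Try leaving out the fieldnames argument, and you will probably see that your keys in this dialect are something along the lines of "col2", "col3", and so on.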
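
To make the None default concrete, here is a small sketch of how DictReader handles rows that are shorter or longer than the list of field names, using its restval and restkey parameters. It is written for Python 3, and the sample rows, the three field names, and the "extra" key are made up for illustration.

```python
import csv
import io

fieldnames = ["col1", "col2", "col3"]

# A row with fewer values than field names: missing fields get restval,
# which is None unless you say otherwise.
short_row = next(csv.DictReader(io.StringIO("a,b\n"), fieldnames=fieldnames))

# The same row with an explicit restval instead of the default None.
filled_row = next(
    csv.DictReader(io.StringIO("a,b\n"), fieldnames=fieldnames, restval="")
)

# A row with more values than field names: the extras are collected in a
# list stored under restkey ("extra" is an arbitrary choice here).
long_row = next(
    csv.DictReader(io.StringIO("a,b,c,d,e\n"), fieldnames=fieldnames, restkey="extra")
)

print(short_row)
print(filled_row)
print(long_row)
```

So a None value generally means the reader saw fewer fields on that row than it had field names for.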

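A cute little wrapper you can use:

def iter_trim(dict_iter):
    # Strip surrounding whitespace from every key and value of each row.
    # One-line equivalent without error handling:
    #return (dict(zip([k.strip(" \t\n\r") for k in row.keys()], [v.strip(" \t\n\r") for v in row.values()])) for row in dict_iter)
    for row in dict_iter:
        try:
            d = dict(zip([k.strip(" \t\n\r") for k in row.keys()], [v.strip(" \t\n\r") for v in row.values()]))
        except AttributeError:
            # A key or value was not a string (e.g. None for a missing field).
            print "row error:"
            print row
            continue
        yield d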

Usage example:

import csv

def csv_iter(filename):
    csv_fp = open(filename)
    # Guess the dialect from the first 16 KB, then rewind before parsing.
    guess_dialect = csv.Sniffer().sniff(csv_fp.read(16384))
    csv_fp.seek(0)
    csv_reader = csv.DictReader(csv_fp, dialect=guess_dialect)
    return iter_trim(csv_reader)

for row in csv_iter("some-file.csv"):
    # do something...
    print row
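
For anyone on Python 3, a rough equivalent of the trimming wrapper might look like the sketch below. The name iter_trim_py3 and the sample data are hypothetical, and unlike the wrapper above it passes non-string items such as None through unchanged instead of reporting a row error.

```python
import csv
import io


def iter_trim_py3(dict_iter):
    """Yield each row with whitespace stripped from string keys and values.

    Hypothetical Python 3 variant of iter_trim; non-string items such as
    None (used for missing fields) are passed through unchanged.
    """
    def clean(item):
        return item.strip(" \t\n\r") if isinstance(item, str) else item

    for row in dict_iter:
        yield {clean(key): clean(value) for key, value in row.items()}


# Made-up sample with padded header names and a short second data row.
sample = io.StringIO("name , state \n Slate Creek , ID \nNorth Fork\n")
rows = list(iter_trim_py3(csv.DictReader(sample)))

for row in rows:
    print(row)
```

Passing None through keeps short rows usable, at the cost of not flagging them; whether that is what you want depends on how messy the file is.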

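The universe of things that different software systems call CSV varies widely. Fortunately, Python's excellent csv module is very good at handling these details, so you don't have to deal with them by hand.

Let me emphasize something that is in @metaperture's answer but isn't explained there: you can take all the guesswork out of reading CSV files in Python by auto-detecting the dialect. Once that part is nailed down, there isn't much else that can go wrong.

Let me give you a simple example: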

import csv

# filename is the path to your CSV file
with open(filename, 'rb') as csvfile:
    # Guess the dialect from a sample of the file, then rewind.
    dialect = csv.Sniffer().sniff(csvfile.read(10024))
    csvfile.seek(0)
    qreader = csv.reader(csvfile, dialect)
    cnt = 0
    for item in qreader:
        if cnt > 0:
            # process your data
            pass
        else:
            # the header of the csv file (field names)
            pass
        cnt = cnt + 1
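
The same idea can be sketched for Python 3 as a small helper. The function name read_with_sniffed_dialect, its parameters, and the semicolon-separated sample are assumptions made for illustration; an in-memory io.StringIO stands in for a real file so the example is self-contained.

```python
import csv
import io


def read_with_sniffed_dialect(csvfile, sample_size=16384, delimiters=",;\t|"):
    """Return (header, data_rows) using a dialect guessed by csv.Sniffer.

    Hypothetical helper: csvfile must be a seekable text-mode file object
    whose first row holds the field names.
    """
    sample = csvfile.read(sample_size)
    csvfile.seek(0)  # rewind so the reader starts from the first row
    dialect = csv.Sniffer().sniff(sample, delimiters=delimiters)
    reader = csv.reader(csvfile, dialect)
    header = next(reader)  # first row: field names
    return header, list(reader)


# Made-up semicolon-separated data standing in for a real file.
fake_file = io.StringIO("name;state;lat\nNorth Fork;ID;45.6\nSlate Creek;ID;45.7\n")
header, data_rows = read_with_sniffed_dialect(fake_file)

print(header)
print(data_rows)
```

With a real file in Python 3 you would typically open it in text mode with newline='' rather than 'rb'. Note that csv.Sniffer().sniff() can raise csv.Error when it cannot work out the delimiter, so it may be worth catching that and falling back to a known dialect.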