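Python's csv DictReader returns "None" for field values; any ideas?
Problem description:
I'm a noob coder having trouble parsing a CSV file with Python's csv module. The problem is that my output says the field values in every row are None, except for the first field.
Below is the first row of the ugly CSV file I'm trying to parse (the remaining rows follow the same format):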
0,213726,NORTH FORK SLATE CREEK,CAMPGROUND,North Fork Slate Creek Campground | Idaho | Public Lands Information Center | Recreation Search, http://www.publiclands.org/explore/site.php?plicstate=ID&id=2268,NA,NA,NA,NA,(208)839-2211,"Nez Perce National Forest Operating Days: 305<br>Total Capacity: 25<br>
5 campsites at the confluence of Slate Creek and its North Fork. A number of trails form loops in the area. These are open to most traffic, including trail bikes.","From Slate Creek, go 8 miles east on Forest Road 354.",NA,http://www.publiclands.org/explore/reg_nat_forest.php?region=7&forest_name=Nez%20Perce%20National%20Forest,NA,NA,NA,45.6,-116.1,NA,N,0,1103,2058
Here is the code I wrote to parse the CSV file (it doesn't work right):
import csv
#READER SETTINGS
f_path = '/Users/foo'
f_handler = open(f_path, 'rU').read().replace('\n',' ')
my_fieldnames = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7',
'col8', 'col9', 'col10', 'col11', 'col12', 'col13', 'col14', 'col15',
'col16', 'col17', 'col18', 'col19', 'col20', 'col21', 'col22', 'col23',
'col24','col25']
f_reader = csv.DictReader(f_handler, fieldnames=my_fieldnames, delimiter=',', dialect=csv.excel)
#NOW I TRY TO PARSE THE CSV FILE
i = 0
for row in f_reader:
    print "my first row was %s" % row
    i = i + 1
    if i > 0:
        break
Here is the output. It says every field except the first is blank, and I don't know why! Any suggestions would be greatly appreciated.
my first row was {'col14': None, 'col15': None, 'col16': None,
'col17': None, 'col10': None, 'col11': None, 'col12': None,
'col13': None, 'col18': None, 'col19': None, 'col2': None, 'col8': None,
'col9': None, 'col6': None, 'col7': None, 'col4': None, 'col5': None,
'col3': None, 'col1': '0', 'col25': None, 'col24': None,
'col21': None, 'col20': None, 'col23': None, 'col22': None}
Answer
Try this:
#!/usr/bin/env python
import csv
my_fieldnames = ['col' + str(i) for i in range(1,26)]
with open('input.csv', 'rb') as csvfile:
    my_reader = csv.DictReader(csvfile, fieldnames=my_fieldnames,
                               delimiter=',', dialect=csv.excel,
                               quoting=csv.QUOTE_NONE)
    for row in my_reader:
        for k,v in row.iteritems():
            print k, v
Output for the first line of the input (remember, dictionaries are unordered):
col14 None
col15 None
col16 None
col17 None
col10 NA
col11 (208)839-2211
col12 "Nez Perce National Forest Operating Days: 305<br>Total Capacity: 25<br>
col13 None
col18 None
col19 None
col8 NA
col9 NA
col6 http://www.publiclands.org/explore/site.php?plicstate=ID&id=2268
col7 NA
col4 CAMPGROUND
col5 North Fork Slate Creek Campground | Idaho | Public Lands Information Center | Recreation Search
col2 213726
col3 NORTH FORK SLATE CREEK
col1 0
col25 None
col24 None
col21 None
col20 None
col23 None
col22 None
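The output above stops at col12 because quoting=csv.QUOTE_NONE treats the double quote as ordinary text, so the line break inside the quoted description ends the row early. A minimal sketch of the difference, written for Python 3 (the code in this thread is Python 2) and using a made-up three-column record rather than the real file:

```python
import csv
import io

# Hypothetical record: the second field is quoted and contains a line break.
SAMPLE = '0,"Total Capacity: 25<br>\n5 campsites",NA\n'
NAMES = ["col1", "col2", "col3"]

# Default quoting: the quoted field may span lines, so one record comes back.
with_quotes = list(
    csv.DictReader(io.StringIO(SAMPLE, newline=""), fieldnames=NAMES)
)

# QUOTE_NONE: quotes are plain characters, so the record splits in two
# and the fields that were cut off are filled with None.
no_quotes = list(
    csv.DictReader(io.StringIO(SAMPLE, newline=""), fieldnames=NAMES,
                   quoting=csv.QUOTE_NONE)
)

print(len(with_quotes), with_quotes[0]["col3"])
print(len(no_quotes), no_quotes[0]["col3"])
```

If the goal is to keep the whole description in one field, leaving quoting at its default is probably the simpler choice.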
Answer
When you do this:
f_handler = open(f_path, 'rU').read().replace('\n',' ')
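you are removing all the newlines, which is how the csv.excel dialect detects new rows. Since the file then has only one line, only one row is returned.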
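One more detail that seems relevant here: .read() returns a string, not a file object, and the csv reader simply iterates over whatever it is given. Iterating over a string yields one character at a time, so the first "line" the reader sees is just "0". A small sketch in Python 3, with a made-up two-row sample and three hypothetical column names:

```python
import csv
import io

# Hypothetical sample standing in for the real file.
SAMPLE = "0,213726,NORTH FORK\n1,213727,SOUTH FORK\n"
FIELDNAMES = ["col1", "col2", "col3"]

# Passing a str: the reader walks it character by character,
# so the first row contains only the single character "0".
flattened = SAMPLE.replace("\n", " ")
first_from_str = next(csv.DictReader(flattened, fieldnames=FIELDNAMES))

# Passing a file-like object: the reader walks it line by line.
first_from_file = next(
    csv.DictReader(io.StringIO(SAMPLE), fieldnames=FIELDNAMES)
)

print(first_from_str)
print(first_from_file)
```

That would explain why col1 is '0' and everything else is None in the question's output: pass the open file itself to DictReader instead of the result of .read().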
In addition, you are doing:
if i > 0:
    break
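which terminates your for loop after the first iteration.
As for why they are blank: the default value is None (see http://docs.python.org/3.2/library/csv.html), so the keys are probably not matching up. Try leaving out the fieldnames parameter, and you will probably see that your keys in this dialect are something along the lines of "col2", "col3", and so on.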
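The None default mentioned above comes from DictReader's restval parameter: when a row has fewer values than there are field names, the leftover keys are filled with restval. A minimal Python 3 sketch with a made-up short row and four hypothetical column names:

```python
import csv
import io

# Hypothetical row with only two values for four field names.
short_row = io.StringIO("0,213726\n")
NAMES = ["col1", "col2", "col3", "col4"]

# Default behaviour: missing fields are filled with None.
default_fill = next(csv.DictReader(short_row, fieldnames=NAMES))

# With restval set, missing fields get that value instead.
short_row.seek(0)
custom_fill = next(
    csv.DictReader(short_row, fieldnames=NAMES, restval="MISSING")
)

print(default_fill)
print(custom_fill)
```

Seeing None for a field therefore usually means the reader found fewer values in the row than expected, not that the field was empty in the file.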
A cute little wrapper to use:
def iter_trim(dict_iter):
    #return (dict(zip([k.strip(" \t\n\r") for k in row.keys()], [v.strip(" \t\n\r") for v in row.values()])) for row in dict_iter)
    for row in dict_iter:
        try:
            d = dict(zip([k.strip(" \t\n\r") for k in row.keys()], [v.strip(" \t\n\r") for v in row.values()]))
            yield d
        except:
            print "row error:"
            print row
Usage example:
import csv

def csv_iter(filename):
    csv_fp = open(filename)
    # Guess the dialect from the first 16 KB, then rewind before reading rows.
    guess_dialect = csv.Sniffer().sniff(csv_fp.read(16384))
    csv_fp.seek(0)
    csv_reader = csv.DictReader(csv_fp,dialect=guess_dialect)
    return iter_trim(csv_reader)

for row in csv_iter("some-file.csv"):
    # do something...
    print row
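For anyone on Python 3, here is a rough equivalent of the wrapper above. This is only a sketch under my own assumptions: it strips string keys and values and passes anything else (such as the None that DictReader uses for missing fields) through untouched, instead of printing a row error. The sample data is made up.

```python
import csv
import io


def iter_trim(dict_iter):
    """Yield each row with whitespace stripped from string keys and values."""
    for row in dict_iter:
        yield {
            (k.strip() if isinstance(k, str) else k):
            (v.strip() if isinstance(v, str) else v)
            for k, v in row.items()
        }


# Hypothetical input: padded header names, padded values, and one short row.
sample = io.StringIO("name , phone\n Slate Creek , (208)839-2211 \nshort\n")
rows = list(iter_trim(csv.DictReader(sample)))

for row in rows:
    print(row)
```

The isinstance checks are there because a short row leaves None values in the dict, and calling .strip() on None would raise an AttributeError.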
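Answer
The universe of things that different software systems call CSV varies widely. Fortunately, Python's excellent csv module is very good at handling these details, so you don't have to deal with them by hand.
Let me emphasize something that appears in @metaperture's answer but isn't explained there: you can avoid all the guesswork in reading CSV files in Python by auto-detecting the dialect. Once you have that part nailed down, there isn't much more that can go wrong.
Let me give you a simple example: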
import csv

# filename is the path to your CSV file
with open(filename, 'rb') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(10024))
    csvfile.seek(0)
    qreader = csv.reader(csvfile, dialect)
    cnt = 0
    for item in qreader:
        if cnt > 0:
            # process your data
            pass
        else:
            # the header of the csv file (field names)
            pass
        cnt = cnt + 1
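The same idea in Python 3 might look like the sketch below. The helper name read_with_sniffed_dialect, the candidate delimiter list, and the semicolon-separated sample are all my own assumptions for illustration. In Python 3 a real file would be opened in text mode with newline='' rather than 'rb'.

```python
import csv
import io


def read_with_sniffed_dialect(csvfile, sample_size=16384):
    """Sniff the dialect from the start of the file, then read header and rows."""
    sample = csvfile.read(sample_size)
    csvfile.seek(0)
    # Restricting the candidate delimiters makes the guess less error-prone.
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    reader = csv.reader(csvfile, dialect)
    header = next(reader)      # first row holds the field names
    return dialect, header, list(reader)


# Hypothetical semicolon-separated data standing in for a real file.
demo = io.StringIO(
    "id;name;type\n"
    "0;NORTH FORK SLATE CREEK;CAMPGROUND\n"
    "1;SOUTH FORK;TRAILHEAD\n",
    newline="",
)
dialect, header, rows = read_with_sniffed_dialect(demo)
print(dialect.delimiter, header, len(rows))
```

Calling next(reader) once for the header avoids the manual counter, and the sniffer is only a heuristic, so it is worth checking dialect.delimiter on your own files.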