Processing a CSV grouped by city in Python
I have a CSV file with weather forecast data, grouped by city:
New York
"2016-04-08T07:00Z 6.2 d300 1 0.0 220 10.2 79 331"
"2016-04-08T08:00Z 7.1 d000 1 0.0 223 10.6 74 400"
"2016-04-08T09:00Z 7.7 d000 1 0.0 225 10.9 68 448"
"2016-04-08T10:00Z 8.4 d000 2 0.0 225 10.9 64 553"
"2016-04-08T11:00Z 8.9 d100 5 0.0 226 11.0 59 550"
"2016-04-08T12:00Z 9.1 d100 8 0.0 227 11.0 57 516"
"2016-04-08T13:00Z 8.6 d100 1 0.0 227 10.6 61 447"
"2016-04-08T14:00Z 8.1 d100 4 0.0 227 10.1 64 362"
Boston
"2016-04-08T07:00Z 6.2 d300 1 0.0 220 10.2 79 331"
"2016-04-08T08:00Z 7.1 d000 1 0.0 223 10.6 74 400"
"2016-04-08T09:00Z 7.7 d000 1 0.0 225 10.9 68 448"
"2016-04-08T10:00Z 8.4 d000 2 0.0 225 10.9 64 553"
"2016-04-08T11:00Z 8.9 d100 5 0.0 226 11.0 59 550"
"2016-04-08T12:00Z 9.1 d100 8 0.0 227 11.0 57 516"
"2016-04-08T13:00Z 8.6 d100 1 0.0 227 10.6 61 447"
"2016-04-08T14:00Z 8.1 d100 4 0.0 227 10.1 64 362"
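and so on... each city has 8 weather data entries.
How do I process this kind of CSV in Python?
I want to automatically map the whole CSV to an array of class instances with attributes such as Place, DateTime, Temperature, Attr4, Attr5, etc., or possibly to some other data structure, such as a dictionary. This simple code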
import csv

with open('test.csv', newline='') as csvfile:
    wreader = csv.reader(csvfile, delimiter='\t', quotechar='"')
    for row in wreader:
        print(row)
outputs
['New York']
['2016-04-08T07:00Z\t6.2\td300\t1\t0.0\t220\t10.2\t79\t331']
['2016-04-08T08:00Z\t7.1\td000\t1\t0.0\t223\t10.6\t74\t400']
['2016-04-08T09:00Z\t7.7\td000\t1\t0.0\t225\t10.9\t68\t448']
['2016-04-08T10:00Z\t8.4\td000\t2\t0.0\t225\t10.9\t64\t553']
['2016-04-08T11:00Z\t8.9\td100\t5\t0.0\t226\t11.0\t59\t550']
['2016-04-08T12:00Z\t9.1\td100\t8\t0.0\t227\t11.0\t57\t516']
['2016-04-08T13:00Z\t8.6\td100\t1\t0.0\t227\t10.6\t61\t447']
['2016-04-08T14:00Z\t8.1\td100\t4\t0.0\t227\t10.1\t64\t362']
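As you can see, the content inside the double quotes is not split into fields.
Then I changed to quotechar=' '
and that partially solved the problem: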
['"2016-04-08T07:00Z', '6.2', 'd300', '1', '0.0', '220', '10.2', '79', '331"']
But the double quotes are still there. How can I remove them?
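Read the CSV data conditionally: pick up the city name and append each entry's items to a local list. You can then use that list for whatever else you need, such as defining classes or dictionaries.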
import csv

weatherdata = []
with open('WeatherData.csv', 'r') as csvfile:
    readCSV = csv.reader(csvfile)
    for line in readCSV:
        # Drop any stray quotes and split the single field on whitespace (tabs).
        items = [i.replace('"', '').split() for i in line][0]
        if len(items) < 3:
            # A short line is a city name, e.g. ['New', 'York'].
            city = items
        else:
            weatherdata.append([' '.join(city)] + items)

for i in weatherdata:
    print(i)
# ['New York', '2016-04-08T07:00Z', '6.2', 'd300', '1', '0.0', '220', '10.2', '79', '331']
# ['New York', '2016-04-08T08:00Z', '7.1', 'd000', '1', '0.0', '223', '10.6', '74', '400']
# ['New York', '2016-04-08T09:00Z', '7.7', 'd000', '1', '0.0', '225', '10.9', '68', '448']
# ['New York', '2016-04-08T10:00Z', '8.4', 'd000', '2', '0.0', '225', '10.9', '64', '553']
# ['New York', '2016-04-08T11:00Z', '8.9', 'd100', '5', '0.0', '226', '11.0', '59', '550']
# ['New York', '2016-04-08T12:00Z', '9.1', 'd100', '8', '0.0', '227', '11.0', '57', '516']
# ['New York', '2016-04-08T13:00Z', '8.6', 'd100', '1', '0.0', '227', '10.6', '61', '447']
# ['New York', '2016-04-08T14:00Z', '8.1', 'd100', '4', '0.0', '227', '10.1', '64', '362']
# ['Boston', '2016-04-08T07:00Z', '6.2', 'd300', '1', '0.0', '220', '10.2', '79', '331']
# ['Boston', '2016-04-08T08:00Z', '7.1', 'd000', '1', '0.0', '223', '10.6', '74', '400']
# ['Boston', '2016-04-08T09:00Z', '7.7', 'd000', '1', '0.0', '225', '10.9', '68', '448']
# ['Boston', '2016-04-08T10:00Z', '8.4', 'd000', '2', '0.0', '225', '10.9', '64', '553']
# ['Boston', '2016-04-08T11:00Z', '8.9', 'd100', '5', '0.0', '226', '11.0', '59', '550']
# ['Boston', '2016-04-08T12:00Z', '9.1', 'd100', '8', '0.0', '227', '11.0', '57', '516']
# ['Boston', '2016-04-08T13:00Z', '8.6', 'd100', '1', '0.0', '227', '10.6', '61', '447']
# ['Boston', '2016-04-08T14:00Z', '8.1', 'd100', '4', '0.0', '227', '10.1', '64', '362']
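To get from that flat list to the class instances or dictionary the question asks for, one option is a namedtuple plus a dict keyed by city. This is only a minimal sketch, not part of the original answer: `place`, `datetime` and `temperature` follow the attribute names in the question, `attr4` through `attr10` are placeholders because the meaning of the remaining columns is not stated, and two sample rows are inlined in place of the full `weatherdata` list.

```python
from collections import defaultdict, namedtuple

# Field names after `temperature` are placeholders; the question does not say
# what the remaining columns mean.
WeatherEntry = namedtuple(
    "WeatherEntry",
    "place datetime temperature attr4 attr5 attr6 attr7 attr8 attr9 attr10",
)

# Two sample rows in the shape produced by the loop above (city + 9 values).
weatherdata = [
    ['New York', '2016-04-08T07:00Z', '6.2', 'd300', '1', '0.0', '220', '10.2', '79', '331'],
    ['Boston', '2016-04-08T07:00Z', '6.2', 'd300', '1', '0.0', '220', '10.2', '79', '331'],
]

# One named record per row; temperature becomes a float, the rest stay strings.
entries = [
    WeatherEntry(*row)._replace(temperature=float(row[2]))
    for row in weatherdata
]

# Group the records by city for dictionary-style lookup.
by_city = defaultdict(list)
for entry in entries:
    by_city[entry.place].append(entry)

print(by_city['Boston'][0].temperature)
```

A namedtuple keeps the rows immutable and lightweight; a regular class or a dataclass would work just as well if the entries need methods or mutable fields.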
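line[0] is ['2016-04-08T07:00Z\t6.2\td300\t1\t0.0\t220\t10.2\t79\t331'], so line[1] is out of range – leonas5555
See the updated code, which handles the quoted structure with a list comprehension and a split (it even handles spaces in city names, such as *New York*). – Parfait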
So, if I understood correctly, quotechar=' ' solved the problem. The result is then
['"2016-04-08T07:00Z', '6.2', 'd300', '1', '0.0', '220', '10.2', '79', '331"']
['"2016-04-08T08:00Z', '7.1', 'd000', '1', '0.0', '223', '10.6', '74', '400"']
['"2016-04-08T09:00Z', '7.7', 'd000', '1', '0.0', '225', '10.9', '68', '448"']
['"2016-04-08T10:00Z', '8.4', 'd000', '2', '0.0', '225', '10.9', '64', '553"']
['"2016-04-08T11:00Z', '8.9', 'd100', '5', '0.0', '226', '11.0', '59', '550"']
['"2016-04-08T12:00Z', '9.1', 'd100', '8', '0.0', '227', '11.0', '57', '516"']
['"2016-04-08T13:00Z', '8.6', 'd100', '1', '0.0', '227', '10.6', '61', '447"']
and, in full:
import csv

weatherdata = []
with open('test.csv', 'r') as csvfile:
    readCSV = csv.reader(csvfile, delimiter='\t', quotechar=' ')
    for line in readCSV:
        if len(line) == 1:
            city = line[0]
        else:
            weatherdata.append([city] + line)

for c in weatherdata:
    print(c)
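Rows built this way still carry a leading `"` on the first field and a trailing `"` on the last one. One possible way to drop them, sketched here under the assumption that the file is tab-delimited and each data line is wrapped in a single pair of double quotes, is to switch quote handling off with `csv.QUOTE_NONE` and strip the quotes from each field. The sample text is inlined through `io.StringIO` in place of `test.csv`.

```python
import csv
import io

# Inlined sample standing in for test.csv (tab-delimited, lines wrapped in quotes).
sample = (
    'New York\n'
    '"2016-04-08T07:00Z\t6.2\td300\t1\t0.0\t220\t10.2\t79\t331"\n'
    'Boston\n'
    '"2016-04-08T08:00Z\t7.1\td000\t1\t0.0\t223\t10.6\t74\t400"\n'
)

weatherdata = []
city = None
# QUOTE_NONE makes the reader treat '"' as an ordinary character,
# so the tabs inside the quotes split the line into fields.
reader = csv.reader(io.StringIO(sample), delimiter='\t', quoting=csv.QUOTE_NONE)
for line in reader:
    fields = [field.strip('"') for field in line]
    if len(fields) == 1:
        city = fields[0]  # a lone field is a city heading
    elif fields:
        weatherdata.append([city] + fields)

for row in weatherdata:
    print(row)
```

Stripping per field rather than replacing every `"` in the line leaves any quotes inside a value untouched.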
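No, no, no... the double quotes are still a problem: ['"2016-04-08T07:00Z', ..., '79', '331"'] – leonas5555
You should read up on the (easily googleable) [csv](https://docs.python.org/2/library/csv.html) module! – schwobaseggl
Have you tried anything? Paste what you have implemented so far and someone will help. – haifzhan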