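Merging two CSV files on a common column in Python
Problem description:
I want to merge two CSV files that share a common id column and write the merged result to a new file. I tried the following, but it gives me an error: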
import csv
from collections import OrderedDict
filenames = "stops.csv", "stops2.csv"
data = OrderedDict()
fieldnames = []
for filename in filenames:
    with open(filename, "rb") as fp: # python 2
        reader = csv.DictReader(fp)
        fieldnames.extend(reader.fieldnames)
        for row in reader:
            data.setdefault(row["stop_id"], {}).update(row)
fieldnames = list(OrderedDict.fromkeys(fieldnames))
with open("merged.csv", "wb") as fp:
    writer = csv.writer(fp)
    writer.writerow(fieldnames)
    for row in data.itervalues():
        writer.writerow([row.get(field, '') for field in fieldnames])
Both files have a "stop_id" column, but I get this error back: KeyError: "stop_id"
Any help is much appreciated.
Thanks
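One possible cause, which the post does not confirm, is that the header cell is not exactly "stop_id", for example because the file starts with a UTF-8 byte order mark or the header contains stray spaces. Below is a minimal Python 3 sketch for checking this. The sample content and the stop_name column are made up for illustration.

```python
import csv
import io

# Hypothetical file content: a UTF-8 BOM precedes the header,
# and the second header cell has a trailing space.
raw = "\ufeffstop_id,stop_name \n1,Main St\n"

reader = csv.DictReader(io.StringIO(raw))

# Show the header exactly as the csv module sees it.
print(reader.fieldnames)

# Normalise the header: drop a leading BOM and surrounding whitespace.
reader.fieldnames = [name.lstrip("\ufeff").strip() for name in reader.fieldnames]

rows = list(reader)
print(rows[0]["stop_id"])
```

When reading a real file on Python 3, opening it with open(path, newline="", encoding="utf-8-sig") removes a leading BOM before the csv module sees it.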
Answer
Thanks for the example.
This is what worked for me. It merges the two files on the first column of each CSV.
import csv
from collections import OrderedDict
with open('stops.csv', 'rb') as f:
    r = csv.reader(f)
    dict2 = {row[0]: row[1:] for row in r}
with open('stops2.csv', 'rb') as f:
    r = csv.reader(f)
    dict1 = OrderedDict((row[0], row[1:]) for row in r)
result = OrderedDict()
for d in (dict1, dict2):
    for key, value in d.iteritems():
        result.setdefault(key, []).extend(value)
with open('ab_combined.csv', 'wb') as f:
    w = csv.writer(f)
    for key, value in result.iteritems():
        w.writerow([key] + value)
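The code above is Python 2 ("rb"/"wb" modes, iteritems). For Python 3 readers, the same first-column merge might be sketched as follows. The function name merge_on_first_column and the in-memory sample data are illustrative assumptions, not part of the original answer.

```python
import csv
import io

def merge_on_first_column(sources):
    """Merge CSV rows from several open text sources, keyed on column 0."""
    merged = {}  # plain dicts preserve insertion order in Python 3.7+
    for source in sources:
        for row in csv.reader(source):
            if not row:  # skip blank lines
                continue
            # Append this file's remaining columns to whatever is stored for the key.
            merged.setdefault(row[0], []).extend(row[1:])
    return merged

# Hypothetical sample data standing in for stops.csv and stops2.csv.
stops = io.StringIO("stop_id,stop_name\n1,Main St\n2,Oak Ave\n")
stops2 = io.StringIO("stop_id,zone\n1,A\n3,C\n")

merged = merge_on_first_column([stops, stops2])

# Write the merged rows as CSV text (a real script would write to a file).
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for key, values in merged.items():
    writer.writerow([key] + values)
print(out.getvalue())
```

With real files on Python 3, open them in text mode with newline="" instead of "rb"/"wb". Note that this approach only concatenates values: a key that appears in just one file yields a shorter row, so its values may not line up with the header columns (key 3 in the sample ends up with "C" in the second column).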
Answer
Here is an approach using pandas:
import sys
from StringIO import StringIO
import pandas as pd
TESTDATA=StringIO("""DOB;First;Last
2016-07-26;John;smith
2016-07-27;Mathew;George
2016-07-28;Aryan;Singh
2016-07-29;Ella;Gayau
""")
list1 = pd.read_csv(TESTDATA, sep=";")
TESTDATA=StringIO("""Date of Birth;Patient First Name;Patient Last Name
2016-07-26;John;smith
2016-07-27;Mathew;XXX
2016-07-28;Aryan;Singh
2016-07-20;Ella;Gayau
""")
list2 = pd.read_csv(TESTDATA, sep=";")
print list2
print list1
common = pd.merge(list1, list2, how='left', left_on=['Last', 'First', 'DOB'], right_on=['Patient Last Name', 'Patient First Name', 'Date of Birth']).dropna()
print common
'data.setdefault(row["stop_id"], {}).update(row)' - why so complicated? – Alleo
Also, merging two tables on a column is done with 'pandas.merge'; see http://pandas.pydata.org/pandas-docs/stable/merging.html#brief-primer-on-merge-methods-relational-algebra – Alleo
I used another Stack Overflow example as input. Can you suggest an alternative? Thanks – sgpbyrne
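As a rough illustration of the pandas.merge suggestion applied to the question's stop_id column, here is a minimal Python 3 sketch. The sample rows and the stop_name and zone columns are assumptions. Only stop_id and the file names stops.csv, stops2.csv and merged.csv come from the question.

```python
import io

import pandas as pd

# Hypothetical sample data; in practice use pd.read_csv("stops.csv") etc.
stops = pd.read_csv(io.StringIO("stop_id,stop_name\n1,Main St\n2,Oak Ave\n"))
stops2 = pd.read_csv(io.StringIO("stop_id,zone\n1,A\n3,C\n"))

# how="outer" keeps stop_ids that appear in either file;
# cells with no matching data are filled with NaN.
merged = pd.merge(stops, stops2, on="stop_id", how="outer")

# to_csv without a path returns the CSV text;
# pass "merged.csv" as the first argument to write a file instead.
csv_text = merged.to_csv(index=False)
print(csv_text)
```

Using how="inner" instead would keep only the stop_ids present in both files, which may be closer to what some readers want.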