Storing the columns of a spreadsheet in a Python dictionary

Question:

 
Species        Garden  Hedgerow  Parkland  Pasture  Woodland
Blackbird          47        10        40        2         2
Chaffinch          19         3         5        0         2
Great Tit          50         0        10        7         0
House Sparrow      46        16         8        4         0
Robin               9         3         0        0         2
Song Thrush         4         0         6        0         0

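I am reading this data with the xlrd Python library. I have no trouble reading it into a list of lists (each row of the table stored as one list) with the following code: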

from xlrd import open_workbook

wb = open_workbook("Sample.xls")
headers = []  # column titles from the first row
sdata = []    # one list per data row
for s in wb.sheets():
    print("Sheet: " + s.name)
    if s.name.capitalize() == "Data":
        for row in range(s.nrows):
            values = []
            for col in range(s.ncols):
                data = s.cell(row, col).value
                if row == 0:
                    headers.append(data)
                else:
                    values.append(data)
            if row > 0:  # don't append an empty list for the header row
                sdata.append(values)

As is probably obvious, headers is a simple list holding the column titles, and sdata holds the table data as a list of lists. This is what they look like:

headers:

[u'Species', u'Garden', u'Hedgerow', u'Parkland', u'Pasture', u'Woodland']

sdata:

[[u'Blackbird', 47.0, 10.0, 40.0, 2.0, 2.0], [u'Chaffinch', 19.0, 3.0, 5.0, 0.0, 2.0], [u'Great Tit', 50.0, 0.0, 10.0, 7.0, 0.0], [u'House Sparrow', 46.0, 16.0, 8.0, 4.0, 0.0], [u'Robin', 9.0, 3.0, 0.0, 0.0, 2.0], [u'Song Thrush', 4.0, 0.0, 6.0, 0.0, 0.0]]

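But I would like to store this data in a Python dictionary, with each column title as a key whose value is a list of all the values in that column. For example (only part of the data is shown, to save space):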

data = {
    'Species': ['Blackbird','Chaffinch','Great Tit'], 
    'Garden': [47,19,50], 
    'Hedgerow': [10,3,0], 
    'Parkland': [40,5,10], 
    'Pasture': [2,0,7], 
    'Woodland': [2,2,0] 
}

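So my question is: how can I do this? I know I could read the data by columns instead of by rows as in the snippet above, but I can't figure out how to store the fields in a dictionary.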

Thanks in advance for any help.

By the way, pandas does all of this in one go, producing a DataFrame object that you can use very much like your dictionary. – mdurant 2014-10-09 23:07:47

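I suggest you post the list of lists you have now. That would give people an easy way to test their answers: take what you have in this example and turn it into what you want. – 2014-10-10 12:41:43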

Thanks for the suggestion, Emilio, I will add it. – maurobio 2014-10-10 13:02:52

Answer

Once you have the columns, this is quite easy:

dict(zip(headers, sdata))
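If you want the columns directly, an xlrd sheet has a `col_values(colx, start_rowx=0, end_rowx=None)` method that returns one column as a list. The sketch below assumes the first row holds the headers. So that it runs without a spreadsheet file, `FakeSheet` is a hypothetical stand-in that mimics only the `ncols`, `cell_value` and `col_values` members of an xlrd sheet, and `columns_to_dict` is a hypothetical helper name.

```python
class FakeSheet:
    """Hypothetical stand-in for an xlrd sheet (only the members used below)."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def cell_value(self, rowx, colx):
        return self._rows[rowx][colx]

    def col_values(self, colx, start_rowx=0, end_rowx=None):
        return [row[colx] for row in self._rows[start_rowx:end_rowx]]


def columns_to_dict(sheet):
    """Map each header in row 0 to the list of values below it."""
    return {
        sheet.cell_value(0, colx): sheet.col_values(colx, start_rowx=1)
        for colx in range(sheet.ncols)
    }


sheet = FakeSheet([
    ["Species", "Garden", "Hedgerow"],
    ["Blackbird", 47.0, 10.0],
    ["Chaffinch", 19.0, 3.0],
])
columns = columns_to_dict(sheet)
print(columns)  # {'Species': ['Blackbird', 'Chaffinch'], 'Garden': [47.0, 19.0], 'Hedgerow': [10.0, 3.0]}
```

With a real workbook you would pass an actual sheet, for example `wb.sheet_by_name("Data")`, instead of `FakeSheet`.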

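Actually, it looks like sdata in your example holds the row data. Even so, this is still fairly easy, because you can transpose the table with zip as well: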

dict(zip(headers, zip(*sdata)))

One of these is what you asked for.
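Here is the transposing version applied to the data posted in the question. `zip(*sdata)` yields tuples, so this sketch converts each column to a list to match the desired output, and it uses the name `table` rather than `dict` so the built-in is not shadowed. It assumes `sdata` holds only data rows: because `zip` stops at its shortest input, a stray empty list (for example one appended while reading the header row) would make `zip(*sdata)` yield nothing at all.

```python
headers = ['Species', 'Garden', 'Hedgerow', 'Parkland', 'Pasture', 'Woodland']
sdata = [
    ['Blackbird', 47.0, 10.0, 40.0, 2.0, 2.0],
    ['Chaffinch', 19.0, 3.0, 5.0, 0.0, 2.0],
    ['Great Tit', 50.0, 0.0, 10.0, 7.0, 0.0],
    ['House Sparrow', 46.0, 16.0, 8.0, 4.0, 0.0],
    ['Robin', 9.0, 3.0, 0.0, 0.0, 2.0],
    ['Song Thrush', 4.0, 0.0, 6.0, 0.0, 0.0],
]

# zip(*sdata) transposes rows into columns; list() turns each tuple into a list
table = {header: list(column) for header, column in zip(headers, zip(*sdata))}
print(table['Species'])  # ['Blackbird', 'Chaffinch', 'Great Tit', 'House Sparrow', 'Robin', 'Song Thrush']
print(table['Garden'])   # [47.0, 19.0, 50.0, 46.0, 9.0, 4.0]
```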

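Answer

1. xlrd

I strongly recommend using defaultdict from the collections module. The value for each key starts out as a default value, in this case an empty list. I didn't put much exception handling in there; you may want to add error checking depending on your use case.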

import sys
from collections import defaultdict

import xlrd

result = defaultdict(list)  # each new key starts with an empty list
# Note: xlrd 2.0 and later only read legacy .xls files, not .xlsx
workbook = xlrd.open_workbook("/Users/datafireball/Desktop/*.xlsx")
worksheet = workbook.sheet_by_name(workbook.sheet_names()[0])

headers = worksheet.row(0)  # the first row holds the column titles
for index in range(1, worksheet.nrows):  # skip the header row
    try:
        for header, col in zip(headers, worksheet.row(index)):
            result[header.value].append(col.value)
    except Exception:
        print(sys.exc_info())

print(result)
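The same `defaultdict` pattern works on any sequence of rows, which makes it easy to try without a spreadsheet. In this sketch `rows_to_columns` is a hypothetical helper name, and the rows are plain lists standing in for the cells returned by `worksheet.row(index)` (so there is no `.value` to unwrap).

```python
from collections import defaultdict


def rows_to_columns(header_row, data_rows):
    """Group row-oriented data into a dict mapping each header to its column."""
    result = defaultdict(list)  # a missing key starts as an empty list
    for row in data_rows:
        for header, value in zip(header_row, row):
            result[header].append(value)
    return dict(result)  # convert to a plain dict before returning


columns = rows_to_columns(
    ['Species', 'Garden', 'Pasture'],
    [['Blackbird', 47.0, 2.0], ['Robin', 9.0, 0.0]],
)
print(columns)  # {'Species': ['Blackbird', 'Robin'], 'Garden': [47.0, 9.0], 'Pasture': [2.0, 0.0]}
```

Converting back to a plain `dict` at the end is optional; it only changes how the result prints and stops later lookups of a misspelled key from silently creating an empty column.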

Output:

defaultdict(<type 'list'>, 
{u'Garden': [47.0, 19.0, 50.0, 46.0, 9.0, 4.0], 
u'Parkland': [40.0, 5.0, 10.0, 8.0, 0.0, 6.0], 
u'Woodland': [2.0, 2.0, 0.0, 0.0, 2.0, 0.0], 
u'Hedgerow': [10.0, 3.0, 0.0, 16.0, 3.0, 0.0], 
u'Pasture': [2.0, 0.0, 7.0, 4.0, 0.0, 0.0], 
u'Species': [u'Blackbird', u'Chaffinch', u'Great Tit', u'House Sparrow', u'Robin', u'Song Thrush']})

2. pandas

import pandas as pd

xl = pd.ExcelFile("/Users/datafireball/Desktop/*.xlsx")
df = xl.parse(xl.sheet_names[0])  # read the first sheet into a DataFrame
print(df)

Output. You can't imagine how much flexibility you gain by working with a DataFrame.

         Species  Garden  Hedgerow  Parkland  Pasture  Woodland
0      Blackbird      47        10        40        2         2
1      Chaffinch      19         3         5        0         2
2      Great Tit      50         0        10        7         0
3  House Sparrow      46        16         8        4         0
4          Robin       9         3         0        0         2
5    Song Thrush       4         0         6        0         0

Thanks. I do know this can be done with pandas, but for several reasons I am looking for a more direct solution (like the ones you and others provided!). – maurobio 2014-10-10 12:28:30

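Answer

If xlrd doesn't solve your problem, consider looking at xlwings. One of its example videos shows how to pull data from an Excel sheet into a pandas DataFrame, which is more useful than a dictionary.

If you really want a dictionary, pandas can easily convert to one; see here.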

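Answer

I'll contribute another answer to my own question!

Right after posting my question, I found pyexcel, a very small Python library that acts as a wrapper around other spreadsheet-processing packages (namely xlrd and odfpy). It has a nice to_dict method that does exactly what I need (without even having to transpose the table)!

Here is an example, using the data above: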

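# SeriesReader and pyexcel.utils.to_dict come from early (2014-era) pyexcel
# releases; later versions reorganized the API, so check your version's docs
from pyexcel import SeriesReader
from pyexcel.utils import to_dict

sheet = SeriesReader("Sample.xls")
print(sheet.series())  # just the headers, stored in a list
data = to_dict(sheet)
print(data)  # the full dataset, stored in a dictionary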

Output:

[u'Species', u'Garden', u'Hedgerow', u'Parkland', u'Pasture', u'Woodland']
{u'Garden': [47.0, 19.0, 50.0, 46.0, 9.0, 4.0], u'Hedgerow': [10.0, 3.0, 0.0, 16.0, 3.0, 0.0], u'Pasture': [2.0, 0.0, 7.0, 4.0, 0.0, 0.0], u'Parkland': [40.0, 5.0, 10.0, 8.0, 0.0, 6.0], u'Woodland': [2.0, 2.0, 0.0, 0.0, 2.0, 0.0], u'Species': [u'Blackbird', u'Chaffinch', u'Great Tit', u'House Sparrow', u'Robin', u'Song Thrush']}

Hope it helps!

Answer

This script converts the data in an Excel sheet into a list of dictionaries, one dictionary per row:

import xlrd

workbook = xlrd.open_workbook('Sample.xls', on_demand=True)
worksheet = workbook.sheet_by_index(0)
first_row = []  # the row where the column names are stored
for col in range(worksheet.ncols):
    first_row.append(worksheet.cell_value(0, col))
# transform the worksheet into a list of dictionaries, one per data row
data = []
for row in range(1, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]] = worksheet.cell_value(row, col)
    data.append(elm)
print(data)
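A script like this yields one dictionary per row rather than the column-keyed dictionary the question asks for. Starting from such a list, one short pass regroups it. This sketch assumes every row dictionary has the same keys (true when all rows are built from the same header row), and `records_to_columns` is a hypothetical helper name.

```python
def records_to_columns(records):
    """Turn a list of row dicts into a dict of column lists."""
    if not records:
        return {}
    keys = list(records[0])  # column order taken from the first record
    return {key: [record[key] for record in records] for key in keys}


data = [
    {'Species': 'Blackbird', 'Garden': 47.0},
    {'Species': 'Chaffinch', 'Garden': 19.0},
]
print(records_to_columns(data))  # {'Species': ['Blackbird', 'Chaffinch'], 'Garden': [47.0, 19.0]}
```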
