Python pandas: query fails with KeyError on one column out of many

Problem description:

I am trying to query a DataFrame:

In [6]: books.dtypes 
Out[6]: 
count    float64 
product    int64 
channel    int64 
book_start_year  int64 
book_start_week  int64 
book_end_year  int64 
book_end_week  int64 
period   float64 
dtype: object 

In [8]: print(books.columns.tolist()) 
['count', 'product', 'channel', 'book_start_year', 'book_start_week', 'book_end_year', 'book_end_week', 'period'] 

This:

books[books.channel == 1] 

works fine, but this one:

books[books.product == 1] 

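fails with a KeyError (see below). The DataFrame was read from a CSV file, which pandas itself had written just a minute earlier, on macOS with this command: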

books = pd.read_csv('boxes2.csv', header=0)  

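Resetting the index, or setting another column as the index, does not help either. Any ideas?

Update

How, then, should I write a query like this: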

data = books[(books.start_year >= start_year) 
       & (books.start_week >= start_week) 
       & (books.end_year <= end_year) 
       & (books.end_week <= end_week) 
       & (books.product == product) 
       ] 

Or can I not?

Error:

In [5]: books[books.product == 1] 
    --------------------------------------------------------------------------- 
    KeyError         Traceback (most recent call last) 
    <ipython-input-5-c6883f7202ed> in <module>() 
    ----> 1 books[books.product == 1] 

    /Users/user/usr/anaconda_2.7/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key) 
     2002   # get column 
     2003   if self.columns.is_unique: 
    -> 2004    return self._get_item_cache(key) 
     2005 
     2006   # duplicate columns & possible reduce dimensionality 

    /Users/user/usr/anaconda_2.7/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
     1348   res = cache.get(item) 
     1349   if res is None: 
    -> 1350    values = self._data.get(item) 
     1351    res = self._box_item_values(item, values) 
     1352    cache[item] = res 

    /Users/user/usr/anaconda_2.7/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
     3288 
     3289    if not isnull(item): 
    -> 3290     loc = self.items.get_loc(item) 
     3291    else: 
     3292     indexer = np.arange(len(self.items))[isnull(self.items)] 

    /Users/user/usr/anaconda_2.7/anaconda/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
     1945     return self._engine.get_loc(key) 
     1946    except KeyError: 
    -> 1947     return self._engine.get_loc(self._maybe_cast_indexer(key)) 
     1948 
     1949   indexer = self.get_indexer([key], method=method, tolerance=tolerance) 

    pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)() 

    pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)() 

    pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)() 

    pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)() 

    KeyError: False 

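product is a DataFrame method, so you need square brackets to access your column: the method lookup takes precedence over the column name. Accessing columns as attributes is a convenience, but it is error-prone, so you should use square brackets: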

books[books['product'] == 1] 

You can think of a DataFrame as a dict of Series. As with a normal dict, you pass in a key and get back a value, which in this case is the column, a Series.
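To make the failure concrete, here is a minimal sketch of why the attribute form ends in `KeyError: False`. The data is made up for illustration; only the column names `product` and `channel` come from the question.

```python
import pandas as pd

# Hypothetical data; only the column names mirror the question.
books = pd.DataFrame({"product": [1, 2, 1], "channel": [1, 1, 2]})

# Attribute access finds the DataFrame.product method before the column.
attr = books.product
print(callable(attr))

# Comparing a bound method with an integer yields the plain bool False,
# not an element-wise boolean Series.
mask = attr == 1
print(mask)

# books[mask] is therefore books[False]: a lookup of a column named False.
raised = False
try:
    books[mask]
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# Bracket access returns the column as a Series, so the comparison is
# element-wise and the result is a usable boolean mask.
selected = books[books["product"] == 1]
print(selected)
```

`books.channel` works only because no DataFrame attribute happens to be called `channel`.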

Note that IPython shows the following for product:

Signature: df.product(axis=None, skipna=None, level=None, numeric_only=None, **kwargs) 
Docstring: 
Return the product of the values for the requested axis 

Parameters 
---------- 
axis : {index (0), columns (1)} 
skipna : boolean, default True 
    Exclude NA/null values. If an entire row/column is NA, the result 
    will be NA 
level : int or level name, default None 
    If the axis is a MultiIndex (hierarchical), count along a 
    particular level, collapsing into a Series 
numeric_only : boolean, default None 
    Include only float, int, boolean columns. If None, will attempt to use 
    everything, then use only numeric data. Not implemented for Series. 

Returns 
------- 
prod : Series or DataFrame (if level specified) 
File:  c:\winpython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\pandas\core\generic.py 
Type:  method 

So this isn't documented, but it is the same as prod.
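A quick way to check that equivalence yourself is to compare the two results on a small frame. The frame below is a made-up example, not the data from the question.

```python
import pandas as pd

# Hypothetical numeric frame, used only to compare the two methods.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

by_product = df.product()  # column-wise product via the `product` name
by_prod = df.prod()        # the same reduction via the `prod` name

# Series.equals checks values, index and dtype together.
print(by_product.equals(by_prod))
```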

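I also strongly advise you to stop accessing columns as attributes, because it leads to strange errors like this one. Get into the habit of using [] to access columns, to avoid this in the future.

Edit

To answer your updated question, use [] to access all the columns: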
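If you want to see which of your column names are affected, a rough check is to test each name against the attributes of the DataFrame class. This is only a sketch: the column list is copied from the question, and `colliding` is a name introduced here for illustration.

```python
import pandas as pd

# Column names copied from the question's books.columns.tolist() output.
columns = ['count', 'product', 'channel', 'book_start_year',
           'book_start_week', 'book_end_year', 'book_end_week', 'period']
books = pd.DataFrame(columns=columns)

# A column whose name matches an existing DataFrame attribute cannot be
# reached with dot access, because the attribute is found first.
colliding = [name for name in books.columns if hasattr(pd.DataFrame, name)]
print(colliding)
```

In this frame `count` is shadowed in the same way as `product`, so `books.count == 1` would fail for the same reason.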

data = books[(books['start_year'] >= start_year) 
       & (books['start_week'] >= start_week) 
       & (books['end_year'] <= end_year) 
       & (books['end_week'] <= end_week) 
       & (books['product'] == product) 
       ] 

Although technically you only need to do this for the product column, you should get into the habit of doing it for all columns.
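For reference, a self-contained version of that filter might look like the sketch below. The rows and the bound values are invented, and the column names follow the query in the question rather than the dtypes listing.

```python
import pandas as pd

# Hypothetical rows; column names follow the query in the question.
books = pd.DataFrame({
    "start_year": [2015, 2015, 2016, 2016],
    "start_week": [10, 20, 5, 30],
    "end_year":   [2015, 2016, 2016, 2017],
    "end_week":   [40, 10, 50, 2],
    "product":    [1, 1, 2, 1],
})

# Assumed filter bounds, chosen only for the example.
start_year, start_week = 2015, 10
end_year, end_week = 2016, 45
product = 1

# Each comparison is parenthesized because & binds more tightly than
# the comparison operators.
mask = (
    (books["start_year"] >= start_year)
    & (books["start_week"] >= start_week)
    & (books["end_year"] <= end_year)
    & (books["end_week"] <= end_week)
    & (books["product"] == product)
)
data = books[mask]
print(data)
```

Building the mask as a separate variable is optional, but it keeps the indexing expression short and makes the mask easy to inspect.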

Please see my updated question. – zork

See the updated answer. – EdChum

An answer like this deserves plenty of upvotes. – piRSquared