Python pandas: querying one column fails with KeyError
Problem description:
I am trying to query a DataFrame:
In [6]: books.dtypes
Out[6]:
count float64
product int64
channel int64
book_start_year int64
book_start_week int64
book_end_year int64
book_end_week int64
period float64
dtype: object
In [8]: print(books.columns.tolist())
['count', 'product', 'channel', 'book_start_year', 'book_start_week', 'book_end_year', 'book_end_week', 'period']
This:
books[books.channel == 1]
works fine, but this one:
books[books.product == 1]
fails with a KeyError (see below). The DataFrame was read from a CSV file that pandas itself had written just a minute earlier, on macOS, using:
books = pd.read_csv('boxes2.csv', header=0)
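Resetting the index, or setting another column as the index, does not help either. Any ideas?
Update
How should I then write a query like this: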
data = books[(books.start_year >= start_year)
& (books.start_week >= start_week)
& (books.end_year <= end_year)
& (books.end_week <= end_week)
& (books.product == product)
]
Or can't I?
Error:
In [5]: books[books.product == 1]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-5-c6883f7202ed> in <module>()
----> 1 books[books.product == 1]
/Users/user/usr/anaconda_2.7/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
2002 # get column
2003 if self.columns.is_unique:
-> 2004 return self._get_item_cache(key)
2005
2006 # duplicate columns & possible reduce dimensionality
/Users/user/usr/anaconda_2.7/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item\
)
1348 res = cache.get(item)
1349 if res is None:
-> 1350 values = self._data.get(item)
1351 res = self._book_item_values(item, values)
1352 cache[item] = res
/Users/user/usr/anaconda_2.7/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath\
)
3288
3289 if not isnull(item):
-> 3290 loc = self.items.get_loc(item)
3291 else:
3292 indexer = np.arange(len(self.items))[isnull(self.items)]
/Users/user/usr/anaconda_2.7/anaconda/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method,\
tolerance)
1945 return self._engine.get_loc(key)
1946 except KeyError:
-> 1947 return self._engine.get_loc(self._maybe_cast_indexer(key))
1948
1949 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)()
KeyError: False
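Answer
product is the name of a DataFrame method, so you need to use square brackets with the quoted column name to access your column. Attribute lookup finds methods before column names. Accessing columns as attributes is a convenience, but it is error-prone, so you should use square brackets: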
books[books['product'] == 1]
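To see where the "KeyError: False" in the traceback comes from, here is a minimal sketch. The sample values are made up; only the column names come from the question. Comparing the bound method with 1 yields a plain False, and books[False] is then treated as a lookup of a column labelled False.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
books = pd.DataFrame({
    "count": [10.0, 4.0, 7.0],
    "product": [1, 2, 1],
    "channel": [1, 1, 2],
})

# Attribute access finds the DataFrame.product method, not the column.
is_method = callable(books.product)

# Comparing a bound method with 1 gives a plain bool, not a boolean Series.
mask_from_attribute = (books.product == 1)

# books[False] is a lookup of a column labelled False, hence the KeyError.
try:
    books[mask_from_attribute]
    raised = None
except KeyError as exc:
    raised = exc

# Bracket access returns the column, so the comparison is a boolean Series.
filtered = books[books["product"] == 1]
print(filtered)
```

books.channel works only because DataFrame has no attribute named channel, so the lookup falls through to the column.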
Think of a DataFrame as a dict of Series: much like a normal dict, you pass in a key and get back a value, which in this case is a column, i.e. a Series.
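The dict analogy also explains the shadowing. The following is a simplified model in plain Python, not pandas' actual implementation. TinyFrame and its contents are hypothetical names used only for illustration. The key point is that __getattr__ is consulted only after normal attribute lookup has failed, so a real method always wins over a column of the same name.

```python
class TinyFrame:
    """Hypothetical stand-in for a DataFrame: a dict of named columns."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        # Bracket access always goes straight to the column dict.
        return self._columns[key]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails,
        # so existing methods take precedence over column names.
        columns = self.__dict__.get("_columns", {})
        try:
            return columns[name]
        except KeyError:
            raise AttributeError(name) from None

    def product(self):
        # A method that happens to share its name with a column.
        return "method result"


frame = TinyFrame({"product": [1, 2, 1], "channel": [1, 1, 2]})

by_key = frame["product"]          # the column
by_attr_ok = frame.channel         # falls through to __getattr__: the column
by_attr_shadowed = frame.product   # the bound method, not the column
```

Bracket access never goes through attribute lookup, which is why it is unaffected by method names.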
Note that IPython shows the following for product:
Signature: df.product(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Docstring:
Return the product of the values for the requested axis
Parameters
----------
axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
Returns
-------
prod : Series or DataFrame (if level specified)
File: c:\winpython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\pandas\core\generic.py
Type: method
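So this is not documented as such, but product is the same as prod.
I also strongly suggest that you stop accessing columns as attributes, because it leads to strange errors like this one. Get into the habit of using [] to access columns to avoid this in future.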
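If you want to check which column names are affected, one possible approach is to test each name against the attributes of the DataFrame class. This is only a sketch: shadowed_columns is a hypothetical helper, not part of pandas, and the empty frame simply reuses the column list printed in the question.

```python
import pandas as pd


def shadowed_columns(df):
    """Return column names that collide with existing DataFrame attributes."""
    return [
        name for name in df.columns
        if isinstance(name, str) and hasattr(pd.DataFrame, name)
    ]


# Column names copied from the question; the frame holds no data.
columns = ['count', 'product', 'channel', 'book_start_year', 'book_start_week',
           'book_end_year', 'book_end_week', 'period']
books = pd.DataFrame(columns=columns)

clashes = shadowed_columns(books)
print(clashes)
```

For this frame the helper reports product and also count, since DataFrame.count is a method too. That means books.count would be shadowed in the same way.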
Edit
To answer your updated question, use [] to access all of the columns:
data = books[(books['start_year'] >= start_year)
& (books['start_week'] >= start_week)
& (books['end_year'] <= end_year)
& (books['end_week'] <= end_week)
& (books['product'] == product)
]
Although technically you only need to do this for the product column, you should get into the habit of doing it for all columns.
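As a runnable illustration of the bracket-based filter, here is a small sketch. The function name select_books and the sample rows are made up. The column names follow the updated query (start_year, start_week, ...) rather than the dtypes listing, which is an assumption about the actual data.

```python
import pandas as pd


def select_books(books, start_year, start_week, end_year, end_week, product):
    """Filter rows using bracket access only, mirroring the query above."""
    mask = (
        (books["start_year"] >= start_year)
        & (books["start_week"] >= start_week)
        & (books["end_year"] <= end_year)
        & (books["end_week"] <= end_week)
        & (books["product"] == product)
    )
    return books[mask]


# Hypothetical sample rows, for illustration only.
books = pd.DataFrame({
    "start_year": [2015, 2016, 2016],
    "start_week": [10, 5, 20],
    "end_year": [2015, 2016, 2017],
    "end_week": [12, 8, 2],
    "product": [1, 1, 2],
})

data = select_books(books, start_year=2015, start_week=5,
                    end_year=2016, end_week=12, product=1)
print(data)
```

Here the parameter named product does not interfere with the column lookup, because the column is addressed by its string key.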
Please see my updated question – zork
See the updated answer – EdChum
An answer like this deserves a lot of upvotes – piRSquared