Unable to append dataframes with pandas 0.17.1, but can with pandas 0.14.1

Problem description:

I have two dataframes, c and h (linked below), that I cannot append with pandas 0.17.1 but can with pandas 0.14.1.

c pickle file: http://s000.tinyupload.com/?file_id=64255815375060941529 
h pickle file: http://s000.tinyupload.com/?file_id=98284988001290720556 

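When I run c.append(h) I get TypeError: data type not understood, but only when running pandas 0.17.1. If I run the same code in pandas 0.14.1, the dataframes are appended correctly. What is going on, and how can I modify my dataframes so that they append correctly in 0.17.1?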
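A minimal, self-contained reproduction may help here, since a comment below asks for one instead of the pickle links. This is a sketch with small hypothetical frames that keep only a couple of the columns shown in the heads; the assumption is that the trigger is the timezone-aware `created_at` column, which exists only in `c`. Because `DataFrame.append` was removed in pandas 2.0, the sketch calls `pd.concat`, which the traceback shows fails the same way.

```python
import pandas as pd

# Hypothetical miniature versions of c and h. The important detail is that c
# carries a timezone-aware "created_at" column (note the +00:00 suffix in the
# real head) and h has no such column at all.
c = pd.DataFrame({
    "symbol": ["AAOI", "AAOI"],
    "position_value": [10000, 10000],
    "created_at": pd.to_datetime(["2016-03-16 17:31:57.003792+00:00"] * 2, utc=True),
})
h = pd.DataFrame({
    "symbol": ["ABT", "ACN"],
    "position_value": [1745000, 512000],
})

# Assumed (not verified here) to raise "TypeError: data type not understood"
# on pandas 0.17.1; on a recent release the rows from h get NaT in created_at.
new = pd.concat([c, h])
print(new.dtypes)
```

On a recent pandas release this runs and `created_at` keeps its UTC-aware dtype, with missing values for the rows that came from `h`.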

Edit: here are the heads of the dataframes

In [49]: h.head(3) 
Out[49]: 
    report_id adv_firm_key manager_id   filing_manager_name \ 
0  45497  105129  20984 Bridgewater Associates, LP 
1  45497  105129  20984 Bridgewater Associates, LP 
2  45497  105129  20984 Bridgewater Associates, LP 

    report_period   issuer_name  cusip position_value quantity \ 
0 2015-12-31   ABBOTT LABS 002824100   1745000  38857 
1 2015-12-31 ACCENTURE PLC IRELAND G1151C101   512000  4900 
2 2015-12-31   ADOBE SYS INC 00724F101   9157000  97479 

    principal_type put_or_call     sector total_holding_value \ 
0    SH   X    Health Care   7707722000 
1    SH   X Information Technology   7707722000 
2    SH   X Information Technology   7707722000 

    total_holding_value_calculated market_cap shares_float  beta symbol \ 
0      7707722000 66993140300 1488070000 0.924138 ABT 
1      7707722000 67773564900  626355000 0.985543 ACN 
2      7707722000 46848347700  496787000 1.099186 ADBE 

    allocation portfolio_value 
0  300000   2000000 
1  300000   2000000 
2  300000   2000000 

In [50]: c.head(3) 
Out[50]: 
    put_or_call position_value report_date fund_id report_period \ 
0   X   10000 2015-11-02  502 2015-12-31 
1   X   10000 2015-11-02  502 2015-12-31 
2   X   10000 2015-11-02  502 2015-12-31 

    underlying_id quantity side      created_at report_id \ 
0   1001   5 Short 2016-03-16 17:31:57.003792+00:00  NaN 
1   1001   5 Short 2016-03-16 17:31:57.003792+00:00  NaN 
2   1001   5 Short 2016-03-16 17:31:57.003792+00:00  NaN 

    ...  adv_firm_key      filing_manager_name symbol \ 
0 ...   155680 Davidson Kempner Capital Management LP AAOI 
1 ...   155680 Davidson Kempner Capital Management LP AAOI 
2 ...   155680 Davidson Kempner Capital Management LP AAOI 

         sector  cusip      issuer_name \ 
0 Telecommunication Services 03823U102  APPLIED OPTOELECTRONICS INC 
1 Telecommunication Services 03823U102 APPLIED OPTOELECTRONICSINC COM 
2 Telecommunication Services 03823U102  APPLIED OPTOELECTRONICS INC 

    principal_type market_cap shares_float  beta 
0    SH 288734200  14566500 1.45758 
1    SH 288734200  14566500 1.45758 
2    SH 288734200  14566500 1.45758 

[3 rows x 21 columns] 
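Comparing the two heads by eye is error-prone: c shows columns such as report_date, fund_id, side and created_at that do not appear in h. A small helper can list every column whose dtype differs or that exists on only one side; those are the columns concat has to fill with missing values, and so the likeliest place for a dtype conflict. The helper name `describe_schema_mismatch` and the tiny stand-in frames are hypothetical, made up for this sketch.

```python
import pandas as pd


def describe_schema_mismatch(left, right):
    """Hypothetical helper: report columns whose dtype differs between two frames.

    A column that exists in only one frame is reported as "missing" on the other
    side; those are the columns concat has to fill with missing values.
    """
    rows = []
    for col in left.columns.union(right.columns):
        # Compare dtypes as strings so NumPy dtypes and pandas extension dtypes
        # (such as tz-aware datetimes) can be compared safely.
        left_dtype = str(left[col].dtype) if col in left.columns else "missing"
        right_dtype = str(right[col].dtype) if col in right.columns else "missing"
        if left_dtype != right_dtype:
            rows.append({"column": col, "left": left_dtype, "right": right_dtype})
    return pd.DataFrame(rows, columns=["column", "left", "right"])


# Tiny stand-ins for c and h (hypothetical data, not the pickled frames).
c = pd.DataFrame({
    "symbol": ["AAOI"],
    "created_at": pd.to_datetime(["2016-03-16 17:31:57.003792+00:00"], utc=True),
})
h = pd.DataFrame({"symbol": ["ABT"], "manager_id": [20984]})

mismatch = describe_schema_mismatch(c, h)
print(mismatch)
```

Run against the real frames as `describe_schema_mismatch(c, h)`, it should list created_at as present only in c, which is the column the answer below points to.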

Edit 2: here is the stack trace

In [11]: pd.concat([c,h]) 
--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-11-943f474750e7> in <module>() 
----> 1 pd.concat([c,h]) 

/usr/local/miniconda/envs/analytics-env/lib/python2.7/site-packages/pandas/tools/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy) 
    833      verify_integrity=verify_integrity, 
    834      copy=copy) 
--> 835  return op.get_result() 
    836 
    837 

/usr/local/miniconda/envs/analytics-env/lib/python2.7/site-packages/pandas/tools/merge.py in get_result(self) 
    1023    new_data = concatenate_block_managers(
    1024     mgrs_indexers, self.new_axes, 
-> 1025     concat_axis=self.axis, copy=self.copy) 
    1026    if not self.copy: 
    1027     new_data._consolidate_inplace() 

/usr/local/miniconda/envs/analytics-env/lib/python2.7/site-packages/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy) 
    4472             copy=copy), 
    4473       placement=placement) 
-> 4474    for placement, join_units in concat_plan] 
    4475 
    4476  return BlockManager(blocks, axes) 

/usr/local/miniconda/envs/analytics-env/lib/python2.7/site-packages/pandas/core/internals.py in concatenate_join_units(join_units, concat_axis, copy) 
    4569  to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype, 
    4570           upcasted_na=upcasted_na) 
-> 4571     for ju in join_units] 
    4572 
    4573  if len(to_concat) == 1: 

/usr/local/miniconda/envs/analytics-env/lib/python2.7/site-packages/pandas/core/internals.py in get_reindexed_values(self, empty_dtype, upcasted_na) 
    4823    if self.is_null and not getattr(self.block, 'is_categorical', 
    4824            None): 
-> 4825     missing_arr = np.empty(self.shape, dtype=empty_dtype) 
    4826     if np.prod(self.shape): 
    4827      # NumPy 1.6 workaround: this statement gets strange if all 

TypeError: data type not understood 
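Assuming, as the answer below suggests, that the culprit is the tz-aware created_at column, the failing call in the last traceback frame can be illustrated in isolation: `np.empty` is asked to allocate the placeholder for the missing column with the common dtype, and NumPy cannot interpret pandas' tz-aware dtype. That `empty_dtype` was exactly a tz-aware dtype in 0.17.1 is an inference from the traceback, not something verified here.

```python
import numpy as np
import pandas as pd

# pandas' timezone-aware datetime dtype is a pandas extension type,
# not a NumPy dtype.
tz_dtype = pd.DatetimeTZDtype(tz="UTC")

try:
    # Roughly what the last traceback frame does: allocate an "empty" array
    # for the missing column using the dtype chosen for the result.
    np.empty(3, dtype=tz_dtype)
except TypeError as exc:
    # Recent NumPy words this as "Cannot interpret ... as a data type";
    # older NumPy said "data type not understood".
    print("TypeError:", exc)
    failed = True
else:
    failed = False
```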

Have you tried the latest version, '0.18.0'? – jezrael


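Please include in the question a small subset of each dataframe that reproduces the problem. I am not going to visit two random URLs to solve this for you. –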


@jezrael This also happens with '0.18.0', with both 'concat' and 'append' –

This is bug 11351, which is not handled correctly:

If you try adding the column created_at, which is missing in h:

h['created_at'] = np.nan 
new = pd.concat([h,c]) 

you get the error:

AttributeError: 'numpy.ndarray' object has no attribute 'tz_localize'
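Filling the column with np.nan makes it float64, so it still does not match c's tz-aware column. A hedged alternative is to create the missing column in h with exactly c's dtype, filled with NaT. This sketch uses small hypothetical frames; it runs on recent pandas, but whether it avoids the bug on 0.17.1 has not been verified.

```python
import pandas as pd

# Hypothetical stand-ins for the real frames.
c = pd.DataFrame({
    "symbol": ["AAOI", "AAOI"],
    "created_at": pd.to_datetime(["2016-03-16 17:31:57.003792+00:00"] * 2, utc=True),
})
h = pd.DataFrame({"symbol": ["ABT", "ACN"]})

# Give h the missing column with the *same* tz-aware dtype as c, filled with
# NaT, instead of np.nan (which would create a float64 column).
h["created_at"] = pd.Series(pd.NaT, index=h.index, dtype=c["created_at"].dtype)

new = pd.concat([h, c], ignore_index=True)
print(new.dtypes)
```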

One solution is to convert the datetimes to strings before concatenating and parse them back afterwards:

c['created_at'] = c['created_at'].astype(str) 
new = pd.concat([h,c]) 
new['created_at'] = pd.to_datetime(new['created_at'])
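The string round-trip above can be run end to end on small hypothetical frames (only `symbol` and `created_at` are kept from the real data). This is a sketch of the answer's approach on a recent pandas, not a test against 0.17.1 itself.

```python
import pandas as pd

# Hypothetical stand-ins for the real frames.
c = pd.DataFrame({
    "symbol": ["AAOI", "AAOI"],
    "created_at": pd.to_datetime(["2016-03-16 17:31:57.003792+00:00"] * 2, utc=True),
})
h = pd.DataFrame({"symbol": ["ABT", "ACN"]})

# Step 1: turn the tz-aware timestamps into plain strings such as
# "2016-03-16 17:31:57.003792+00:00", so concat no longer sees a tz-aware column.
c["created_at"] = c["created_at"].astype(str)

# Step 2: concatenate; rows coming from h get NaN in created_at.
new = pd.concat([h, c], ignore_index=True)

# Step 3: parse the strings back; the +00:00 offset yields a UTC-aware column
# and the NaN entries become NaT.
new["created_at"] = pd.to_datetime(new["created_at"])
print(new)
```

The design trade-off is an extra parse on the way back; passing `utc=True` to `pd.to_datetime` would make the result explicitly UTC even if offsets varied.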