TypeError: cannot concatenate 'str' and 'float' objects: pandas

Problem description:

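I am trying to use pandas to solve a data science problem. My dataset has the following columns: 'country', 'conversion', 'test', 'user_id', and so on. The 'country' column contains about 10 countries. The 'test' column takes the values 0 and 1 for the two test types: control (0) and experiment (1). 'conversion' also takes the values 0 and 1, indicating whether the person converted.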

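I want to group by country and compute, for each group, the p-value and the mean conversion for test == 0 and test == 1. I tried the function below, but it raises the error "TypeError: cannot concatenate 'str' and 'float' objects". Can someone clarify what is going on?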

def f(x): 
     control = x.loc[(x.test==0)] 
     test = x.loc[(x.test==1)] 
     p_value = stats.ttest_ind(control,test)[0] 
     control_mean = control['conversion'].mean() 
     test_mean = test['conversion'].mean() 
     return pd.Series({'p_value': p_value, 'conversion_test': test_mean, 'conversion_control': control_mean})  

bycountry = data1.groupby('country').apply(f) 
bycountry = bycountry.reset_index(level='None') 
bycountry

Full error message:

TypeError         Traceback (most recent call last) 
<ipython-input-495-bd6227878520> in <module>() 
     7  return pd.Series({'p_value': p_value, 'conversion_test': test_mean, 'conversion_control': control_mean}) 
     8 
----> 9 bycountry = data1.groupby("country").apply(f) 
    10 bycountry = bycountry.reset_index(level='None') 
    11 bycountry 

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in apply(self, func, *args, **kwargs) 
    649   # ignore SettingWithCopy here in case the user mutates 
    650   with option_context('mode.chained_assignment', None): 
--> 651    return self._python_apply_general(f) 
    652 
    653  def _python_apply_general(self, f): 

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in _python_apply_general(self, f) 
    653  def _python_apply_general(self, f): 
    654   keys, values, mutated = self.grouper.apply(f, self._selected_obj, 
--> 655             self.axis) 
    656 
    657   return self._wrap_applied_output(

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in apply(self, f, data, axis) 
    1525    # group might be modified 
    1526    group_axes = _get_axes(group) 
-> 1527    res = f(group) 
    1528    if not _is_indexed_like(res, group_axes): 
    1529     mutated = True 

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in f(g) 
    645   @wraps(func) 
    646   def f(g): 
--> 647    return func(g, *args, **kwargs) 
    648 
    649   # ignore SettingWithCopy here in case the user mutates 

<ipython-input-495-bd6227878520> in f(x) 
     2  control = x.loc[(x.test==0)] 
     3  test = x.loc[(x.test==1)] 
----> 4  p_value = stats.ttest_ind(control,test)[0] 
     5  control_mean = control['conversion'].mean() 
     6  test_mean = test['conversion'].mean() 

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\scipy\stats\stats.pyc in ttest_ind(a, b, axis, equal_var, nan_policy) 
    3865   return Ttest_indResult(np.nan, np.nan) 
    3866 
-> 3867  v1 = np.var(a, axis, ddof=1) 
    3868  v2 = np.var(b, axis, ddof=1) 
    3869  n1 = a.shape[axis] 

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\numpy\core\fromnumeric.pyc in var(a, axis, dtype, out, ddof, keepdims) 
    3098 
    3099  return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof, 
-> 3100       keepdims=keepdims) 

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\numpy\core\_methods.pyc in _var(a, axis, dtype, out, ddof, keepdims) 
    89  # Note that if dtype is not of inexact type then arraymean will 
    90  # not be either. 
---> 91  arrmean = umr_sum(arr, axis, dtype, keepdims=True) 
    92  if isinstance(arrmean, mu.ndarray): 
    93   arrmean = um.true_divide(

TypeError: cannot concatenate 'str' and 'float' objects

Output of df.dtypes:

user_id      int64 
date    datetime64[ns] 
source      object 
device      object 
browser_language   object 
ads_channel     object 
browser      object 
conversion     int64 
test       int64 
sex       object 
age      float64 
country      object 
dtype: object

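Post the full stack trace. –

I suspect what is happening is that you have a column of 'obj' type that mixes 'float' and 'string' values. –

@juanpa.arrivillaga: I have posted the full error message. – Gingerbread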
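
One way to check the suspicion raised in the comments is to list which Python types each object-dtype column actually holds. The snippet below is a minimal sketch on a made-up toy frame; the helper name `python_types_per_column` and the sample values are assumptions for illustration and are not part of the original dataset. A missing value in a string column is stored as NaN, which is a float, so such a column mixes 'str' and 'float'.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'sex' is an object column with one missing value.
df = pd.DataFrame({
    "conversion": [0, 1, 0, 1],
    "test": [0, 0, 1, 1],
    "sex": pd.Series(["M", np.nan, "F", "M"], dtype=object),
})

def python_types_per_column(frame):
    """Map each object-dtype column to the sorted Python type names it holds."""
    object_cols = frame.select_dtypes(include="object").columns
    return {
        col: sorted({type(value).__name__ for value in frame[col]})
        for col in object_cols
    }

mixed = python_types_per_column(df)
print(mixed)
```

Any column reported with more than one type name is a candidate for the kind of str/float clash seen in the traceback.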

Answer

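import pandas as pd
from scipy import stats

def f(x):
    # Keep only the numeric 'conversion' column of each arm, so the object
    # (string) columns never reach ttest_ind.
    control = x.loc[x.test == 0, 'conversion']
    test = x.loc[x.test == 1, 'conversion']
    # ttest_ind returns (statistic, pvalue); index 1 is the p-value.
    p_value = stats.ttest_ind(control, test)[1]
    control_mean = control.mean()
    test_mean = test.mean()
    return pd.Series({'p_value': p_value, 'conversion_test': test_mean, 'conversion_control': control_mean})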

This worked! Thanks again, @juanpa.arrivillaga!
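
For anyone who wants to reproduce the fix end to end, here is a minimal sketch on synthetic data. The frame `data`, the function name `summarize`, and the random values are all assumptions for illustration and are not the original dataset. The original call appears to have failed because whole sub-DataFrames, including object columns such as 'source' and 'sex', were passed to `stats.ttest_ind`; here only the numeric 'conversion' column is passed. The p-value is read from the result's `pvalue` attribute, since the first element of the result is the t statistic.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the real dataset (values are random, for illustration only).
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "country": rng.choice(["A", "B"], size=n),
    "test": rng.integers(0, 2, size=n),
    "conversion": rng.integers(0, 2, size=n),
    "sex": pd.Series(rng.choice(["M", "F"], size=n), dtype=object),
})

def summarize(group):
    """Compare conversion between control (test == 0) and experiment (test == 1)."""
    control = group.loc[group["test"] == 0, "conversion"]
    treated = group.loc[group["test"] == 1, "conversion"]
    result = stats.ttest_ind(control, treated)
    return pd.Series({
        "p_value": result.pvalue,
        "conversion_test": treated.mean(),
        "conversion_control": control.mean(),
    })

# Selecting the two needed columns keeps string columns out of the applied function.
by_country = (
    data.groupby("country")[["test", "conversion"]]
    .apply(summarize)
    .reset_index()
)
print(by_country)
```

Selecting `[["test", "conversion"]]` before `apply` is a design choice rather than a requirement: it makes it explicit that no object column can reach the t-test.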
