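TypeError: cannot concatenate 'str' and 'float' objects: pandas
Problem description:
I am trying to use pandas to solve a data science problem. My dataset contains the following columns: "country", "conversion", "test", "user_id", and so on. The country column contains about 10 countries. The "test" column takes the values 0 and 1 for the two types of test: 0 for control and 1 for experiment. "conversion" also takes the values 0 and 1, indicating whether the person converted.
I want to group by country and compute, for each group, the p-value comparing test == 0 with test == 1, along with the mean conversion of each. I tried the function below, but it throws the error "TypeError: cannot concatenate 'str' and 'float' objects". Can someone clarify this?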
def f(x):
    control = x.loc[(x.test==0)]
    test = x.loc[(x.test==1)]
    p_value = stats.ttest_ind(control,test)[0]
    control_mean = control['conversion'].mean()
    test_mean = test['conversion'].mean()
    return pd.Series({'p_value': p_value, 'conversion_test': test_mean, 'conversion_control': control_mean})

bycountry = data1.groupby('country').apply(f)
bycountry = bycountry.reset_index(level='None')
bycountry
Full error message:
TypeError Traceback (most recent call last)
<ipython-input-495-bd6227878520> in <module>()
7 return pd.Series({'p_value': p_value, 'conversion_test': test_mean, 'conversion_control': control_mean})
8
----> 9 bycountry = data1.groupby("country").apply(f)
10 bycountry = bycountry.reset_index(level='None')
11 bycountry
C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in apply(self, func, *args, **kwargs)
649 # ignore SettingWithCopy here in case the user mutates
650 with option_context('mode.chained_assignment', None):
--> 651 return self._python_apply_general(f)
652
653 def _python_apply_general(self, f):
C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in _python_apply_general(self, f)
653 def _python_apply_general(self, f):
654 keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 655 self.axis)
656
657 return self._wrap_applied_output(
C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in apply(self, f, data, axis)
1525 # group might be modified
1526 group_axes = _get_axes(group)
-> 1527 res = f(group)
1528 if not _is_indexed_like(res, group_axes):
1529 mutated = True
C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in f(g)
645 @wraps(func)
646 def f(g):
--> 647 return func(g, *args, **kwargs)
648
649 # ignore SettingWithCopy here in case the user mutates
<ipython-input-495-bd6227878520> in f(x)
2 control = x.loc[(x.test==0)]
3 test = x.loc[(x.test==1)]
----> 4 p_value = stats.ttest_ind(control,test)[0]
5 control_mean = control['conversion'].mean()
6 test_mean = test['conversion'].mean()
C:\Users\SnehaPriya\Anaconda2\lib\site-packages\scipy\stats\stats.pyc in ttest_ind(a, b, axis, equal_var, nan_policy)
3865 return Ttest_indResult(np.nan, np.nan)
3866
-> 3867 v1 = np.var(a, axis, ddof=1)
3868 v2 = np.var(b, axis, ddof=1)
3869 n1 = a.shape[axis]
C:\Users\SnehaPriya\Anaconda2\lib\site-packages\numpy\core\fromnumeric.pyc in var(a, axis, dtype, out, ddof, keepdims)
3098
3099 return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
-> 3100 keepdims=keepdims)
C:\Users\SnehaPriya\Anaconda2\lib\site-packages\numpy\core\_methods.pyc in _var(a, axis, dtype, out, ddof, keepdims)
89 # Note that if dtype is not of inexact type then arraymean will
90 # not be either.
---> 91 arrmean = umr_sum(arr, axis, dtype, keepdims=True)
92 if isinstance(arrmean, mu.ndarray):
93 arrmean = um.true_divide(
TypeError: cannot concatenate 'str' and 'float' objects
Output of df.dtypes:
user_id int64
date datetime64[ns]
source object
device object
browser_language object
ads_channel object
browser object
conversion int64
test int64
sex object
age float64
country object
dtype: object
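The listing above shows several `object` columns, but `object` alone does not say what each cell holds. As a rough sketch (the helper name `value_types_per_column` and the toy frame are made up for illustration, not taken from the dataset in the question), the Python types stored in each `object` column can be listed like this:

```python
import numpy as np
import pandas as pd


def value_types_per_column(df):
    """Map each object-dtype column to the sorted Python type names it contains."""
    return {
        col: sorted({type(value).__name__ for value in df[col]})
        for col in df.select_dtypes(include="object").columns
    }


# Hypothetical toy data: 'sex' has a missing value, which pandas stores as a float NaN.
toy = pd.DataFrame({
    "sex": pd.Series(["M", np.nan, "F"], dtype=object),
    "country": pd.Series(["Spain", "Mexico", "Spain"], dtype=object),
    "conversion": [0, 1, 0],
})

mixed = value_types_per_column(toy)
print(mixed)
```

A column that reports both `float` and `str` is a string column with missing values. Summing such a column is one plausible way to end up adding a `str` to a `float`.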
Answer:
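import pandas as pd
from scipy import stats

def f(x):
    # Keep only the numeric 'conversion' column of each arm before testing
    control = x.loc[(x.test==0)]
    control = control['conversion']
    test = x.loc[(x.test==1)]
    test = test['conversion']
    # ttest_ind returns (statistic, pvalue); index 1 is the p-value
    p_value = stats.ttest_ind(control,test)[1]
    control_mean = control.mean()
    test_mean = test.mean()
    return pd.Series({'p_value': p_value, 'conversion_test': test_mean, 'conversion_control': control_mean})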
This worked! Thanks again, @juanpa.arrivillaga!
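For anyone who wants to try the approach end to end, here is a minimal sketch on synthetic data. The frame `toy`, the function name `summarize`, and the random values are assumptions made for illustration; only the column names come from the question. The p-value is read from index 1 of the `ttest_ind` result (index 0 is the t-statistic).

```python
import numpy as np
import pandas as pd
from scipy import stats


def summarize(group):
    """Compare conversion between control (test == 0) and experiment (test == 1)."""
    control = group.loc[group["test"] == 0, "conversion"]
    treated = group.loc[group["test"] == 1, "conversion"]
    result = stats.ttest_ind(control, treated)
    return pd.Series({
        "p_value": result[1],  # result[0] would be the t-statistic
        "conversion_test": treated.mean(),
        "conversion_control": control.mean(),
    })


# Synthetic stand-in for the real dataset, including a string column ('sex').
rng = np.random.default_rng(0)
n = 400
toy = pd.DataFrame({
    "country": rng.choice(["Spain", "Mexico"], size=n),
    "test": rng.integers(0, 2, size=n),
    "conversion": rng.integers(0, 2, size=n),
    "sex": rng.choice(["M", "F"], size=n),
})

# Select only the columns the function needs, so string columns never reach the t-test.
by_country = (
    toy.groupby("country")[["test", "conversion"]]
    .apply(summarize)
    .reset_index()
)
print(by_country)
```

Selecting `["test", "conversion"]` before `apply` is a design choice in this sketch: it keeps the function independent of whatever other columns the frame carries.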
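Post the full stack trace. –
I suspect what is happening is that you have an `object`-dtype column that mixes `float` and `str` values. –
@juanpa.arrivillaga: I have posted the full error message. – Gingerbread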
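As background to why the original call touched string columns at all: `scipy.stats.ttest_ind` works along `axis=0` by default, so a two-dimensional input is tested column by column. The sketch below uses made-up random arrays, not the data from the question, to show that behaviour.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(size=(30, 3))  # 30 observations of 3 hypothetical numeric columns
b = rng.normal(size=(25, 3))

# With 2-D input and the default axis=0, one test is run per column.
per_column = stats.ttest_ind(a, b)
pvalues = np.asarray(per_column.pvalue)
print(pvalues.shape)

# Testing a single column on its own gives the same p-value as that column's entry.
single = stats.ttest_ind(a[:, 0], b[:, 0])
print(np.isclose(pvalues[0], single.pvalue))
```

Passing a whole group therefore asks for a test on every column, including non-numeric ones, which is why narrowing the input to `conversion` avoids the error.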