熊猫集团的Excel数据按列和图形散点图平均值

问题描述：

我有一个数据集合，我从几个Excel文件中读取数据。我可以轻松地读取，合并和分组数据与熊猫。我在数据中有两列兴趣点，“产品类型”和“测试持续时间”。熊猫集团的Excel数据按列和图形散点图平均值

包含从Excel文件读取的数据的数据帧称为oData。

oDataGroupedByProductType = oData.groupby(['Product Type'])

我已经使用plotly制作图如下，但plotly不保持数据的私有性，如果我想要的数据是私有的我要付钱。支付不是一种选择。如何使用pandas和/或matplotlib制作相同的图形，还可以显示每种产品类型的平均值？

如果你可以添加包在你的程序中使用，然后采取seaborn在那里它被称为stripplot。这是分类的Scartter阴谋。这可能会花费你从示例页面的2行代码来重现你的截图。 – Boud

'.boxplot（）'在熊猫中也适用于这类数据，如果你没有设置这个特定的绘图类型。 –

答

作为Bound说，你可以用stripplot（seaborn文档页面的例子）做几行。

import seaborn as sns 
sns.set_style("whitegrid") 
tips = sns.load_dataset("tips") # load some sample data 
ax = sns.stripplot(x="day", y="total_bill", data=tips)

答

假设你有一些数据帧：

In [4]: df.head(20) 
Out[4]: 
    product  value 
0  c 5.155740 
1  c 8.983128 
2  c 5.150390 
3  a 8.379866 
4  c 8.094536 
5  c 7.464706 
6  b 3.690430 
7  a 5.547448 
8  a 7.709569 
9  c 8.398026 
10  a 7.317957 
11  b 7.821332 
12  b 8.815495 
13  c 6.646533 
14  c 8.239603 
15  c 7.585408 
16  a 7.946760 
17  c 5.276864 
18  c 8.793054 
19  b 11.573413

你需要有一个数值为产品绘制，所以快速和drity，只是通过映射数值做出新的列：

In [5]: product_map = {p:r for p,r in zip(df['product'].unique(), range(1, df.values.shape[0]+1))} 

In [6]: product_map 
Out[6]: {'a': 2, 'b': 3, 'c': 1}

当然，也有很多，你可以做到这一点的方式...

现在，做一个新的列：

In [8]: df['product_code'] = df['product'].map(product_map) 

In [9]: df.head(20) 
Out[9]: 
    product  value product_code 
0  c 5.155740    1 
1  c 8.983128    1 
2  c 5.150390    1 
3  a 8.379866    2 
4  c 8.094536    1 
5  c 7.464706    1 
6  b 3.690430    3 
7  a 5.547448    2 
8  a 7.709569    2 
9  c 8.398026    1 
10  a 7.317957    2 
11  b 7.821332    3 
12  b 8.815495    3 
13  c 6.646533    1 
14  c 8.239603    1 
15  c 7.585408    1 
16  a 7.946760    2 
17  c 5.276864    1 
18  c 8.793054    1 
19  b 11.573413    3

现在，使用pandas的plot辅助方法，这基本上是围绕matplotlib的包装：

In [10]: df.plot(kind='scatter', x = 'product_code', y = 'value') 
Out[10]: <matplotlib.axes._subplots.AxesSubplot at 0x12235abe0>

和输出：

很显然，这是快速和肮脏的，但它应该让你的方式...

答

万一别人有一个非常类似的问题，并希望看到最后的结果，我结束了使用seaborn，如下：

import seaborn as sns 
import matplotlib.pyplot as plt 
... 
sns.set_style("whitegrid") 
sns.boxplot(x=oData['Product Type'], 
      y=oData['Test Duration?'], 
      data=oData) 
plt.savefig('Test Duration vs. Product Type.png')

图表来如如下。出于隐私的原因，我已经模糊了图表上的产品标签。

如果无法显示数据，请提供一些示例数据！ – Lucas

熊猫集团的Excel数据按列和图形散点图平均值

相关推荐