Data Visualization Charts: Scatter Plot with Linear Regression Line of Best Fit

A line of best fit is the usual way to show how two variables change in relation to each other.
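As a reminder of what a line of best fit is, here is a minimal NumPy sketch of ordinary least squares: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope · x̄. The data is made up for illustration. Note that the plots below pass robust=True, which fits a robust regression rather than plain least squares.

```python
import numpy as np

def best_fit_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The least-squares line always passes through (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Made-up data lying exactly on y = 2x + 1
slope, intercept = best_fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```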

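The plot below shows how the line of best fit differs between groups in the data. To disable grouping and draw a single line of best fit for the whole dataset, remove the hue='cyl' argument from the sns.lmplot() call below; only one line is then drawn.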
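To make the effect of hue='cyl' concrete, here is a rough NumPy sketch: with grouping, one least-squares line is fitted per group; without it, a single line is fitted to all points. The function name fit_lines and the numbers are made up for illustration, and plain least squares stands in for the robust fit used in the plot.

```python
import numpy as np

def fit_lines(x, y, groups=None):
    """Hypothetical helper: fit y = slope * x + intercept per group, or once if groups is None."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if groups is None:
        # No grouping: a single line for the whole dataset (like omitting hue)
        return {"all": tuple(np.polyfit(x, y, 1))}
    groups = np.asarray(groups)
    lines = {}
    for g in np.unique(groups):
        mask = groups == g
        # One line per group (like hue='cyl'); np.polyfit returns (slope, intercept)
        lines[g.item()] = tuple(np.polyfit(x[mask], y[mask], 1))
    return lines

x = [1, 2, 3, 4, 5, 6]
y = [35, 30, 25, 22, 20, 18]
cyl = [4, 4, 4, 8, 8, 8]
print(fit_lines(x, y, cyl))  # two lines, keyed by 4 and 8
print(fit_lines(x, y))       # one line, keyed by "all"
```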

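Note: on the first run, this kept failing with the error No module named statsmodels.robust.robust_linear_model. That is because robust=True makes seaborn fit the line with statsmodels.

After installing it with pip install statsmodels, the code ran fine.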
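If the same script has to run on machines where statsmodels may be missing, one option is to check for it up front. A minimal sketch using only the standard library; the helper name regression_kwargs is hypothetical, and falling back to an ordinary (non-robust) fit is an assumption about what you would want.

```python
import importlib.util

def statsmodels_available():
    """Return True if the statsmodels package can be imported in this environment."""
    return importlib.util.find_spec("statsmodels") is not None

def regression_kwargs(want_robust=True):
    """Hypothetical helper: keyword arguments for sns.lmplot().

    robust=True needs statsmodels; without it, fall back to an ordinary least-squares fit.
    """
    use_robust = want_robust and statsmodels_available()
    if want_robust and not use_robust:
        print("statsmodels not found; run `pip install statsmodels` to use robust=True")
    return {"robust": use_robust}

print(regression_kwargs())
```

It would be used as sns.lmplot(..., **regression_kwargs()) in place of the literal robust=True.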

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns


# Import data: the mpg dataset, keeping only 4- and 8-cylinder cars
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_select = df.loc[df.cyl.isin([4, 8]), :]

# Plot: one robust regression line per cylinder group (hue="cyl")
sns.set_style("white")
gridobj = sns.lmplot(x="displ", y="hwy", hue="cyl", data=df_select,
                     height=7, aspect=1.6, robust=True, palette="tab10",
                     scatter_kws=dict(s=60, linewidths=.7, edgecolors="black"))

# Decorations: fix the axis ranges and add a title
gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
plt.title("Scatterplot with line of best fit grouped by number of cylinders", fontsize=20)
plt.show()
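The seaborn documentation describes robust=True as using statsmodels to estimate a robust regression that de-weights outliers. To illustrate the idea without statsmodels, here is a hand-rolled sketch of iteratively reweighted least squares with Huber weights and a MAD-based scale. The tuning constant 1.345, the iteration count, and the made-up data are assumptions of this sketch; it is not seaborn's or statsmodels' actual code.

```python
import numpy as np

def huber_line(x, y, k=1.345, n_iter=50):
    """Robust straight-line fit by iteratively reweighted least squares with Huber weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])    # design matrix with columns [1, x]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the ordinary least-squares fit
    for _ in range(n_iter):
        r = y - X @ beta                         # current residuals
        # Robust scale: median absolute residual, rescaled to match a normal standard deviation
        s = max(np.median(np.abs(r)) / 0.6745, 1e-12)
        u = np.abs(r) / s
        # Huber weights: 1 for small residuals, k / u for large ones (outliers count less)
        w = np.where(u <= k, 1.0, k / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    intercept, slope = beta
    return slope, intercept

x = np.arange(10, dtype=float)
y = 2 * x + 1 + np.where(np.arange(10) % 2 == 0, 0.5, -0.5)  # made-up data near y = 2x + 1
y[-1] = 100.0                                                 # one gross outlier
print(np.polyfit(x, y, 1)[0])  # OLS slope, pulled well above 2 by the outlier
print(huber_line(x, y)[0])     # robust slope, stays close to 2
```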


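Plotting each regression line in its own column
This is done by setting col=groupingcolumn in sns.lmplot(), where groupingcolumn is the column to split the data by (cyl below).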

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_select = df.loc[df.cyl.isin([4,8]), :]

# Each line in its own column
sns.set_style("white")
gridobj = sns.lmplot(x="displ", y="hwy",
                     hue="cyl",  # also color by cyl; palette only applies to hue levels
                     data=df_select,
                     height=7,
                     robust=True,
                     palette='Set1',
                     col="cyl",
                     scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

# Decorations
gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
plt.show()
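Conceptually, col="cyl" splits the data by cylinder count and draws each subset in its own panel with its own regression line. A rough matplotlib-only sketch of that idea, using plain least squares and made-up data; the function name plot_by_column is hypothetical and this is not how seaborn implements faceting internally.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_by_column(x, y, groups, name="cyl"):
    """Hypothetical helper: one scatter panel per group value, each with its own fitted line."""
    x, y, groups = np.asarray(x, float), np.asarray(y, float), np.asarray(groups)
    values = np.unique(groups)
    fig, axes = plt.subplots(1, len(values), sharex=True, sharey=True,
                             figsize=(5 * len(values), 4), squeeze=False)
    for ax, g in zip(axes[0], values):
        mask = groups == g
        slope, intercept = np.polyfit(x[mask], y[mask], 1)  # least-squares fit for this panel
        ax.scatter(x[mask], y[mask], s=60, linewidths=0.7, edgecolors="black")
        xs = np.linspace(x[mask].min(), x[mask].max(), 50)
        ax.plot(xs, slope * xs + intercept)
        ax.set_title(f"{name} = {g}")
    return fig, axes[0]

fig, axes = plot_by_column([1.8, 2.0, 2.4, 4.6, 5.3, 6.2],
                           [29, 28, 26, 18, 17, 15],
                           [4, 4, 4, 8, 8, 8])
```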
