100-Days-Of-ML-Code Day 2: Simple Linear Regression
References:
https://github.com/MLEveryday (Chinese version)
https://github.com/Avik-Jain/100-Days-Of-ML-Code/ (English version)
Step 1: Data preprocessing
1) Import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
2) Load the data and split it into the feature data (hours) and the label data (scores)
dataset = pd.read_csv('studentscores.csv')
X = dataset.iloc[:,:1].values
Y = dataset.iloc[:,1].values
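The slice `iloc[:, :1]` keeps X as a two-dimensional array, which is the shape scikit-learn estimators expect for features, while `iloc[:, 1]` gives a one-dimensional label vector. A minimal sketch of the difference, using a small made-up table in place of studentscores.csv (the column names `Hours` and `Scores` and all values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for studentscores.csv; names and values are made up.
demo = pd.DataFrame({"Hours": [1.5, 3.2, 5.1, 8.3],
                     "Scores": [20, 35, 50, 81]})

X_demo = demo.iloc[:, :1].values  # column slice  -> 2-D array, one column
Y_demo = demo.iloc[:, 1].values   # single column -> 1-D array

print(X_demo.shape, Y_demo.shape)
```

Writing `iloc[:, 0]` for the features instead would produce a 1-D array, which `fit` would reject until it is reshaped with `reshape(-1, 1)`.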
3) Split the data into a training set and a test set
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=1/4, random_state=0)
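To see what `test_size=1/4` and `random_state=0` do, here is a small sketch on made-up data. The 25-sample array is an assumption for illustration, not the contents of studentscores.csv:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 25 samples with one feature each.
X_demo = np.arange(25, dtype=float).reshape(-1, 1)
Y_demo = 2.0 * X_demo.ravel() + 1.0

X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=1/4, random_state=0)

# A fractional test_size is rounded up: ceil(25 * 0.25) = 7 test rows.
print(len(X_tr), len(X_te))

# Repeating the call with the same random_state reproduces the same split.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, Y_demo, test_size=1/4, random_state=0)
print(np.array_equal(X_te, X_te2))
```

Fixing `random_state` makes the split repeatable across runs; leaving it unset shuffles differently each time.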
Step 2: Train a simple linear regression model on the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor = regressor.fit(X_train,Y_train)
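With a single feature, `fit` chooses the slope and intercept that minimize the sum of squared residuals; the fitted values are exposed as `coef_` and `intercept_`. The sketch below compares them with the textbook closed-form estimates on made-up noisy data (the data and the variable names are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noisy data scattered around y = 10x + 2.
rng = np.random.default_rng(0)
x = np.linspace(1, 9, 20)
y = 10 * x + 2 + rng.normal(scale=1.0, size=x.size)

model = LinearRegression().fit(x.reshape(-1, 1), y)

# Closed-form least-squares estimates for one feature:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(model.coef_[0], model.intercept_)
print(slope, intercept)
```

The two printed pairs should agree up to floating-point error, which is a quick way to check what the model has learned.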
Step 3: Predict the results and print them
Y_pred = regressor.predict(X_test)
print(Y_pred)
[16.84472176 33.74557494 75.50062397 26.7864001 60.58810646 39.71058194
20.8213931 ]
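Printed predictions are easier to judge next to the true labels. One possible check, sketched here with made-up numbers rather than the real Y_test and Y_pred, uses the mean absolute error and the R² score from `sklearn.metrics`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical true scores and predictions (not taken from studentscores.csv).
y_true = np.array([20.0, 30.0, 70.0, 25.0])
y_hat = np.array([18.0, 33.0, 74.0, 26.0])

mae = mean_absolute_error(y_true, y_hat)  # average absolute error, in score points
r2 = r2_score(y_true, y_hat)              # 1.0 would be a perfect fit

print(mae, r2)
```

In the walkthrough itself the same calls would take `Y_test` and `Y_pred` as arguments.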
Step 4: Visualization
1) Visualize the training set
plt.scatter(X_train, Y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.show()
2) Visualize the test set
plt.scatter(X_test, Y_test, color='red')
plt.plot(X_test, regressor.predict(X_test), color='blue')
plt.show()
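The two plots can also be drawn side by side with titles and axis labels. This is only a sketch on made-up data; the helper `plot_fit`, the axis labels, and the sample values are assumptions for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

def plot_fit(ax, X, Y, regressor, title):
    """Scatter the observations and draw the fitted line on the given axes."""
    ax.scatter(X, Y, color='red', label='observed')
    order = np.argsort(X[:, 0])  # sort by x so the line is drawn left to right
    ax.plot(X[order], regressor.predict(X[order]), color='blue', label='fitted line')
    ax.set_title(title)
    ax.set_xlabel('Hours')
    ax.set_ylabel('Score')
    ax.legend()

# Hypothetical data: the first four rows play the training set, the rest the test set.
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
Y_demo = np.array([12.0, 19.0, 33.0, 41.0, 48.0, 62.0])
reg = LinearRegression().fit(X_demo[:4], Y_demo[:4])

fig, (ax_train, ax_test) = plt.subplots(1, 2, figsize=(10, 4))
plot_fit(ax_train, X_demo[:4], Y_demo[:4], reg, 'Training set')
plot_fit(ax_test, X_demo[4:], Y_demo[4:], reg, 'Test set')
# Call plt.show() to display the figure when running this as a script.
```

In both panels the blue line comes from the same model fitted on the training rows, so the test panel shows how well that line generalizes to unseen points.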
Notes: scatter draws a scatter plot.
The linear model is loaded with from sklearn.linear_model import LinearRegression