Classification: KNN

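Training set: used to train and fit the model.
Validation set: once several models have been trained on the training set, the validation set is used to tune them or to compare their predictions.
Test set: used to assess how well the model generalizes.
Generalization ability: the ability to make predictions on unseen data.
K-fold cross-validation: split the dataset into K parts; each part takes one turn as the test set while the remaining parts serve as the training set.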
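from sklearn.model_selection import train_test_split
# features and label are assumed to be pandas objects defined earlier
f_v = features.values
l_v = label.values
# Hold out 20% for validation, then 25% of the remaining 80% for testing (60/20/20 overall)
X_tt, X_validation, Y_tt, Y_validation = train_test_split(f_v, l_v, test_size=0.2)
X_train, X_test, Y_train, Y_test = train_test_split(X_tt, Y_tt, test_size=0.25)
print(len(X_train), len(X_validation), len(X_test))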
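The K-fold rotation described above can be sketched in plain Python. This is only an illustration of the idea: the helper name k_fold_indices is hypothetical, and no shuffling is done, whereas real data would normally be shuffled first (scikit-learn's KFold class offers this).

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k folds and yield (train_idx, test_idx) pairs."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]                 # this fold is the test set
        train_idx = indices[:start] + indices[start + size:]   # all other folds train
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Every sample appears in exactly one test fold, so averaging the K scores uses all of the data for evaluation without ever testing on a sample the model was trained on.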
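Classification:
KNN
Naive Bayes
Decision trees
Support vector machines
Ensemble methods
Classification or regression:
Logistic regression
Artificial neural networks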

KNN
Euclidean distance
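For two points a and b with n coordinates, the Euclidean distance is d(a, b) = sqrt((a_1 - b_1)^2 + ... + (a_n - b_n)^2). The sketch below shows how KNN could use it: find the k training points closest to a query and take a majority vote over their labels. The function names and the toy data are hypothetical and exist only for illustration; this is not how scikit-learn implements it internally.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k):
    """Label the query by majority vote among its k nearest training points."""
    distances = sorted(
        (euclidean_distance(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (hypothetical): two clusters labelled 0 and 1.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
train_labels = [0, 0, 0, 1, 1, 1]

print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0 (a 3-4-5 triangle)
print(knn_predict(train_points, train_labels, query=(2.0, 2.0), k=3))
```

Because the vote depends on raw distances, features on very different scales are usually standardized before KNN is applied.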
# KNN
from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier  # NearestNeighbors finds the few points closest to a given point
from sklearn.metrics import accuracy_score, recall_score, f1_score
knn_clf = KNeighborsClassifier(n_neighbors=5)
knn_clf.fit(X_train, Y_train)
# Evaluate on the validation set
y_pred = knn_clf.predict(X_validation)
print("ACC:", accuracy_score(Y_validation, y_pred))
print("REC:", recall_score(Y_validation, y_pred))
print("F_SCORE:", f1_score(Y_validation, y_pred))
# n_neighbors=5 does slightly worse than n_neighbors=3
# Evaluate on the test set
y_pred = knn_clf.predict(X_test)
print("ACC:", accuracy_score(Y_test, y_pred))
print("REC:", recall_score(Y_test, y_pred))
print("F_SCORE:", f1_score(Y_test, y_pred))
# Evaluate on the training set
y_pred = knn_clf.predict(X_train)
print("ACC:", accuracy_score(Y_train, y_pred))
print("REC:", recall_score(Y_train, y_pred))
print("F_SCORE:", f1_score(Y_train, y_pred))
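To make the three printed scores concrete, here is a minimal sketch that computes accuracy, recall, and F1 by hand for binary labels. The helper binary_metrics and the toy label lists are hypothetical; the sketch assumes the positive class is labelled 1, which matches the default behavior of scikit-learn's recall_score and f1_score for binary targets.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, recall, F1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)                       # share of all predictions that are right
    recall = tp / (tp + fn) if (tp + fn) else 0.0         # share of real positives that were found
    precision = tp / (tp + fp) if (tp + fp) else 0.0      # share of predicted positives that are real
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean of precision and recall
    return accuracy, recall, f1


# Toy labels (hypothetical), with 1 as the positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, rec, f1 = binary_metrics(y_true, y_pred)
print("ACC:", acc)
print("REC:", rec)
print("F_SCORE:", f1)
```

A training-set score that is much higher than the validation and test scores is a common sign of overfitting, which is why the notes evaluate on all three sets.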