cs231n Assignment 1: SVM Linear Classifier
Preface
I previously worked in C++ on classical image-segmentation algorithms, mainly clustering-based segmentation, level sets, and graph cuts; discussion is always welcome.
I have just started the cs231n course, which is also a good opportunity to learn Python and to deepen my understanding of the models through hands-on practice.
1. These are my own study notes and draw on other people's material; if anything infringes, please contact me and I will remove it.
2. Some of the underlying theory is not explained here, but I will link to blog posts that I think explain it well.
3. Since I had barely used numpy before and am not fluent in Python, this also doubles as a set of Python/numpy learning notes.
Linear Classifier
My understanding of the SVM cost function: in the formula, s_j and s_{y_i} are the i-th sample's score for class j and its score for the correct class y_i. In general we want the correct class to score as high as possible, so every other class's score is compared against it: if s_j - s_{y_i} (plus the margin) is below zero, that class poses no threat and contributes no cost; if it is above zero, the classifier is making (or is close to making) a mistake and pays a cost. Δ can be thought of as insurance: the correct class must beat every other class by at least a fixed margin, otherwise its advantage doesn't really count; here Δ = 1. The score s is a linear function of the input:
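Written out (these are the standard score and per-sample hinge-loss formulas from the cs231n notes, reproduced here in place of the original images):

s = W x_i

L_i = \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + \Delta\right)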
L_i is the cost of a single sample; averaging it over all samples and adding a regularization term gives the final cost:
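Again following the cs231n notes (λ corresponds to reg in the code below; the vectorized version adds an extra factor of 1/2 so that the regularization gradient is exactly reg·W):

L = \frac{1}{N} \sum_{i} L_i + \lambda \sum_{k}\sum_{l} W_{k,l}^{2}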
Why add a regularization term? Take the cs231n example: with input x = (1, 1, 1, 1), the two weight vectors w1 = (1, 0, 0, 0) and w2 = (0.25, 0.25, 0.25, 0.25) produce exactly the same score. Which is better? The second: w1 stakes everything on the first feature, while w2 spreads its influence over all four features, and the L2 penalty (1.0 vs 0.25) is what makes the model prefer w2. That is roughly the intuition.
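A quick numeric check of that example (a sketch; the variable names are my own):

import numpy as np

x  = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

print(w1.dot(x), w2.dot(x))          # same score: 1.0 1.0
print(np.sum(w1**2), np.sum(w2**2))  # L2 penalty: 1.0 vs 0.25, so w2 is preferred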
The image-loading code is the same as in the KNN assignment, so only the SVM loss implementation is listed here. It comes in two versions: a naive loop-based one and a vectorized one. I have not derived the SVM gradient rigorously, but the code does compute it; in the naive version the comments record my guess at why the update looks the way it does, and the analytic gradient is written out below for reference.
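For reference, the analytic gradient of the per-sample hinge loss as derived in the cs231n notes (𝟙(·) is the indicator function), which is what both implementations below compute:

\nabla_{w_{y_i}} L_i = -\Bigl(\sum_{j \neq y_i} \mathbb{1}\bigl(s_j - s_{y_i} + \Delta > 0\bigr)\Bigr)\, x_i

\nabla_{w_j} L_i = \mathbb{1}\bigl(s_j - s_{y_i} + \Delta > 0\bigr)\, x_i \qquad (j \neq y_i)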
import numpy as np

def svm_loss_naive(W, X, y, reg):
    """
    SVM loss function, naive implementation with loops.

    Inputs have dimension D, there are C classes, and we operate on
    mini-batches of N samples.

    Inputs:
    - W: shape (D, C), weight matrix; each column scores one class
    - X: shape (N, D), data; one sample per row
    - y: shape (N,), labels
    - reg: float, regularization strength

    Returns a tuple of:
    - loss as a float
    - gradient with respect to W, an array of the same shape as W
    """
    dW = np.zeros(W.shape)
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):              # loop over samples
        scores = X[i].dot(W)                # scores.shape = (C,), one score per class
        correct_class_score = scores[y[i]]  # score of the correct class
        for j in range(num_classes):        # loop over classes
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # delta = 1
            if margin > 0:
                loss += margin
                # Guess: the correct column's weights should grow, so we subtract this
                # sample's features from it (the sign is flipped because the update
                # step moves along -dW); conversely, a positive margin means class j
                # scored too high, so its column should shrink, hence we add the features.
                dW[:, y[i]] += -X[i, :].T
                dW[:, j] += X[i, :].T
    loss /= num_train
    dW /= num_train
    # add the regularization term; note the exact gradient of reg * sum(W*W) would be
    # 2 * reg * W, which is why the gradient check below shows larger relative errors
    # once regularization is turned on
    loss += reg * np.sum(W * W)
    dW += reg * W
    return loss, dW
def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss function, vectorized implementation.
    """
    num_classes = W.shape[1]
    num_train = X.shape[0]
    scores = X.dot(W)  # scores.shape = (num_train, num_classes)
    # fancy indexing with a list of labels picks out each row's correct-class score
    correct_class_score = scores[range(num_train), list(y)].reshape(-1, 1)  # shape (num_train, 1)
    # broadcast: subtract the correct score from every score, add delta, clamp at 0
    margins = np.maximum(0, scores - correct_class_score + 1)
    margins[range(num_train), list(y)] = 0.0  # the correct class itself contributes no loss
    # mean hinge loss plus regularization; the 0.5 makes the regularization gradient
    # exactly reg * W (the naive version above omits the 0.5)
    loss = np.sum(margins) / num_train + 0.5 * reg * np.sum(W * W)
    coeff_mat = np.zeros((num_train, num_classes))
    coeff_mat[margins > 0] = 1  # MATLAB-style logical indexing: mark entries with positive margin
    coeff_mat[range(num_train), list(y)] = 0  # already 0 (correct-class margins were cleared above); kept for clarity
    # the correct class's coefficient is minus the number of classes with positive
    # margin, matching the naive version, where dW[:, y[i]] is decremented once per
    # violated margin
    coeff_mat[range(num_train), list(y)] = -np.sum(coeff_mat, axis=1)
    dW = (X.T).dot(coeff_mat)       # gradient: data matrix times coefficient matrix
    dW = dW / num_train + reg * W   # average and add the regularization gradient
    return loss, dW
Once implemented, test that it is correct:
from svm import svm_loss_naive
import time
# random weight matrix with small standard-normal entries
W = np.random.randn(3073, 10) * 0.0001
# naive SVM loss
loss, grad = svm_loss_naive(W, x_dev, y_dev, 0.00000005)
print('loss: %f' % (loss))
>> loss: 9.249936
# compare the vectorized SVM loss against the naive implementation
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, x_dev, y_dev, 0.000005)
toc = time.time()
print('naive loss: %e computed in %fs' % (loss_naive, toc - tic))
from svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, x_dev, y_dev, 0.000005)
toc = time.time()
print('vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))
print('difference: %f' % (loss_naive - loss_vectorized))
>> naive loss: 9.249936e+00 computed in 0.079760s
>> vectorized loss: 9.249936e+00 computed in 0.001997s
>> difference: 0.000000
Gradient Check
A gradient can be computed analytically (from a derived formula) or numerically (by finite differences); training uses the analytic gradient because it is exact and fast, and the numerical gradient serves only to verify it. I have not studied the derivation in depth and just give the check below. Two things show up in the results: the occasional larger error even with regularization off comes from the kinks of max(0, ·), where the loss is not differentiable; and with regularization on, all errors grow to around 1e-4 because the naive code adds reg·W to the gradient while the exact gradient of reg·ΣW² is 2·reg·W.
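The checker below uses the centered difference quotient and a symmetric relative error:

\frac{\partial f}{\partial x} \approx \frac{f(x+h) - f(x-h)}{2h}, \qquad
\text{rel\_error} = \frac{\lvert g_{\mathrm{num}} - g_{\mathrm{ana}} \rvert}{\lvert g_{\mathrm{num}} \rvert + \lvert g_{\mathrm{ana}} \rvert}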
# gradient check
import numpy as np
import random

def grad_check_sparse(f, x, analytic_grad, num_checks=10, h=1e-5):
    """
    Compare the analytic gradient against a numerical gradient at
    num_checks randomly chosen entries of x.
    """
    for i in range(num_checks):
        ix = tuple([random.randrange(m) for m in x.shape])  # random index into x
        oldval = x[ix]
        x[ix] = oldval + h
        fxph = f(x)       # f(x + h)
        x[ix] = oldval - h
        fxmh = f(x)       # f(x - h)
        x[ix] = oldval    # restore
        grad_numerical = (fxph - fxmh) / (2 * h)  # centered difference
        grad_analytic = analytic_grad[ix]
        rel_error = abs(grad_numerical - grad_analytic) / (abs(grad_numerical) + abs(grad_analytic))
        print('numerical: %f analytic: %f, relative error: %e' % (grad_numerical, grad_analytic, rel_error))
# run the gradient check, first without and then with regularization
from gradient_check import grad_check_sparse
loss, grad = svm_loss_naive(W, x_dev, y_dev, 0.0)
f = lambda w: svm_loss_naive(w, x_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)
print('turn on reg')
loss, grad = svm_loss_naive(W, x_dev, y_dev, 5e1)
f = lambda w: svm_loss_naive(w, x_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad)
"""
Results
numerical: 25.439111 analytic: 25.439111, relative error: 1.731422e-11
numerical: 0.309849 analytic: 0.309849, relative error: 9.674500e-12
numerical: 13.060349 analytic: 13.060349, relative error: 3.524396e-12
numerical: -48.033999 analytic: -48.002623, relative error: 3.267111e-04
numerical: -0.675017 analytic: -0.675017, relative error: 3.684367e-10
numerical: -7.155280 analytic: -7.155280, relative error: 1.000791e-12
numerical: -19.637627 analytic: -19.639164, relative error: 3.913276e-05
numerical: -11.459053 analytic: -11.459053, relative error: 2.528526e-11
numerical: -4.868211 analytic: -4.868211, relative error: 9.275389e-11
numerical: -10.539809 analytic: -10.539809, relative error: 3.337353e-11
turn on reg
numerical: -5.305563 analytic: -5.306339, relative error: 7.308434e-05
numerical: -16.283510 analytic: -16.274703, relative error: 2.705005e-04
numerical: 25.670647 analytic: 25.668583, relative error: 4.020914e-05
numerical: -14.607615 analytic: -14.611633, relative error: 1.375160e-04
numerical: 30.767250 analytic: 30.775614, relative error: 1.359065e-04
numerical: 5.315075 analytic: 5.308213, relative error: 6.458990e-04
numerical: 4.796955 analytic: 4.799774, relative error: 2.937392e-04
numerical: 20.242400 analytic: 20.249001, relative error: 1.630065e-04
numerical: 18.623197 analytic: 18.627767, relative error: 1.226878e-04
numerical: 1.316207 analytic: 1.325022, relative error: 3.337659e-03
"""
Model Construction and SGD
With the loss and gradient in place, we build the linear classification model and train it with stochastic gradient descent. Plain (batch) gradient descent feeds the entire dataset into every update; stochastic gradient descent, simply put, draws a random mini-batch from the dataset for each training step.
import numpy as np
from svm import *

class LinearClassifier:
    def __init__(self):
        self.W = None

    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        num_train = X.shape[0]  # number of samples
        dim = X.shape[1]        # feature dimension
        # assume y takes values 0...K-1 where K is the number of classes
        num_classes = np.max(y) + 1  # classes are counted from 0, hence the +1
        if self.W is None:
            # lazily initialize W
            self.W = 0.001 * np.random.randn(dim, num_classes)
        # run stochastic gradient descent (mini-batch) to optimize W
        loss_history = []
        for it in range(num_iters):  # each iteration draws a random batch
            # sample a mini-batch; sampling with replacement (replace=True) would
            # be faster, but here we sample without replacement
            batch_idx = np.random.choice(num_train, batch_size, replace=False)
            X_batch = X[batch_idx, :]  # shape (batch_size, D)
            y_batch = y[batch_idx]     # shape (batch_size,)
            # evaluate loss and gradient; the subclass supplies its own loss method
            loss, grad = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)
            # parameter update: step against the gradient
            self.W += -learning_rate * grad
            if verbose and it % 100 == 0:
                print('Iteration %d / %d: loss %f' % (it, num_iters, loss))
        return loss_history

    def predict(self, X):
        scores = X.dot(self.W)              # scores.shape = (num_train, num_classes)
        y_pred = np.argmax(scores, axis=1)  # index of the highest score per row
        return y_pred

    def loss(self, X_batch, y_batch, reg):
        pass

class LinearSVM(LinearClassifier):
    def loss(self, X_batch, y_batch, reg):
        return svm_loss_vectorized(self.W, X_batch, y_batch, reg)
With the model in place, run a training test:
from linear_classifier import LinearSVM
import time
svm = LinearSVM()  # create the classifier; W is None until train() is called
tic = time.time()
loss_hist = svm.train(x_train, y_train, learning_rate=1e-7, reg=2.5e4,
                      num_iters=1500, verbose=True)  # train() initializes and fits W
toc = time.time()
print('that took %fs' % (toc - tic))
# output
"""
Iteration 0 / 1500: loss 782.861021
Iteration 100 / 1500: loss 467.581705
Iteration 200 / 1500: loss 282.545110
Iteration 300 / 1500: loss 171.509829
Iteration 400 / 1500: loss 106.321753
Iteration 500 / 1500: loss 65.678160
Iteration 600 / 1500: loss 41.887783
Iteration 700 / 1500: loss 27.289861
Iteration 800 / 1500: loss 18.752318
Iteration 900 / 1500: loss 12.986854
Iteration 1000 / 1500: loss 10.119960
Iteration 1100 / 1500: loss 8.384713
Iteration 1200 / 1500: loss 6.967093
Iteration 1300 / 1500: loss 6.702494
Iteration 1400 / 1500: loss 6.429761
that took 31.168027s
"""
Plot the loss curve:
import matplotlib.pyplot as plt
plt.plot(loss_hist)
plt.xlabel('iteration number')
plt.ylabel('loss value')
plt.show()
Validation and Hyperparameter Tuning (Cross-Validation)
A dataset is generally split into a training set, a development (validation) set, and a test set. Everyone knows what the training and test sets are for; the validation set, besides validating training results, is used for hyperparameter tuning to find the best model: iterate over every parameter combination, train an SVM for each, evaluate it on the validation set, and keep the model with the highest validation accuracy. This sweep is time-consuming: the code below trains 36 combinations (3 learning rates x 12 regularization strengths) and ran at full load for over ten minutes on an i7-9700K, so decide for yourself whether to run it.
Partial results: the best validation accuracy is about 0.4, noticeably higher than KNN's earlier result; that model object is kept for the test-set evaluation.
# hyperparameter tuning (cross-validation)
learning_rates = [1.4e-7, 1.5e-7, 1.6e-7]
# list comprehensions producing 12 regularization strengths
regularization_strengths = [(1 + i * 0.1) * 1e4 for i in range(-3, 3)] \
                         + [(2 + i * 0.1) * 1e4 for i in range(-3, 3)]
results = {}  # dict keyed by (learning_rate, regularization_strength)
best_val = -1
best_svm = None
for learning in learning_rates:                       # 3 learning rates
    for regularization in regularization_strengths:  # 12 regularization strengths
        svm = LinearSVM()
        svm.train(x_train, y_train, learning_rate=learning,
                  reg=regularization, num_iters=2000)  # train
        y_train_pred = svm.predict(x_train)  # predict on the training set
        train_accuracy = np.mean(y_train == y_train_pred)
        print('training accuracy: %f' % train_accuracy)
        y_val_pred = svm.predict(x_val)      # predict on the validation set
        val_accuracy = np.mean(y_val == y_val_pred)
        print('validation accuracy: %f' % val_accuracy)
        if val_accuracy > best_val:
            best_val = val_accuracy
            best_svm = svm
        results[(learning, regularization)] = (train_accuracy, val_accuracy)
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f ' % (lr, reg, train_accuracy, val_accuracy))
print('best validation accuracy achieved during cross-validation: %f' % best_val)
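If you want to persist best_svm between sessions rather than just keeping it in memory, a minimal sketch using pickle works, since the object only holds a numpy array (the filename best_svm.pkl is my own choice):

import pickle

# save the best model found by the sweep (hypothetical filename)
with open('best_svm.pkl', 'wb') as f:
    pickle.dump(best_svm, f)

# later: reload it for the test-set evaluation
with open('best_svm.pkl', 'rb') as f:
    best_svm = pickle.load(f)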
The assignment also visualizes the cross-validation results; I have not dug into the details, so I just paste the code here for anyone interested.
# visualize the cross-validation results
import math
import matplotlib.pyplot as plt
x_scatter = [math.log10(x[0]) for x in results]  # log learning rates
y_scatter = [math.log10(x[1]) for x in results]  # log regularization strengths
sz = [results[x][0] * 1500 for x in results]  # marker size ~ training accuracy
plt.subplot(1, 2, 1)
plt.scatter(x_scatter, y_scatter, sz)
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('cifar10 training accuracy')
sz = [results[x][1] * 1500 for x in results]  # marker size ~ validation accuracy
plt.subplot(1, 2, 2)
plt.scatter(x_scatter, y_scatter, sz)
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('cifar10 validation accuracy')
plt.show()
Test-Set Evaluation and Weight Visualization
# evaluate on the test set
y_test_pred = best_svm.predict(x_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('linear svm on raw pixels final test set accuracy: %f' % test_accuracy)
>> linear svm on raw pixels final test set accuracy: 0.384000
# visualize the learned weights
w = best_svm.W[:-1, :]  # strip the bias: drop the last row of W (the appended bias dimension)
w = w.reshape(32, 32, 3, 10)  # from (features, classes) back to one 32x32x3 image per class
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
    # rescale the weights into the 0..255 range for display
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
plt.show()
Looking at the visualized weights, each class's weight vector resembles a blurry template of that class, so the linear classifier effectively performs something like template matching: the score is the inner product of the image with each template.