cs231n Assignment (1): SVM Linear Classifier

Contents

Linear Classifier

Gradient Check

Model Setup and SGD

Validation and Hyperparameter Tuning (Cross-Validation)

Test-Set Evaluation and Weight Visualization


I previously worked in C++ on traditional image segmentation algorithms, mainly clustering-based segmentation, level sets, and graph cuts; I'm happy to discuss and learn together.

I have just started the cs231n course and am learning Python at the same time, so I'm working through the assignments to deepen my understanding of the models.

Course link

1. These are my own study notes and draw on other people's material; if anything infringes, please contact me and I will remove it.

2. The code references WILL and 杜克, but I have added a lot of my own study comments.

3. Some of the underlying theory is not explained here, but I link to blog posts that I think explain it well.

4. Since I had barely used numpy before and am not very familiar with Python, these notes also double as a Python/numpy learning log.


Linear Classifier

[Figure: the multiclass SVM loss formula]

My own reading of the SVM cost function: in the formula, S_j and S_{y_i} are the score the i-th sample gets for class j and the score it gets for its correct class y_i. Intuitively, the correct class should score as high as possible, so every other class's score is compared against the correct one: if S_j − S_{y_i} + Δ is at most zero, the sample is classified with enough margin and that class contributes no cost; if it is positive, that amount is added to the loss. Δ acts as a safety margin: the correct class must beat every other class by at least a fixed amount, otherwise it does not really stand out; here Δ = 1. The figure below shows how the scores S are computed.

[Figure: how the score vector S is computed from the input and the weight matrix]
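
Written out (this is the standard multiclass SVM loss from the course notes; the assignment uses Δ = 1):

$$L_i = \sum_{j \neq y_i} \max\bigl(0,\ s_j - s_{y_i} + \Delta\bigr), \qquad s_j = (x_i W)_j$$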

L_i is the cost of a single sample; summing (and averaging) over all samples and adding a regularization term gives the final cost.

Why add a regularization term? Suppose a sample X_i gets the same score under the two weight vectors (1, 0, 0, 0) and (0.25, 0.25, 0.25, 0.25); which one is preferable? The second, because the first relies entirely on the first feature while the second combines all four features. The L2 penalty below encodes exactly this preference.

[Figure: the full loss with the L2 regularization term]
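
In symbols (λ is the regularization strength `reg` in the code below):

$$L = \frac{1}{N}\sum_{i=1}^{N} L_i \;+\; \lambda \sum_{k}\sum_{l} W_{k,l}^{2}$$

For instance, with an input x = (1, 1, 1, 1), both w_1 = (1, 0, 0, 0) and w_2 = (0.25, 0.25, 0.25, 0.25) give the same score w·x = 1, but Σw_1² = 1 while Σw_2² = 0.25, so the L2 penalty favors w_2, the weights spread across all features.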

The image-loading code is the same as in the KNN part, so only the implementation of the SVM cost function is listed here. There are two versions: a naive loop-based one and a vectorized one. I have not worked through the SVM gradient in depth, but the code computes it, and in the naive implementation I note my guess at why the gradient is written the way it is.
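
For reference, the gradient that the naive code guesses at can also be written down directly (the standard derivation, not taken from the assignment solutions). With the indicator function 1[·] and w_j denoting the j-th column of W, for one sample x_i:

$$\frac{\partial L_i}{\partial w_j} = \mathbb{1}\bigl[s_j - s_{y_i} + \Delta > 0\bigr]\, x_i, \qquad \frac{\partial L_i}{\partial w_{y_i}} = -\Bigl(\sum_{j \neq y_i} \mathbb{1}\bigl[s_j - s_{y_i} + \Delta > 0\bigr]\Bigr)\, x_i$$

The inner loop of the naive version accumulates exactly these terms, and the coefficient matrix in the vectorized version encodes the same thing.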

import numpy as np

def svm_loss_naive(W, X, y, reg):
    """
    SVM loss function implemented with explicit loops.
    Inputs have dimension D, there are C classes, and we operate on batches of N samples.
    Inputs:
    - W: shape (D, C), weight matrix; each column holds the weights for one class
    - X: shape (N, D), data; one sample per row
    - y: shape (N,), labels
    - reg: float, regularization strength

    Returns a tuple of:
    - loss as a single float
    - gradient with respect to W; an array of the same shape as W
    """
    dW = np.zeros(W.shape)
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):      # loop over the samples
        scores = X[i].dot(W)        # scores.shape = (C,): one score per class for this sample
        correct_class_score = scores[y[i]]  # score of the correct class
        for j in range(num_classes):        # loop over the classes
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1
            if margin > 0:
                loss += margin
                # Guess: when this margin is violated, the correct class's score should go up,
                # so its column gets -X[i] (the update later is W -= learning_rate * dW),
                # and the offending class's score should go down, so its column gets +X[i].
                dW[:, y[i]] += -X[i, :].T
                dW[:, j] += X[i, :].T
    loss /= num_train
    dW /= num_train

    # add the regularization term
    loss += reg * np.sum(W * W)
    dW += reg * W

    return loss, dW

def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss function, vectorized implementation.
    """

    num_classes = W.shape[1]
    num_train = X.shape[0]
    scores = X.dot(W)        # compute all scores at once; scores.shape = (num_train, num_classes)
    # integer-array (fancy) indexing: pick, for every row, the score of its correct class
    correct_class_score = scores[range(num_train), list(y)].reshape(-1, 1)  # shape (num_train, 1)
    margins = np.maximum(0, scores - correct_class_score + 1)   # broadcast: subtract correct score, add delta, clamp at 0
    margins[range(num_train), list(y)] = 0.0     # the correct class does not contribute to its own loss
    loss = np.sum(margins) / num_train + 0.5 * reg * np.sum(W * W)   # mean margin plus regularization (0.5 makes the reg gradient exactly reg * W)

    coeff_mat = np.zeros((num_train, num_classes))
    coeff_mat[margins > 0] = 1      # like MATLAB logical indexing: 1 wherever the margin is violated
    coeff_mat[range(num_train), list(y)] = 0    # the correct-class entries were also set to 1 above, so clear them
    # the correct class column is decremented once per violated margin,
    # matching the naive version where dW[:, y[i]] gets -X[i] for every margin > 0
    coeff_mat[range(num_train), list(y)] = -np.sum(coeff_mat, axis=1)
    dW = (X.T).dot(coeff_mat)       # dW is the data matrix times the coefficient matrix
    dW = dW / num_train + reg * W   # regularization
    return loss, dW
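
The `scores[range(num_train), list(y)]` trick used above is just NumPy integer-array (fancy) indexing; a tiny standalone sketch with made-up numbers:

import numpy as np

scores = np.array([[3.0, 1.0, 2.0],
                   [0.5, 2.5, 1.5]])   # 2 samples, 3 classes
y = np.array([0, 2])                   # correct class of each sample
print(scores[range(2), list(y)])       # -> [3.  1.5], each row's correct-class score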

After implementing both versions, test that they are correct:

from svm import svm_loss_naive
import time
# generate a small random weight matrix from a standard normal distribution
W = np.random.randn(3073, 10) * 0.0001
# naive SVM loss
loss, grad = svm_loss_naive(W, x_dev, y_dev, 0.00000005)
print('loss: %f' % (loss))

 >> loss: 9.249936

# compare the vectorized SVM loss against the naive implementation
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, x_dev, y_dev, 0.000005)
toc = time.time()
print('naive loss: %e computed in %fs' % (loss_naive, toc - tic))

from svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, x_dev, y_dev, 0.000005)
toc = time.time()
print('vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

print('difference: %f' % (loss_naive - loss_vectorized))

>> naive loss: 9.249936e+00 computed in 0.079760s

>> vectorized loss: 9.249936e+00 computed in 0.001997s

>> difference: 0.000000


Gradient Check

A gradient can be computed in two ways: the analytic gradient (derived from the formula, as above) and the numerical gradient (finite differences). During training we use the fast analytic gradient, and we use the numerical gradient to verify that the analytic one is correct. I have not studied the check routine in depth and only give its implementation here.
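
The numerical gradient used below is a centered difference evaluated at a few randomly chosen entries of W:

$$\frac{\partial f}{\partial W_{kl}} \approx \frac{f(W + h\,e_{kl}) - f(W - h\,e_{kl})}{2h}, \qquad h = 10^{-5}$$

where e_{kl} is the matrix that is 1 at entry (k, l) and 0 elsewhere.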

# gradient check
import numpy as np
import random

def grad_check_sparse(f, x, analytic_grad, num_checks=10, h=1e-5):
    """
    Sample a few random entries of x, compute the numerical gradient there
    with a centered difference, and compare it to the analytic gradient.
    """
    for i in range(num_checks):
        ix = tuple([random.randrange(m) for m in x.shape])

        oldval = x[ix]
        x[ix] = oldval + h
        fxph = f(x)           # f(x + h)
        x[ix] = oldval - h
        fxmh = f(x)           # f(x - h)
        x[ix] = oldval        # restore the original value

        grad_numerical = (fxph - fxmh) / (2 * h)
        grad_analytic = analytic_grad[ix]
        rel_error = abs(grad_numerical - grad_analytic) / (abs(grad_numerical) + abs(grad_analytic))
        print('numerical: %f analytic: %f, relative error: %e' % (grad_numerical, grad_analytic, rel_error))

# run the check, first without and then with regularization
from gradient_check import grad_check_sparse
loss, grad = svm_loss_naive(W, x_dev, y_dev, 0.0)
f = lambda w: svm_loss_naive(w, x_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)
print('turn on reg')
loss, grad = svm_loss_naive(W, x_dev, y_dev, 5e1)
f = lambda w: svm_loss_naive(w, x_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad)

"""
Results
numerical: 25.439111 analytic: 25.439111, relative error: 1.731422e-11
numerical: 0.309849 analytic: 0.309849, relative error: 9.674500e-12
numerical: 13.060349 analytic: 13.060349, relative error: 3.524396e-12
numerical: -48.033999 analytic: -48.002623, relative error: 3.267111e-04
numerical: -0.675017 analytic: -0.675017, relative error: 3.684367e-10
numerical: -7.155280 analytic: -7.155280, relative error: 1.000791e-12
numerical: -19.637627 analytic: -19.639164, relative error: 3.913276e-05
numerical: -11.459053 analytic: -11.459053, relative error: 2.528526e-11
numerical: -4.868211 analytic: -4.868211, relative error: 9.275389e-11
numerical: -10.539809 analytic: -10.539809, relative error: 3.337353e-11
turn on reg
numerical: -5.305563 analytic: -5.306339, relative error: 7.308434e-05
numerical: -16.283510 analytic: -16.274703, relative error: 2.705005e-04
numerical: 25.670647 analytic: 25.668583, relative error: 4.020914e-05
numerical: -14.607615 analytic: -14.611633, relative error: 1.375160e-04
numerical: 30.767250 analytic: 30.775614, relative error: 1.359065e-04
numerical: 5.315075 analytic: 5.308213, relative error: 6.458990e-04
numerical: 4.796955 analytic: 4.799774, relative error: 2.937392e-04
numerical: 20.242400 analytic: 20.249001, relative error: 1.630065e-04
numerical: 18.623197 analytic: 18.627767, relative error: 1.226878e-04
numerical: 1.316207 analytic: 1.325022, relative error: 3.337659e-03
"""

Model Setup and SGD

With the loss and gradient in place, we build the linear classifier model and train it with stochastic gradient descent (SGD). Plain (batch) gradient descent feeds the entire dataset into every update; SGD, roughly speaking, randomly samples a mini-batch from the dataset for each update step.

import numpy as np
from svm import *

class LinearClassifier:
    def __init__(self):
        self.W = None
    
    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100, batch_size=200, verbose=False):
        num_train = X.shape[0]        # number of samples
        dim = X.shape[1]              # feature dimension
        # assume y takes values 0...K-1 where K is number of classes
        num_classes = np.max(y) + 1   # classes are numbered from 0, hence the +1

        if self.W is None:
            # lazily initialize W
            self.W = 0.001 * np.random.randn(dim, num_classes)   # initialize W with small random values

        # Run stochastic gradient descent (mini-batch) to optimize W
        loss_history = []
        for it in range(num_iters):  # each iteration draws a random batch and takes one gradient step
            X_batch = None
            y_batch = None
            # sampling with replacement (replace=True) would be faster; here we sample without replacement
            batch_idx = np.random.choice(num_train, batch_size, replace=False)
            X_batch = X[batch_idx, :]   # shape (batch_size, D)
            y_batch = y[batch_idx]      # shape (batch_size,)
            # evaluate loss and gradient
            # the subclass's loss method is the one that actually gets called here
            loss, grad = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)

            # perform the parameter update (gradient descent step)
            self.W += -learning_rate * grad
            if verbose and it % 100 == 0:
                print('Iteration %d / %d: loss %f' % (it, num_iters, loss))

        return loss_history
    
    # predict labels for new data
    def predict(self, X):
        scores = X.dot(self.W)        # scores.shape = (num_train, num_classes)
        y_pred = np.argmax(scores, axis=1)   # index of the highest score = predicted class
        return y_pred

    def loss(self, X_batch, y_batch, reg):
        pass


class LinearSVM(LinearClassifier):
    def loss(self, X_batch, y_batch, reg):
        return svm_loss_vectorized(self.W, X_batch, y_batch, reg)

With the model in place, run a quick training test:

from linear_classifier import LinearSVM
import time
svm = LinearSVM()    # create the classifier object; W is still None at this point
tic = time.time()
loss_hist = svm.train(x_train, y_train, learning_rate=1e-7, reg=2.5e4, num_iters=1500, verbose=True)    # after train(), the svm object holds the learned W
toc = time.time()
print('that took %fs' % (toc - tic))

# Output
"""
Iteration 0 / 1500: loss 782.861021
Iteration 100 / 1500: loss 467.581705
Iteration 200 / 1500: loss 282.545110
Iteration 300 / 1500: loss 171.509829
Iteration 400 / 1500: loss 106.321753
Iteration 500 / 1500: loss 65.678160
Iteration 600 / 1500: loss 41.887783
Iteration 700 / 1500: loss 27.289861
Iteration 800 / 1500: loss 18.752318
Iteration 900 / 1500: loss 12.986854
Iteration 1000 / 1500: loss 10.119960
Iteration 1100 / 1500: loss 8.384713
Iteration 1200 / 1500: loss 6.967093
Iteration 1300 / 1500: loss 6.702494
Iteration 1400 / 1500: loss 6.429761
that took 31.168027s
"""

Plot the loss curve:

plt.plot(loss_hist)
plt.xlabel('iteration number')
plt.ylabel('loss value')
plt.show()

[Figure: loss value vs. iteration number]


Validation and Hyperparameter Tuning (Cross-Validation)

A dataset is usually split into a training set, a development (validation) set, and a test set. Everyone knows what the training and test sets are for; the validation set, besides checking training results, is used for hyperparameter tuning to find the best model: loop over every parameter combination, train an SVM, evaluate it on the validation set, and keep the model with the highest validation accuracy. This search is time-consuming: the code below loops over every learning-rate/regularization combination (3 × 12 = 36 with the values shown) and ran for over ten minutes at full load on a 9700K CPU, so decide for yourself whether to run it.

Part of the output is shown below: the best validation accuracy is about 0.40, much higher than the earlier KNN result, and the corresponding model object is kept for the test set.

[Figure: partial output of the hyperparameter search]

# hyperparameter tuning (cross-validation)
learning_rates = [1.4e-7, 1.5e-7, 1.6e-7]
# list comprehensions: 12 regularization strengths in total
regularization_strengths = [(1+i*0.1)*1e4 for i in range(-3,3)] + [(2+i*0.1)*1e4 for i in range(-3,3)]
results = {}   # dict keyed by (learning_rate, regularization_strength)
best_val = -1
best_svm = None
for learning in learning_rates:    # 3 learning rates
    for regularization in regularization_strengths:     # 12 regularization strengths
        svm = LinearSVM()
        svm.train(x_train, y_train, learning_rate=learning, reg=regularization, num_iters=2000)   # train
        y_train_pred = svm.predict(x_train)   # predict on the training set
        train_accuracy = np.mean(y_train == y_train_pred)
        print('training accuracy: %f' % train_accuracy)
        y_val_pred = svm.predict(x_val)    # predict on the validation set
        val_accuracy = np.mean(y_val == y_val_pred)
        print('validation accuracy: %f' % val_accuracy)

        if val_accuracy > best_val:
            best_val = val_accuracy
            best_svm = svm
        results[(learning, regularization)] = (train_accuracy, val_accuracy)
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f ' % (lr, reg, train_accuracy, val_accuracy))
print('best validation accuracy achieved during cross-validation: %f' % best_val)

The assignment also visualizes the cross-validation results. I have not looked into this part, so I just paste the code for anyone interested.

# visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# marker size proportional to training accuracy
sz = [results[x][0]*1500 for x in results]
plt.subplot(1, 2, 1)
plt.scatter(x_scatter, y_scatter, sz)
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('cifar10 training accuracy')

# marker size proportional to validation accuracy
sz = [results[x][1]*1500 for x in results]
plt.subplot(1, 2, 2)
plt.scatter(x_scatter, y_scatter, sz)
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('cifar10 validation accuracy')
plt.show()

[Figure: scatter plots of training and validation accuracy over log learning rate and log regularization strength]


Test-Set Evaluation and Weight Visualization

# evaluate on the test set
y_test_pred = best_svm.predict(x_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('linear svm on raw pixels final test set accuracy: %f'% test_accuracy)

>> linear svm on raw pixels final test set accuracy: 0.384000

# visualize the learned weights
w = best_svm.W[:-1, :]         # strip the bias: the last row of W corresponds to the appended bias dimension
w = w.reshape(32, 32, 3, 10)   # w.shape was (3072, 10) = (number of features, number of classes)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i+1)
    # rescale each class's weights into the 0..255 range so they can be shown as an image
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])

[Figure: visualized weight templates for the 10 CIFAR-10 classes]

Looking at the visualized weights, each class's weights resemble a rough template of that class; the linear classifier behaves somewhat like template matching.