Fine-tuning a Pretrained Network for Style Recognition
Source: https://github.com/BVLC/caffe/blob/master/examples/02-fine-tuning.ipynb
Because of limited data and compute, it is often impractical to train a neural network from scratch. A common solution is transfer learning: take the weights of a network trained on a large dataset and fine-tune them on our own task (this assumes the two datasets are reasonably similar). The advantage is that the pretrained network has learned from a very large image dataset, so its early convolutional layers capture semantic information about generic visual appearance. For a target task with little data, we can reuse these convolutional layers to extract low-level features, attach new fully connected layers and a classifier, and obtain good classification results after fine-tuning.
The implementation proceeds as follows:
1. Define an image-preprocessing function that matches the network's input format: resizing, RGB channel order, and so on.
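The exact preprocessing depends on the model, but for CaffeNet-style networks it typically means converting HxWxC RGB images to CxHxW BGR and subtracting a per-channel mean. A minimal NumPy sketch, assuming the usual ImageNet BGR channel means (values here are illustrative):

```python
import numpy as np

def preprocess(image_rgb, mean_bgr=(104.0, 117.0, 123.0)):
    """Convert an HxWx3 RGB float image to Caffe's 3xHxW BGR layout
    and subtract the per-channel mean."""
    bgr = image_rgb[:, :, ::-1].astype(np.float32)  # RGB -> BGR
    bgr -= np.array(mean_bgr, dtype=np.float32)     # subtract channel means
    return bgr.transpose(2, 0, 1)                   # HWC -> CHW

# Example: a 2x2 dummy image with all pixels at 120.
img = np.ones((2, 2, 3), dtype=np.float32) * 120.0
blob = preprocess(img)
print(blob.shape)  # (3, 2, 2)
```

In the notebook itself this role is played by Caffe's own transformer utilities; the sketch above only shows the arithmetic involved.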
2. Download the dataset and the pretrained model.
- get_ilsvrc_aux.sh downloads the ImageNet auxiliary data (mean, labels, etc.)
- download_model_binary.py downloads the pretrained reference model
- finetune_flickr_style/assemble_data.py downloads the style training and test data

After downloading, set the relevant paths, including where the pretrained model and the data are stored. Load the 1000 ImageNet labels from ilsvrc12/synset_words.txt, and load our own dataset's categories from finetune_flickr_style/style.txt (the txt file simply lists the category names, one per line).
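Loading the labels is just reading lines from a text file. A small self-contained sketch (it writes a stand-in style.txt first, since the real file comes from the download step; the five names are the Flickr style categories used later in the output logs):

```python
import os
import tempfile

# Stand-in for finetune_flickr_style/style.txt: one category name per line.
style_txt = os.path.join(tempfile.mkdtemp(), 'style.txt')
with open(style_txt, 'w') as f:
    f.write('Detailed\nPastel\nMelancholy\nNoir\nHDR\n')

# Read the category names, skipping blank lines.
with open(style_txt) as f:
    style_labels = [line.strip() for line in f if line.strip()]

print(style_labels)  # ['Detailed', 'Pastel', 'Melancholy', 'Noir', 'HDR']
```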
3. Build the network model.

We start by defining caffenet, a function that initializes the CaffeNet architecture (a minor variant of AlexNet), with parameters specifying the data source and the number of output classes. The fully connected classification layer must be renamed, because pretrained weights are transferred by matching layer names one-to-one.

We then define a function style_net, which calls caffenet. Its subset argument selects the training or test split for the respective task. The new network has the same architecture as CaffeNet, but different inputs and outputs:
- The input is the style dataset we downloaded, fed through an ImageData layer.
- The output is a distribution over 20 style classes rather than the original 1000 ImageNet classes.
- The classification layer is renamed fc8_flickr instead of fc8, which tells Caffe not to load the original classification weights (fc8) from the pretrained ImageNet model.
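The renaming shows up in the generated net prototxt: the final inner-product layer keeps CaffeNet's structure but gets a new name and a new num_output. A sketch of the relevant fragment (field values illustrative of the style task):

```
layer {
  name: "fc8_flickr"   # renamed from "fc8", so pretrained fc8 weights are not copied
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_flickr"
  inner_product_param {
    num_output: 20     # style classes instead of the 1000 ImageNet classes
  }
}
```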
4. Train the network.
```python
import tempfile

from caffe.proto import caffe_pb2

def solver(train_net_path, test_net_path=None, base_lr=0.001):
    s = caffe_pb2.SolverParameter()

    # Specify locations of the train and (maybe) test networks.
    s.train_net = train_net_path
    if test_net_path is not None:
        s.test_net.append(test_net_path)
        s.test_interval = 1000  # Test after every 1000 training iterations.
        s.test_iter.append(100) # Test on 100 batches each time we test.

    # The number of iterations over which to average the gradient.
    # Effectively boosts the training batch size by the given factor, without
    # affecting memory utilization.
    s.iter_size = 1

    s.max_iter = 100000     # # of times to update the net (training iterations)

    # Solve using the stochastic gradient descent (SGD) algorithm.
    # Other choices include 'Adam' and 'RMSProp'.
    s.type = 'SGD'

    # Set the initial learning rate for SGD.
    s.base_lr = base_lr

    # Set `lr_policy` to define how the learning rate changes during training.
    # Here, we 'step' the learning rate by multiplying it by a factor `gamma`
    # every `stepsize` iterations.
    s.lr_policy = 'step'
    s.gamma = 0.1
    s.stepsize = 20000

    # Set other SGD hyperparameters. Setting a non-zero `momentum` takes a
    # weighted average of the current gradient and previous gradients to make
    # learning more stable. L2 weight decay regularizes learning, to help
    # prevent the model from overfitting.
    s.momentum = 0.9
    s.weight_decay = 5e-4

    # Display the current training loss and accuracy every 1000 iterations.
    s.display = 1000

    # Snapshots are files used to store networks we've trained.  Here, we'll
    # snapshot every 10K iterations -- ten times during training.
    s.snapshot = 10000
    s.snapshot_prefix = caffe_root + 'models/finetune_flickr_style/finetune_flickr_style'

    # Train on the GPU.  Using the CPU to train large networks is very slow.
    s.solver_mode = caffe_pb2.SolverParameter.GPU

    # Write the solver to a temporary file and return its filename.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(str(s))
        return f.name
```
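Under the 'step' policy configured above, Caffe multiplies the learning rate by gamma every stepsize iterations, i.e. lr(iter) = base_lr * gamma^floor(iter / stepsize). A quick pure-Python check of that schedule, using the values set in the solver:

```python
def step_lr(it, base_lr=0.001, gamma=0.1, stepsize=20000):
    """Learning rate at iteration `it` under Caffe's 'step' lr_policy."""
    return base_lr * gamma ** (it // stepsize)

# The rate drops by 10x at iterations 20000, 40000, ...
lrs = [step_lr(it) for it in (0, 19999, 20000, 45000)]
```

So with max_iter = 100000 and stepsize = 20000, the rate decays by a factor of ten five times over the course of training.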
You can also train directly from the command line, but that requires writing the prototxt files yourself and converting the data into a format Caffe accepts, such as LevelDB:
```sh
build/tools/caffe train \
    -solver models/finetune_flickr_style/solver.prototxt \
    -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
    -gpu 0
```
Here, however, we train from Python:
```python
def run_solvers(niter, solvers, disp_interval=10):
    """Run solvers for niter iterations,
       returning the loss and accuracy recorded each iteration.
       `solvers` is a list of (name, solver) tuples."""
    blobs = ('loss', 'acc')
    loss, acc = ({name: np.zeros(niter) for name, _ in solvers}
                 for _ in blobs)
    for it in range(niter):
        for name, s in solvers:
            s.step(1)  # run a single SGD step in Caffe
            loss[name][it], acc[name][it] = (s.net.blobs[b].data.copy()
                                             for b in blobs)
        if it % disp_interval == 0 or it + 1 == niter:
            loss_disp = '; '.join('%s: loss=%.3f, acc=%2d%%' %
                                  (n, loss[n][it], np.round(100*acc[n][it]))
                                  for n, _ in solvers)
            print '%3d) %s' % (it, loss_disp)
    # Save the learned weights from both nets.
    weight_dir = tempfile.mkdtemp()
    weights = {}
    for name, s in solvers:
        filename = 'weights.%s.caffemodel' % name
        weights[name] = os.path.join(weight_dir, filename)
        s.net.save(weights[name])
    return loss, acc, weights
```
```python
niter = 200  # number of iterations to train

# Reset style_solver as before.
style_solver_filename = solver(style_net(train=True))
style_solver = caffe.get_solver(style_solver_filename)
style_solver.net.copy_from(weights)

# For reference, we also create a solver that isn't initialized from
# the pretrained ImageNet weights.
scratch_style_solver_filename = solver(style_net(train=True))
scratch_style_solver = caffe.get_solver(scratch_style_solver_filename)

print 'Running solvers for %d iterations...' % niter
solvers = [('pretrained', style_solver),
           ('scratch', scratch_style_solver)]
loss, acc, weights = run_solvers(niter, solvers)
print 'Done.'

train_loss, scratch_train_loss = loss['pretrained'], loss['scratch']
train_acc, scratch_train_acc = acc['pretrained'], acc['scratch']
style_weights, scratch_style_weights = weights['pretrained'], weights['scratch']

# Delete solvers to save memory.
del style_solver, scratch_style_solver, solvers
```

Running this produces:
```
Running solvers for 200 iterations...
  0) pretrained: loss=1.609, acc=28%; scratch: loss=1.609, acc=28%
 10) pretrained: loss=1.293, acc=52%; scratch: loss=1.626, acc=14%
 20) pretrained: loss=1.110, acc=56%; scratch: loss=1.646, acc=10%
 30) pretrained: loss=1.084, acc=60%; scratch: loss=1.616, acc=20%
 40) pretrained: loss=0.898, acc=64%; scratch: loss=1.588, acc=26%
 50) pretrained: loss=1.024, acc=54%; scratch: loss=1.607, acc=32%
 60) pretrained: loss=0.925, acc=66%; scratch: loss=1.616, acc=20%
 70) pretrained: loss=0.861, acc=74%; scratch: loss=1.598, acc=24%
 80) pretrained: loss=0.967, acc=60%; scratch: loss=1.588, acc=30%
 90) pretrained: loss=1.274, acc=52%; scratch: loss=1.608, acc=20%
100) pretrained: loss=1.113, acc=62%; scratch: loss=1.588, acc=30%
110) pretrained: loss=0.922, acc=62%; scratch: loss=1.578, acc=36%
120) pretrained: loss=0.918, acc=62%; scratch: loss=1.599, acc=20%
130) pretrained: loss=0.959, acc=58%; scratch: loss=1.594, acc=22%
140) pretrained: loss=1.228, acc=50%; scratch: loss=1.608, acc=14%
150) pretrained: loss=0.727, acc=76%; scratch: loss=1.623, acc=16%
160) pretrained: loss=1.074, acc=66%; scratch: loss=1.607, acc=20%
170) pretrained: loss=0.887, acc=60%; scratch: loss=1.614, acc=20%
180) pretrained: loss=0.961, acc=62%; scratch: loss=1.614, acc=18%
190) pretrained: loss=0.737, acc=76%; scratch: loss=1.613, acc=18%
199) pretrained: loss=0.836, acc=70%; scratch: loss=1.614, acc=16%
Done.
```
Plot the loss and accuracy curves of the two initializations against the iteration count:
```python
plot(np.vstack([train_loss, scratch_train_loss]).T)
xlabel('Iteration #')
ylabel('Loss')
```
```python
plot(np.vstack([train_acc, scratch_train_acc]).T)
xlabel('Iteration #')
ylabel('Accuracy')
```
5. After training, evaluate on the test set.
```python
def eval_style_net(weights, test_iters=10):
    test_net = caffe.Net(style_net(train=False), weights, caffe.TEST)
    accuracy = 0
    for it in xrange(test_iters):
        accuracy += test_net.forward()['acc']
    accuracy /= test_iters
    return test_net, accuracy
```
```python
test_net, accuracy = eval_style_net(style_weights)
print 'Accuracy, trained from ImageNet initialization: %3.1f%%' % (100*accuracy, )
scratch_test_net, scratch_accuracy = eval_style_net(scratch_style_weights)
print 'Accuracy, trained from random initialization: %3.1f%%' % (100*scratch_accuracy, )
```

The notebook also runs an end-to-end comparison, in which the front-end feature-extraction layers are no longer frozen and all network parameters are trained. This improves the results considerably, though in my view that will not necessarily carry over to other experiments.
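In Caffe, the difference between freezing a layer and training it end-to-end comes down to each blob's lr_mult: a multiplier of 0 leaves the pretrained weights untouched, while a nonzero value lets SGD update them. An illustrative prototxt fragment (layer shapes match CaffeNet's conv1; the lr_mult values are the point of the example):

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # lr_mult: 0 freezes a blob; nonzero values (commonly 1 for weights,
  # 2 for biases) let SGD update it, which is what the end-to-end runs do.
  param { lr_mult: 0 }  # filter weights
  param { lr_mult: 0 }  # biases
  convolution_param { num_output: 96  kernel_size: 11  stride: 4 }
}
```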
```
Running solvers for 200 iterations...
  0) pretrained, end-to-end: loss=0.781, acc=64%; scratch, end-to-end: loss=1.585, acc=28%
 10) pretrained, end-to-end: loss=1.178, acc=62%; scratch, end-to-end: loss=1.638, acc=14%
 20) pretrained, end-to-end: loss=1.084, acc=60%; scratch, end-to-end: loss=1.637, acc= 8%
 30) pretrained, end-to-end: loss=0.902, acc=76%; scratch, end-to-end: loss=1.600, acc=20%
 40) pretrained, end-to-end: loss=0.865, acc=64%; scratch, end-to-end: loss=1.574, acc=26%
 50) pretrained, end-to-end: loss=0.888, acc=60%; scratch, end-to-end: loss=1.604, acc=26%
 60) pretrained, end-to-end: loss=0.538, acc=78%; scratch, end-to-end: loss=1.555, acc=34%
 70) pretrained, end-to-end: loss=0.717, acc=72%; scratch, end-to-end: loss=1.563, acc=30%
 80) pretrained, end-to-end: loss=0.695, acc=74%; scratch, end-to-end: loss=1.502, acc=42%
 90) pretrained, end-to-end: loss=0.708, acc=68%; scratch, end-to-end: loss=1.523, acc=26%
100) pretrained, end-to-end: loss=0.432, acc=78%; scratch, end-to-end: loss=1.500, acc=38%
110) pretrained, end-to-end: loss=0.611, acc=78%; scratch, end-to-end: loss=1.618, acc=18%
120) pretrained, end-to-end: loss=0.610, acc=76%; scratch, end-to-end: loss=1.473, acc=30%
130) pretrained, end-to-end: loss=0.471, acc=78%; scratch, end-to-end: loss=1.488, acc=26%
140) pretrained, end-to-end: loss=0.500, acc=76%; scratch, end-to-end: loss=1.514, acc=38%
150) pretrained, end-to-end: loss=0.476, acc=80%; scratch, end-to-end: loss=1.452, acc=46%
160) pretrained, end-to-end: loss=0.368, acc=82%; scratch, end-to-end: loss=1.419, acc=34%
170) pretrained, end-to-end: loss=0.556, acc=76%; scratch, end-to-end: loss=1.583, acc=36%
180) pretrained, end-to-end: loss=0.574, acc=72%; scratch, end-to-end: loss=1.556, acc=22%
190) pretrained, end-to-end: loss=0.360, acc=88%; scratch, end-to-end: loss=1.429, acc=44%
199) pretrained, end-to-end: loss=0.458, acc=78%; scratch, end-to-end: loss=1.370, acc=44%
Done.
```

We can also inspect predictions on individual images:
```python
plt.imshow(deprocess_net_image(image))
disp_style_preds(test_net, image)
```
Result:
```
top 5 predicted style labels =
	(1) 55.67% Melancholy
	(2) 27.21% HDR
	(3) 16.46% Pastel
	(4)  0.63% Detailed
	(5)  0.03% Noir
```
```python
batch_index = 1
image = test_net.blobs['data'].data[batch_index]
plt.imshow(deprocess_net_image(image))
print 'actual label =', style_labels[int(test_net.blobs['label'].data[batch_index])]
```

Result: