Weakly supervised object recognition with convolutional neural networks: Paper Notes
1. Model Overview
1.1 What is weakly supervised learning?
https://stackoverflow.com/questions/18944805/what-is-weakly-supervised-learning-bootstrapping
In short: In weakly supervised learning, you use a limited amount of labeled data.
In other words, the sample labels are imperfect. In what sense? They may be incomplete, cover only part of the data, or even be wrong; all of these cases count.
For example, image labels are mostly produced by human annotators, so mistakes and missing annotations are common.
2. Network architecture
Overall, the network has 5 convolutional layers and 4 fully connected layers.
To adapt this architecture to weakly supervised learning we introduce the following three modifications. First, we treat the fully connected layers as convolutions, which allows us to deal with nearly arbitrary-sized images as input. Second, we explicitly search for the highest scoring object position in the image by adding a single global max-pooling layer at the output. Third, we use a cost function that can explicitly model multiple objects present in the image.
Not entirely clear yet; let's keep reading.
2.1 Convolutional adaptation layers
Goal: treat the fully connected layers as convolutions, which allows us to deal with nearly arbitrary-sized images as input.
The paper refers to this earlier work:
《Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks》
The main idea of that paper is transfer learning: reuse a network trained on a large dataset for other tasks.
Transfer learning aims to transfer knowledge between related source and target domains
As that paper puts it:
To address this problem, we propose to transfer image representations learned with CNNs on large datasets to other visual recognition tasks with limited training data.
However, the target classes of the original classification problem differ from those of the new problem, so the authors replace the network's final output layer with two new layers, FCa and FCb. (My understanding: training a deep network from scratch would perform poorly because there are too many parameters and too little data; instead, the network is first trained on a large dataset, its output layer is removed, and two fully connected layers are added, so that only the parameters of these two layers need to be learned on the new task, which greatly reduces the number of trainable parameters.)
In order to achieve the transfer, we remove the output layer FC8 of the pre-trained network and add an adaptation layer formed by two fully connected layers FCa and FCb
So how do we make the two newly added layers (the adaptation layers FCa and FCb) bridge the mismatch between the source problem and the target problem?
we train the adaptation layer using a procedure inspired by training sliding window object detectors (e.g. [A discriminatively trained, multiscale, deformable part model]) described next.
Method:
Extract patches, i.e., small square crops of the image with some overlap between neighboring crops; see the paper for details. (A code sketch of the overall transfer setup follows.)
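To make this concrete, here is a minimal PyTorch sketch of that transfer setup (not the authors' original Torch code): load a pre-trained network, drop its output layer, append two adaptation layers FCa and FCb, and train only those, as the paper describes. Using torchvision's AlexNet as the pre-trained network, a 2048-unit FCa, and 21 output classes (20 Pascal VOC classes plus background) are my assumptions for illustration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# A network pre-trained on a large dataset (ImageNet); torchvision's AlexNet
# stands in for the pre-trained network used in the paper.
pretrained = models.alexnet(weights="IMAGENET1K_V1")

# Drop the original output layer (FC8) and keep the earlier layers.
features = pretrained.features                            # C1-C5
fc_layers = list(pretrained.classifier.children())[:-1]   # FC6, FC7 (with dropout/ReLU)

# Adaptation layers FCa and FCb; the 2048 hidden units and 21 target
# classes are illustrative choices, not the paper's exact numbers.
adaptation = nn.Sequential(
    nn.Linear(4096, 2048),   # FCa
    nn.ReLU(inplace=True),
    nn.Linear(2048, 21),     # FCb: per-class scores on the target task
)

model = nn.Sequential(features, nn.Flatten(), *fc_layers, adaptation)

# Only the adaptation layers are trained; the transferred layers stay fixed.
for p in features.parameters():
    p.requires_grad = False
for layer in fc_layers:
    for p in layer.parameters():
        p.requires_grad = False

x = torch.randn(1, 3, 224, 224)
print(model(x).shape)   # torch.Size([1, 21])
```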
2.2 Back to the paper at hand
Before getting into this paper's model, let's first take a look at the following paper:
《ImageNet Classification with Deep Convolutional Neural Networks》
The model structure is shown in the figure above: the 224 × 224 × 3 input image first passes through 96 convolutional kernels of size 11 × 11 × 3, then through 256 kernels of size 5 × 5 × 48 (the depth is 48 rather than 96 because the authors split the network across two GPUs), and so on; see the paper for details.
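As a side note, here is a rough PyTorch sketch of those first two convolutional layers, ignoring the two-GPU split, the local response normalization, and the exact padding of the original paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# First two convolutional layers of the AlexNet-style network, written as a
# single-GPU model (the original paper splits the channels across two GPUs).
conv1 = nn.Conv2d(3, 96, kernel_size=11, stride=4)    # 96 kernels of 11 x 11 x 3
conv2 = nn.Conv2d(96, 256, kernel_size=5, padding=2)  # 256 kernels of 5 x 5 (x 48 per GPU in the paper)

x = torch.randn(1, 3, 224, 224)                        # a 224 x 224 RGB image
h = F.max_pool2d(F.relu(conv1(x)), kernel_size=3, stride=2)
h = F.max_pool2d(F.relu(conv2(h)), kernel_size=3, stride=2)
print(h.shape)  # the spatial resolution shrinks while the channel depth grows
```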
The weakly supervised paper takes the network architecture from the paper above and makes some modifications.
As shown in the figure below:
The network architecture of [27] assumes a fixed-size image patch of 224×224 RGB pixels as input and outputs a 1 × 1 × N vector of per-class scores as output, where N is the number of classes.
The aim is to apply the network to bigger images in a sliding window manner thus extending its output to n × m × N where n and m denote the number of sliding window positions in the x- and y- direction in the image, respectively, computing the N per-class scores at all input window positions.
So, for images that are not 224 × 224, the paper applies the network in a sliding-window fashion over the larger image, analogous to how a convolution slides over its input. For example, for a 2048 × 1024 image with a 224 × 224 window and a stride of 32, there are 58 × 26 window positions, so the result is a 58 × 26 × 20 matrix of scores rather than a single 20-dimensional vector.
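Here is a minimal sketch of why this works, assuming AlexNet-like dimensions (the last convolutional feature map for a 224 × 224 input is 6 × 6 × 256): once FC6 is rewritten as a 6 × 6 convolution and the later fully connected layers as 1 × 1 convolutions, the same weights slide over the larger feature map of a bigger image and the output becomes a spatial grid of per-class scores. The layer sizes are illustrative, not the authors' exact code; the window-count arithmetic for the 2048 × 1024 example is included as a sanity check.

```python
import torch
import torch.nn as nn

N_CLASSES = 20  # Pascal VOC

# Fully connected layers rewritten as convolutions (AlexNet-like sizes assumed):
# FC6 (256*6*6 -> 4096) becomes a 6x6 convolution with 4096 output channels,
# FC7 and the scoring layer become 1x1 convolutions.
fc6_as_conv = nn.Conv2d(256, 4096, kernel_size=6)
fc7_as_conv = nn.Conv2d(4096, 4096, kernel_size=1)
score_conv = nn.Conv2d(4096, N_CLASSES, kernel_size=1)

def scores(feature_map):
    h = torch.relu(fc6_as_conv(feature_map))
    h = torch.relu(fc7_as_conv(h))
    return score_conv(h)

# A 224x224 input gives a 6x6 feature map -> a single 1 x 1 x N score vector.
print(scores(torch.randn(1, 256, 6, 6)).shape)    # torch.Size([1, 20, 1, 1])

# A larger image gives a larger feature map -> an n x m grid of score vectors.
print(scores(torch.randn(1, 256, 20, 12)).shape)  # torch.Size([1, 20, 15, 7])

# Window-count arithmetic for the 2048 x 1024 example (window 224, stride 32):
n = (2048 - 224) // 32 + 1   # 58 positions along x
m = (1024 - 224) // 32 + 1   # 26 positions along y
print(n, m)                  # 58 26 -> a 58 x 26 x 20 score map
```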
However, our goal is to produce one score per class for the whole image, while we now have an n × m × N matrix of scores; the paper resolves this with pooling (sketched after the quotes below).
This is achieved by aggregating the n × m × N matrix of output scores for n × m different positions of the input window using a global max-pooling operation into a single 1 × 1 × N vector, where N is the number of classes.
What the max-pooling does:
max-pooling operation effectively searches for the best-scoring candidate object position within the image
due to the max-pooling operation the output of the network becomes independent of the size of the input image, which will be used for multi-scale learning
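A small sketch of the aggregation step (my own illustration, not the released code): given an n × m score map per class, global max-pooling keeps, for each class, only the highest score over all window positions, so the output has the same shape regardless of the input image size.

```python
import torch
import torch.nn as nn

# n x m x N score map for one image (PyTorch layout: batch, classes, n, m).
score_map = torch.randn(1, 20, 58, 26)

# Global max-pooling over all window positions: for each class, keep the
# score of the best-scoring position in the image.
global_max_pool = nn.AdaptiveMaxPool2d(output_size=1)
image_scores = global_max_pool(score_map).flatten(1)   # shape: (1, 20)

# Equivalent one-liner:
image_scores_alt = score_map.amax(dim=(2, 3))          # shape: (1, 20)

# The same code works for any input size, which is what makes the output
# independent of the image size (useful for multi-scale learning).
print(image_scores.shape, torch.allclose(image_scores, image_scores_alt))
```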
2.3 Loss Function
How should the loss function be chosen? For the problem in the paper there are K = 20 classes, and for each image we need to know whether it contains each object class, so this is a multi-label problem.
For each class k, the task is treated as an independent binary classification problem, so the total loss is a sum of K binary logistic log-losses of the form log(1 + exp(-y_k * f_k(x))), where f_k(x) is the max-pooled score for class k and y_k = +1 or -1 indicates whether class k is present in the image; a small sketch of this loss follows below.
Treating a multi-label classification problem as twenty independent classification problems is often inadequate because it does not model label correlations. This is not a problem here because the twenty classifiers share hidden layers and therefore are not independent. Such a network can model label correlations by tuning the overlap of the hidden state distribution given each label.
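A minimal PyTorch sketch of this loss, assuming per-class scores f_k(x) taken after global max-pooling and labels y_k in {+1, -1} indicating whether class k is present:

```python
import torch
import torch.nn.functional as F

def multilabel_logistic_loss(scores, labels):
    """Sum over classes of log(1 + exp(-y_k * f_k(x))), averaged over the batch.

    scores: (batch, K) per-class scores f_k(x) after global max-pooling
    labels: (batch, K) with +1 if class k is present in the image, -1 otherwise
    """
    # softplus(z) = log(1 + exp(z)), used here for numerical stability.
    return F.softplus(-labels * scores).sum(dim=1).mean()

# Toy example: 2 images, K = 20 classes.
scores = torch.randn(2, 20)
labels = (torch.rand(2, 20) > 0.5).float() * 2 - 1
print(multilabel_logistic_loss(scores, labels))
```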
2.4 Using the network for learning and classification
3. Code
The source code can be downloaded from http://www.di.ens.fr/willow/research/weakcnn/
Why Lua...
Working through a thirty-minute Lua introduction...
The if and function syntax looks a lot like MATLAB; the rest feels more like Python.
Next up: getting started with Torch.
In the next blog post I will try to implement it with TensorFlow.
http://blog.****.net/zhoujunr1/article/details/77119902