Notes on 《Siamese Neural Networks for One-shot Image Recognition》

1 Motivation

  • Machine learning models often break down when forced to make predictions about data for which little supervised information is available.

  • One-shot learning: we may only observe a single example of each possible class before making a prediction about a test instance.

2 Innovation

This paper uses siamese neural networks to deal with the problem of one-shot learning.

3 Advantages

  • Once a siamese neural network has been tuned, we can then capitalize on powerful discriminative features to generalize the predictive power of the network not just to new data, but to entirely new classes from unknown distributions.
  • Using a convolutional architecture, we are able to achieve strong results which exceed those of other deep learning models with near state-of-the-art performance on one-shot classification tasks.

4 Related Work

  • Li Fei-Fei et al. developed a variational Bayesian framework for one-shot image classification.
  • Lake et al. addressed one-shot learning for character recognition with a method called Hierarchical Bayesian Program Learning (HBPL).

5 Model

Siamese Neural Network with L fully-connected layers

(figure omitted: fully-connected siamese architecture)

The paper experiments with 2-layer, 3-layer, and 4-layer networks.

$h_{1,l}$: the hidden vector in layer $l$ for the first twin.
$h_{2,l}$: the hidden vector in layer $l$ for the second twin.
for the first $L-1$ layers:

$$h_{1,l} = \max\bigl(0,\ W_{l-1,l}^{T}\, h_{1,(l-1)} + b_l\bigr)$$
$$h_{2,l} = \max\bigl(0,\ W_{l-1,l}^{T}\, h_{2,(l-1)} + b_l\bigr)$$

for the last layer:

$$p = \sigma\Bigl(\sum_j \alpha_j \bigl|\, h_{1,l}^{(j)} - h_{2,l}^{(j)} \bigr|\Bigr)$$

where $\sigma$ is the sigmoid activation function and the $\alpha_j$ are additional learned parameters that weight the importance of each component of the distance.
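The following is a minimal NumPy sketch of this forward pass (not the paper's or the linked repository's code; the layer sizes 784 → 256 → 128 are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def twin_forward(x, weights, biases):
    # shared fully-connected twin: h_l = max(0, W_{l-1,l}^T h_{l-1} + b_l)
    h = x
    for W, b in zip(weights, biases):
        h = np.maximum(0.0, W.T @ h + b)
    return h

def siamese_predict(x1, x2, weights, biases, alpha):
    # p = sigmoid(sum_j alpha_j * |h1_j - h2_j|); both twins use the same weights
    h1 = twin_forward(x1, weights, biases)
    h2 = twin_forward(x2, weights, biases)
    return sigmoid(np.sum(alpha * np.abs(h1 - h2)))

# toy usage with made-up layer sizes (784 -> 256 -> 128)
rng = np.random.default_rng(0)
weights = [rng.normal(0, 1 / np.sqrt(784), (784, 256)),
           rng.normal(0, 1 / np.sqrt(256), (256, 128))]
biases = [np.full(256, 0.5), np.full(128, 0.5)]
alpha = rng.normal(size=128)
p = siamese_predict(rng.random(784), rng.random(784), weights, biases, alpha)
print(p)
```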

Siamese Neural Network with CNN

(figure omitted: convolutional siamese architecture)

first twin: (conv → ReLU → max-pooling) × 3 → conv → fully-connected → sigmoid
second twin: (conv → ReLU → max-pooling) × 3 → conv → fully-connected → sigmoid (weights are shared with the first twin)

$$h_{1,l}^{(k)} = \text{max-pool}\Bigl(\max\bigl(0,\ W_{l-1,l}^{(k)} \star h_{1,(l-1)} + b_l\bigr),\ 2\Bigr)$$
$$h_{2,l}^{(k)} = \text{max-pool}\Bigl(\max\bigl(0,\ W_{l-1,l}^{(k)} \star h_{2,(l-1)} + b_l\bigr),\ 2\Bigr)$$

where $k$ indexes the $k$-th filter map and $\star$ denotes the (valid) convolution operation; the trailing 2 denotes max-pooling with a filter size and stride of 2.

for the last fully connected layer:

$$p = \sigma\Bigl(\sum_j \alpha_j \bigl|\, h_{1,l}^{(j)} - h_{2,l}^{(j)} \bigr|\Bigr)$$
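A sketch of this architecture using TensorFlow's Keras API is shown below. The filter counts and sizes follow the configuration reported in the paper (64@10×10, 128@7×7, 128@4×4, 256@4×4, then a 4096-unit sigmoid layer), but this is an illustrative re-implementation, not the code from the linked repository:

```python
from tensorflow.keras import layers, models, backend as K

def build_twin(input_shape=(105, 105, 1)):
    # (conv -> ReLU -> max-pool) x 3 -> conv -> flatten -> fully-connected sigmoid
    return models.Sequential([
        layers.Conv2D(64, (10, 10), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(128, (7, 7), activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, (4, 4), activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(256, (4, 4), activation="relu"),
        layers.Flatten(),
        layers.Dense(4096, activation="sigmoid"),
    ])

x1 = layers.Input(shape=(105, 105, 1))
x2 = layers.Input(shape=(105, 105, 1))
twin = build_twin()  # applying the same model to both inputs shares the weights
l1_distance = layers.Lambda(lambda t: K.abs(t[0] - t[1]))([twin(x1), twin(x2)])
p = layers.Dense(1, activation="sigmoid")(l1_distance)  # learns the alpha_j weighting
siamese = models.Model(inputs=[x1, x2], outputs=p)
siamese.compile(optimizer="sgd", loss="binary_crossentropy")
```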

6 Learning

Loss function

$M$: mini-batch size
$y(x_1^{(i)}, x_2^{(i)})$: the label for the $i$-th mini-batch pair; $y(x_1^{(i)}, x_2^{(i)}) = 1$ if $x_1$ and $x_2$ are from the same class, and $y(x_1^{(i)}, x_2^{(i)}) = 0$ otherwise
loss function: regularized cross-entropy

$$\mathcal{L}(x_1^{(i)}, x_2^{(i)}) = y(x_1^{(i)}, x_2^{(i)}) \log p(x_1^{(i)}, x_2^{(i)}) + \bigl(1 - y(x_1^{(i)}, x_2^{(i)})\bigr) \log\bigl(1 - p(x_1^{(i)}, x_2^{(i)})\bigr) + \boldsymbol{\lambda}^{T} |\mathbf{w}|^{2}$$
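A small NumPy sketch of this loss, assuming a single scalar regularization weight `lam` rather than the per-layer $\lambda_j$ vector (the sign convention below follows the note's formula literally; for gradient descent one would minimize the negated log-likelihood terms):

```python
import numpy as np

def regularized_cross_entropy(p, y, weights, lam=1e-4, eps=1e-12):
    # per-pair terms: y*log(p) + (1-y)*log(1-p), written as in the formula above
    log_likelihood = y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps)
    # L2 penalty over all weight matrices (single scalar lam is an assumption here)
    l2_penalty = lam * sum(np.sum(W ** 2) for W in weights)
    return np.sum(log_likelihood) + l2_penalty
```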

Optimization

$\eta_j$: learning rate for layer $j$
$\mu_j$: momentum for layer $j$
$\lambda_j$: L2 regularization weight for layer $j$

The update rule at epoch $T$ is as follows:

$$w_{kj}^{(T)}(x_1^{(i)}, x_2^{(i)}) = w_{kj}^{(T)} + \Delta w_{kj}^{(T)}(x_1^{(i)}, x_2^{(i)}) + 2\lambda_j |w_{kj}|$$
$$\Delta w_{kj}^{(T)}(x_1^{(i)}, x_2^{(i)}) = -\eta_j \nabla w_{kj}^{(T)} + \mu_j \Delta w_{kj}^{(T-1)}$$

where $\nabla w_{kj}^{(T)}$ is the partial derivative of the loss with respect to the weight between the $j$-th neuron in some layer and the $k$-th neuron in the successive layer.
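A sketch of one per-layer update step, interpreted as standard mini-batch SGD with momentum and L2 weight decay (the function name and the sign of the regularization term are my interpretation, not a literal copy of the rule above):

```python
import numpy as np

def update_layer(W, grad_W, velocity, eta_j, mu_j, lam_j):
    # velocity plays the role of Delta w^{(T)}:
    #   Delta w^{(T)} = -eta_j * grad + mu_j * Delta w^{(T-1)}
    velocity = -eta_j * grad_W + mu_j * velocity
    # the L2 term is applied as a standard weight-decay pull toward zero,
    # an interpretation of the "+ 2 lambda_j |w|" term in the note's formula
    W = W + velocity - 2.0 * lam_j * W
    return W, velocity
```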

Weight initialization

Siamese Neural Network with L fully-connected layers

W of fully-connected layers: normal distribution, zero mean, standard deviation $1/\sqrt{\text{fan-in}}$ (fan-in $= n_{l-1}$, the number of units in the previous layer)
b of fully-connected layers: normal distribution, mean 0.5, standard deviation 0.01

Siamese Neural Network with CNN

W of fully-connected layers: normal distribution, zero-mean, standard deviation 0.2
b of fully-connected layers: normal distribution, mean 0.5, standard deviation 0.01
w of convolution layers: normal distribution, zero-mean, standard deviation 0.01
b of convolution layers: normal distribution, mean 0.5, standard deviation 0.01
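A sketch of these initialization schemes in NumPy (the helper names and argument layout are mine, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_fc(n_in, n_out, w_std=None):
    # fully-connected layer: W ~ N(0, 1/sqrt(fan-in)) by default
    # (for the convolutional variant the note uses w_std = 0.2 instead);
    # biases: b ~ N(0.5, 0.01)
    if w_std is None:
        w_std = 1.0 / np.sqrt(n_in)
    W = rng.normal(0.0, w_std, size=(n_in, n_out))
    b = rng.normal(0.5, 0.01, size=n_out)
    return W, b

def init_conv(k_h, k_w, c_in, c_out):
    # convolutional layer: w ~ N(0, 0.01), b ~ N(0.5, 0.01)
    w = rng.normal(0.0, 0.01, size=(k_h, k_w, c_in, c_out))
    b = rng.normal(0.5, 0.01, size=c_out)
    return w, b
```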

Learning schedule

Learning rates are decayed uniformly by 1% per epoch: $\eta_j^{(T)} = 0.99\,\eta_j^{(T-1)}$.
Momentum starts at 0.5 in every layer and increases linearly each epoch until it reaches the value $\mu_j$.

The paper trained the siamese network with fully-connected layers for 300 epochs and the convolutional siamese network for 200 epochs.

The paper monitored one-shot validation error on a set of 320 one-shot learning tasks. When the validation error did not decrease for 20 epochs, training was stopped and the parameters from the best epoch (according to one-shot validation error) were used.
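A sketch of this per-layer schedule (the length of the momentum ramp, `ramp_epochs`, is a hypothetical choice; the note only says momentum increases linearly each epoch):

```python
def schedule(epoch, eta_0, mu_j, mu_start=0.5, ramp_epochs=100):
    # learning rate: 1% decay per epoch, eta^{(T)} = 0.99 * eta^{(T-1)}
    eta = eta_0 * (0.99 ** epoch)
    # momentum: linear ramp from 0.5 up to the per-layer target mu_j
    mu = min(mu_j, mu_start + (mu_j - mu_start) * epoch / ramp_epochs)
    return eta, mu
```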

Omniglot dataset

The Omniglot data set contains examples from 50 alphabets, with anywhere from about 15 to upwards of 40 characters in each alphabet. All characters across these alphabets are produced a single time by each of 20 drawers.

(figures omitted: Omniglot character samples)

Affine distortions

The paper augmented the training set with small affine distortions. For each image pair $x_1, x_2$, a pair of affine transformations $T_1, T_2$ is generated to yield $x_1' = T_1(x_1)$ and $x_2' = T_2(x_2)$, where each transformation is parameterized as $T = (\theta, \rho_x, \rho_y, s_x, s_y, t_x, t_y)$ (rotation, shear, scale, and translation).

A sample of random affine distortions generated for a single character in the Omniglot data set.
(figure omitted: random affine distortions of one character)
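A sketch of sampling such a transformation (the component ranges and the probability-0.5 gating below are illustrative assumptions, not values taken from this note):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_affine():
    # T = (theta, rho_x, rho_y, s_x, s_y, t_x, t_y): rotation, shear, scale, translation.
    # Each component is switched on with probability 0.5; the ranges are illustrative.
    on = lambda: rng.random() < 0.5
    theta = rng.uniform(-10.0, 10.0) if on() else 0.0   # rotation in degrees
    rho_x = rng.uniform(-0.3, 0.3)   if on() else 0.0   # shear
    rho_y = rng.uniform(-0.3, 0.3)   if on() else 0.0
    s_x   = rng.uniform(0.8, 1.2)    if on() else 1.0   # scale
    s_y   = rng.uniform(0.8, 1.2)    if on() else 1.0
    t_x   = rng.uniform(-2.0, 2.0)   if on() else 0.0   # translation in pixels
    t_y   = rng.uniform(-2.0, 2.0)   if on() else 0.0
    return theta, rho_x, rho_y, s_x, s_y, t_x, t_y
```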

Train

The mini-batch size is 32. Example mini-batch samples are shown below:

(figure omitted: example mini-batch pairs)
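A sketch of how such a mini-batch of pairs could be sampled (the 50/50 same/different split and the `images_by_class` dict are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_batch(images_by_class, batch_size=32):
    # images_by_class: dict {class_id: array of images for that class}
    classes = list(images_by_class)
    x1, x2, y = [], [], []
    for i in range(batch_size):
        same = i < batch_size // 2          # first half: same-class pairs (y = 1)
        c1 = rng.choice(classes)
        c2 = c1 if same else rng.choice([c for c in classes if c != c1])
        x1.append(images_by_class[c1][rng.integers(len(images_by_class[c1]))])
        x2.append(images_by_class[c2][rng.integers(len(images_by_class[c2]))])
        y.append(1.0 if same else 0.0)
    return np.array(x1), np.array(x2), np.array(y)
```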

7 Experiment

Test

Test samples form N-way one-shot classification tasks; the image below shows a 20-way task.
(figure omitted: 20-way one-shot task)
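A sketch of how an N-way one-shot trial can be scored with the trained network (`predict_pair` stands in for the siamese similarity function; all names here are mine):

```python
import numpy as np

def one_shot_trial(predict_pair, test_image, support_set, true_index):
    # score the test image against one example per candidate class,
    # then predict the class of the most similar support example
    scores = np.array([predict_pair(test_image, s) for s in support_set])
    return int(np.argmax(scores) == true_index)

def one_shot_accuracy(predict_pair, trials):
    # trials: iterable of (test_image, support_set, true_index) tuples
    return float(np.mean([one_shot_trial(predict_pair, t, s, i) for t, s, i in trials]))
```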

Results

Siamese Neural Network with L fully-connected layers

(figure omitted: results for the fully-connected network)

Siamese Neural Network with CNN

(figure omitted: results for the convolutional network)

One-shot Image Recognition

Example of the model's top-5 classification performance on a 1-versus-20 one-shot classification task:
(figure omitted: top-5 classification examples)

One-shot accuracy on evaluation set:
(figure omitted: one-shot accuracy on the evaluation set)

Comparing best one-shot accuracy from each type of network against baselines:
(figure omitted: comparison against baselines)

Reference: https://blog.csdn.net/bryant_meng/article/details/80087079
code address: https://github.com/sorenbouma/keras-oneshot