ResNet

Paper: Deep Residual Learning for Image Recognition

---

Q: Is learning better networks as easy as stacking more layers?

A: Not necessarily. One obstacle is vanishing/exploding gradients, which has been largely addressed by normalized initialization and intermediate normalization layers (batch normalization), which ensure that forward-propagated signals have non-zero variances.


degradation problem

As network depth increases, accuracy saturates and then degrades rapidly. This is not caused by overfitting: adding more layers instead leads to higher training error.

--->not all systems are similarly easy to optimize.

--->not caused by vanishing gradients

--->the exact reason remains to be studied.

--->conjecture that the deep plain nets may have exponentially low convergence rates.


introduce a deep residual learning framework to address the degradation problem

--->hypothesize that optimizing the residual mapping F(x) = H(x) - x is easier than optimizing the original, unreferenced mapping H(x).

--->the original mapping, recast as F(x) + x, can be realized with skip connections (shortcut connections); a minimal sketch follows.



Shortcut connection

--->shortcut connections are those that skip one or more layers.


Deep Residual Learning 

--->The degradation problem suggests that solvers might have difficulty in approximating identity mappings by multiple nonlinear layers.

--->In real cases, the reformulation may help to precondition the problem.

--->when dimensions differ, perform a linear projection W_s in the shortcut connection to match the dimensions (sketched below).

ResNet

--->the dotted shortcuts increase dimensions.

--->follow 2 design rules (see the sketch after this list):

1) for the same output feature map size, the layers have the same number of filters;

2) if the feature map size is halved, the number of filters is doubled, so as to preserve the time complexity per layer.

ResNet

--->adopt Batch Normalization right after each convolution and before activation.


Deeper Bottleneck Architecture

--->projection shortcuts are not essential for addressing the degradation problem.

--->identity shortcuts are important for not increasing the complexity of the bottleneck architecture.

--->designed out of economical considerations (to limit training time).

--->the 1×1 convolution layers are responsible for reducing and then restoring dimensions, leaving the 3×3 layer a bottleneck with smaller input/output dimensions (sketched below).


Exploring over 1000 layers

--->the 1202-layer net has training error similar to the 110-layer net but worse test error, arguably because of overfitting on the small dataset.