ResNet
Paper: Deep Residual Learning for Image Recognition
---
Q: Is learning better networks as easy as stacking more layers?
A: No. One obstacle is vanishing/exploding gradients, which is largely addressed by normalized initialization and batch normalization, ensuring that forward-propagated signals have non-zero variances.
degradation problem
As network depth increases, accuracy saturates and then degrades rapidly. This is not caused by overfitting; adding more layers leads to higher training error.
--->not all systems are similarly easy to optimize.
--->not caused by vanishing gradients
--->the reason remains to be studied.
--->conjecture that the deep plain nets may have exponentially low convergence rates.
introduce a deep residual learning framework to address the degradation problem
--->hypothesize that it is easier to optimize the residual mapping F(x) = H(x) - x than the original, unreferenced mapping H(x).
--->the recast mapping F(x) + x can be realized with skip connections (shortcut connections); see the block sketch below.
Shortcut connection
--->those skipping one or more layers.
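A minimal sketch of a basic residual block, assuming PyTorch (not the authors' code; class and variable names are illustrative): the stacked layers compute F(x) and the identity shortcut adds x back.

```python
# Assumed PyTorch sketch of a basic residual block, not the authors' implementation.
import torch
import torch.nn as nn


class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Two 3x3 convolutions form the residual function F(x).
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        # Identity shortcut: H(x) = F(x) + x, followed by a final nonlinearity.
        return self.relu(residual + x)


# The block preserves spatial size and channel count.
y = BasicResidualBlock(64)(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 64, 56, 56])
```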
Deep Residual Learning
--->The degradation problem suggests that solvers might have difficulty in approximating identity mappings by multiple nonlinear layers.
--->in real cases, the reformulation may help to precondition the problem: if the optimal function is closer to an identity mapping than to a zero mapping, it is easier for the solver to find small perturbations with reference to the identity.
--->when dimensions differ, perform a linear projection Ws on the shortcut connection to match them: y = F(x) + Ws x.
--->in the architecture diagrams, the dotted shortcuts mark where dimensions increase; a projection sketch follows below.
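A sketch of the projection option, under the same PyTorch assumption (names are illustrative): a strided 1×1 convolution plays the role of Ws on the shortcut, matching the increased dimensions so the block computes F(x) + Ws x.

```python
# Assumed PyTorch sketch of a "dotted" (dimension-increasing) shortcut; illustrative only.
import torch
import torch.nn as nn


class DownsampleResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=2):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Ws: a strided 1x1 convolution that halves the spatial size and
        # increases the channels to match the residual branch.
        self.projection = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return self.relu(residual + self.projection(x))


# 56x56x64 -> 28x28x128: feature map halved, filters doubled.
y = DownsampleResidualBlock(64, 128)(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 128, 28, 28])
```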
--->follow 2 design rules:
1) for the same output feature map size, the layers have the same number of filters.
2) if the feature map size is halved, the number of filters is doubled to preserve the time complexity per layer.
--->adopt Batch Normalization right after each convolution and before activation (see the ordering sketch below).
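A small helper, again assuming PyTorch (the function name is my own), showing the convolution → Batch Normalization → ReLU ordering and how the two design rules play out when a stride halves the feature map.

```python
# Assumed PyTorch sketch; conv_bn_relu is an illustrative helper, not from the paper's code.
import torch
import torch.nn as nn


def conv_bn_relu(in_channels, out_channels, stride=1):
    # BN sits right after each convolution and before the activation,
    # so the convolution needs no bias term.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )


# Following both rules: stride 2 halves the feature map, so the number of
# filters doubles (64 -> 128) to keep the time complexity per layer.
stage = nn.Sequential(conv_bn_relu(64, 128, stride=2), conv_bn_relu(128, 128))
print(stage(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 128, 28, 28])
```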
Deeper Bottleneck Architecture
--->projection shortcuts are not essential for addressing the degradation problem.
--->identity shortcuts are important for not increasing the complexity of the bottleneck architecture.
--->the bottleneck design is chosen for economical considerations, to keep training time affordable as nets get deeper.
--->the 1×1 convolution layers are responsible for reducing and then restoring dimensions, leaving the 3×3 layer a bottleneck with smaller input/output dimensions; see the sketch below.
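A sketch of the bottleneck block (assumed PyTorch; the 256-d/64-d widths follow the example block described in the paper): 1×1 reduce, 3×3 on the smaller width, 1×1 restore, identity shortcut.

```python
# Assumed PyTorch sketch of a bottleneck residual block; illustrative, not the authors' code.
import torch
import torch.nn as nn


class BottleneckBlock(nn.Module):
    def __init__(self, channels, bottleneck_channels):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(channels, bottleneck_channels, 1, bias=False),   # 1x1: reduce dimensions
            nn.BatchNorm2d(bottleneck_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_channels, bottleneck_channels, 3, padding=1, bias=False),  # 3x3 bottleneck
            nn.BatchNorm2d(bottleneck_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_channels, channels, 1, bias=False),   # 1x1: restore dimensions
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Identity shortcut keeps the block cheap: the shortcut connects the two
        # high-dimensional ends, so (per the paper) replacing it with a projection
        # would roughly double the model size and time complexity.
        return self.relu(self.layers(x) + x)


# 256-d in/out with a 64-d bottleneck, matching the paper's example block.
y = BottleneckBlock(256, 64)(torch.randn(1, 256, 56, 56))
print(y.shape)  # torch.Size([1, 256, 56, 56])
```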
Exploring over 1000 layers
--->test performance degrades, arguably due to overfitting on the small dataset (CIFAR-10).