Densely Connected Convolutional Networks (DenseNet 2017 CVPR)

Additional Links

Introduction

  1. ResNets [11] and Highway Networks [33] bypass signal from one layer to the next via identity connections (a minimal identity-skip sketch follows this list).
  2. Stochastic depth [13] shortens ResNets by randomly dropping layers during training to allow better information and gradient flow.
  3. FractalNets [17] repeatedly combine several parallel layer sequences with different numbers of convolutional blocks to obtain a large nominal depth, while maintaining many short paths in the network.
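
To make point 1 concrete, here is a minimal identity-skip block. This is my own PyTorch-style sketch (the block layout, channel count, and names are illustrative assumptions, not the authors' code): the input is added back onto the block's output, so the signal can bypass the transformation via summation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convs whose output is summed with the input (identity skip)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = self.bn2(self.conv2(torch.relu(self.bn1(self.conv1(x)))))
        return torch.relu(out + x)  # identity connection: signal bypasses the block via summation

x = torch.randn(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)  # torch.Size([1, 16, 32, 32])
```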

ResNet and Highway Networks

Stochastic Depth

Recent variations of ResNets [13] show that many layers contribute very little and can in fact be randomly dropped during training.
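
A rough sketch of the mechanism, under my own assumptions (the survival probability, the convolutional branch, and the function name are illustrative, not the authors' implementation): each residual branch is randomly skipped during training and scaled by its survival probability at test time.

```python
import torch
import torch.nn as nn

def stochastic_residual(x, branch, survival_prob=0.8, training=True):
    """Residual block with stochastic depth: drop the branch at random while training."""
    if training:
        if torch.rand(1).item() < survival_prob:
            return x + branch(x)   # block kept: normal residual update
        return x                   # block dropped: only the identity path remains
    return x + survival_prob * branch(x)  # test time: scale by the survival probability

branch = nn.Conv2d(16, 16, 3, padding=1)
y = stochastic_residual(torch.randn(1, 16, 32, 32), branch)
print(y.shape)  # torch.Size([1, 16, 32, 32])
```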

FractalNets

AdaNet

Parallel to our work, [1] derived a purely theoretical framework for networks with cross-layer connections similar to ours.

Changes

  1. We connect all layers (with matching feature-map sizes) directly to one another.
  2. In contrast to ResNets, we never combine features through summation before they are passed into a layer; instead, we combine features by concatenating them (see the dense-block sketch after this list).
  3. Narrow layers (e.g., only 12 filters per layer).
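
A minimal dense-block sketch illustrating points 1-3. PyTorch and the module names are my own assumptions for illustration (not the reference implementation): each layer applies BN-ReLU-Conv(3x3) to the concatenation of all preceding feature maps and contributes only growth_rate = 12 new ones.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """BN-ReLU-Conv(3x3) producing `growth_rate` new feature maps."""
    def __init__(self, in_channels, growth_rate=12):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth_rate, 3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(torch.relu(self.bn(x)))

class DenseBlock(nn.Module):
    """Each layer sees the concatenation of all preceding feature maps."""
    def __init__(self, num_layers, in_channels, growth_rate=12):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))  # concatenate, never sum
            features.append(out)
        return torch.cat(features, dim=1)

# Example: a 4-layer dense block on a CIFAR-sized input.
block = DenseBlock(num_layers=4, in_channels=16, growth_rate=12)
y = block(torch.randn(1, 16, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32]) -> 16 + 4 * 12 channels
```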

Advantages

  1. Similar to recurrent neural networks [21], it requires fewer parameters than traditional convolutional networks, as there is no need to re-learn redundant feature-maps (a rough channel-count illustration follows this list).
  2. Implicit deep supervision [20] and improved flow of information and gradients, which make the network easier to train. Further, dense connections have a regularizing effect.
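
A rough channel-count illustration of point 1, following the paper's growth-rate bookkeeping (the values k0 = 16, k = 12, and six layers are my own example numbers): because each layer already sees every earlier feature map, it only needs to produce a handful of new ones rather than re-learn wide, redundant representations.

```python
# Growth-rate arithmetic (illustrative numbers): inside a dense block with
# growth rate k, layer l receives k0 + k * (l - 1) input feature maps.
k0, k, num_layers = 16, 12, 6
for l in range(1, num_layers + 1):
    in_maps = k0 + k * (l - 1)
    print(f"layer {l}: {in_maps} input maps -> adds {k} new maps")
# Every layer adds only k = 12 maps; the widening input is reused, not re-learned.
```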

Results

  1. Our models tend to require far fewer parameters than existing algorithms with comparable accuracy.
  2. Further, we significantly outperform the current state-of-the-art results on most of the benchmark tasks.

References

[1] C. Cortes, X. Gonzalvo, V. Kuznetsov, M. Mohri, and S. Yang. AdaNet: Adaptive structural learning of artificial neural networks. arXiv preprint arXiv:1607.01097, 2016.
[11] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[12] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In ECCV, 2016.
[13] G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger. Deep networks with stochastic depth. In ECCV, 2016.
[17] G. Larsson, M. Maire, and G. Shakhnarovich. FractalNet: Ultra-deep neural networks without residuals. arXiv preprint arXiv:1605.07648, 2016.
[20] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu. Deeply-supervised nets. In AISTATS, 2015.
[21] Q. Liao and T. Poggio. Bridging the gaps between residual learning, recurrent neural networks and visual cortex. arXiv preprint arXiv:1604.03640, 2016.
[33] R. K. Srivastava, K. Greff, and J. Schmidhuber. Training very deep networks. In NIPS, 2015.