Deeplab v3+ Paper Notes

Deeplab v3+

Structure of Deeplab v3+


1) Spatial pyramid pooling

  • Encodes multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view.
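In DeepLab the "multiple rates" are atrous (dilated) convolution rates: a k-tap filter applied with rate r spans k + (k - 1)(r - 1) input positions per dimension, so parallel branches at different rates see different amounts of context. The sketch below assumes the 3 × 3 branches with rates 6, 12 and 18 that DeepLab v3 uses at output stride 16; its ASPP module also has a 1 × 1 branch and an image-level pooling branch, which are omitted here. The function name is hypothetical.

```python
def effective_kernel_size(k, rate):
    """Input extent covered by a k-tap atrous filter: k + (k - 1) * (rate - 1)."""
    return k + (k - 1) * (rate - 1)

# Assumed rates: the 3x3 atrous branches of DeepLab v3's ASPP at output stride 16.
ASPP_RATES = (6, 12, 18)

for r in (1,) + ASPP_RATES:
    size = effective_kernel_size(3, r)
    print(f"3x3 filter, rate {r:2d}: covers {size} x {size} feature-map positions")
```

Rate 1 gives the ordinary 3 × 3 window; rates 6, 12 and 18 widen it to 13, 25 and 37 positions per side without adding any weights.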

2) Encoder-decoder structure

  • Captures sharper object boundaries by gradually recovering the spatial information.
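"Gradually" means the decoder does not jump straight from the coarse encoder output to full resolution. A minimal sketch of the resolution bookkeeping, assuming the DeepLab v3+ decoder layout from the paper (encoder features at output stride 16 upsampled by 4, concatenated with low-level backbone features at stride 4, refined with 3 × 3 convolutions, then upsampled by 4 again) and a hypothetical 512 × 512 input; only side lengths are tracked, not actual features.

```python
def decoder_resolutions(input_size, encoder_stride=16, low_level_stride=4):
    """Trace feature-map side lengths through a DeepLab v3+-style decoder (sketch)."""
    if input_size % encoder_stride or encoder_stride % low_level_stride:
        raise ValueError("sketch assumes input_size % encoder_stride == 0 and "
                         "encoder_stride % low_level_stride == 0")
    encoder_size = input_size // encoder_stride      # coarse, semantically rich features
    low_level_size = input_size // low_level_stride  # fine features that carry boundary detail
    # Step 1: upsample the encoder output so it matches the low-level features.
    after_first_upsample = encoder_size * (encoder_stride // low_level_stride)
    # Step 2: concatenation and 3x3 convs keep the size; a final upsample restores full resolution.
    output_size = after_first_upsample * low_level_stride
    return {
        "encoder": encoder_size,
        "low_level": low_level_size,
        "after_first_upsample": after_first_upsample,
        "output": output_size,
    }

print(decoder_resolutions(512))
# → {'encoder': 32, 'low_level': 128, 'after_first_upsample': 128, 'output': 512}
```

The intermediate 128 × 128 stage is where the low-level features are fused in, which is what lets the decoder sharpen boundaries before the final upsampling.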

3) Depthwise separable convolution
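A depthwise separable convolution factorizes a standard convolution into a per-channel (depthwise) spatial convolution followed by a 1 × 1 (pointwise) convolution that mixes channels. The parameter counts below make the saving concrete; the 3 × 3 kernel and 256 channels are arbitrary example values, biases are ignored, and the function names are hypothetical. DeepLab v3+ additionally applies an atrous rate inside the depthwise step ("atrous separable convolution"), which widens the field-of-view without changing these counts.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution: one k x k x c_in filter per output channel."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights of a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = k * k * c_in  # one k x k filter per input channel, no channel mixing
    pointwise = c_in * c_out  # 1 x 1 conv combines the channels
    return depthwise + pointwise

k, c = 3, 256  # example values, not taken from the paper
std = standard_conv_params(k, c, c)
sep = depthwise_separable_params(k, c, c)
print(std, sep, round(std / sep, 2))  # → 589824 67840 8.69
```

The ratio is roughly 1 / (1/c_out + 1/k²), so for 3 × 3 kernels the saving approaches 9× as the channel count grows.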

4) Modified Xception based on Aligned Xception

Paper: Pyramid Scene Parsing Network

5) Channel numbers and network settings

4.1 Decoder Design Choices
4.2 ResNet-101 as Network Backbone
4.3 Xception as Network Backbone

Introduction

1) Deeplab v3
2) Encoder-decoder
3) Deeplab v3+ = Deeplab v3 + encoder-decoder

Atrous spatial pyramid pooling

Consider two-dimensional signals. For each location i on the output y and a filter w, atrous convolution is applied over the input feature map x as follows:
$$y[i] = \sum_{k} x[i + r \cdot k] \, w[k]$$
where the atrous rate r corresponds to the stride with which we sample the input signal. This is equivalent to convolving the input x with upsampled filters produced by inserting r − 1 zeros between two consecutive filter values along each spatial dimension (hence the name atrous convolution, where the French word trous means holes in English). Standard convolution is the special case r = 1, and atrous convolution lets us adaptively modify the filter's field-of-view by changing the rate value.
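A minimal 1-D sketch of atrous convolution, y[i] = Σ_k x[i + r·k] w[k], restricted to "valid" output positions (no padding); the function names are hypothetical and the real operation is 2-D. It also checks the stated equivalence: sampling the input with stride r gives the same result as an ordinary convolution with a filter upsampled by inserting r − 1 zeros between taps. As in most deep-learning frameworks, the formula is a cross-correlation (the filter is not flipped).

```python
def atrous_conv1d(x, w, rate):
    """y[i] = sum_k x[i + rate * k] * w[k], over 'valid' positions only (no padding)."""
    span = (len(w) - 1) * rate + 1  # extent of the input covered by one output
    return [sum(x[i + rate * k] * w[k] for k in range(len(w)))
            for i in range(len(x) - span + 1)]

def upsample_filter(w, rate):
    """Insert rate - 1 zeros between consecutive filter taps."""
    out = []
    for j, tap in enumerate(w):
        out.append(tap)
        if j < len(w) - 1:
            out.extend([0] * (rate - 1))
    return out

x = [1, 2, 3, 4, 5, 6, 7, 8]
w = [1, 2, 3]
print(atrous_conv1d(x, w, rate=2))                      # → [22, 28, 34, 40]
print(atrous_conv1d(x, upsample_filter(w, 2), rate=1))  # → [22, 28, 34, 40]
```

With rate=1 the function reduces to ordinary convolution, matching the special case r = 1; raising the rate widens the covered span from 3 to 5 inputs here while the filter keeps its 3 weights.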