DeepLab v3+ Paper Notes
DeepLab v3+
Structure of DeepLab v3+
1) Spatial pyramid pooling
- encodes multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view.
2) Encoder-decoder structure
- captures sharper object boundaries by gradually recovering the spatial information.
3) Depthwise separable convolution (applied to both the ASPP and the decoder modules)
4) Modified Xception based on Aligned Xception
Paper: Pyramid Scene Parsing Network (PSPNet)
5) Channel numbers and network settings
4.1 Decoder Design Choices
4.2 ResNet-101 as Network Backbone
4.3 Xception as Network Backbone
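Item 3 is what the paper relies on to make ASPP and the decoder faster. A depthwise separable convolution factorizes a standard k × k convolution into a depthwise k × k convolution applied to each input channel separately, followed by a 1 × 1 pointwise convolution that mixes channels. The minimal sketch below only counts weights. Ignoring biases, the function names, and the 3 × 3, 256 → 256 channel sizes are assumptions for illustration, not values taken from a specific layer in the paper.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k conv (one filter per input channel)
    followed by a pointwise 1 x 1 conv mapping c_in -> c_out (biases ignored)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 256, 256  # illustrative sizes, not from a specific layer
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
    # prints "standard: 589824, separable: 67840, ratio: 8.69"
```

The saving ratio simplifies to 1 / (1/c_out + 1/k²), so for 3 × 3 kernels it approaches a factor of 9 as the number of output channels grows.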
Introduction
1) DeepLab v3
2) Encoder-decoder
3) DeepLab v3+ = DeepLab v3 (as the encoder) + a simple decoder module that refines the segmentation, especially along object boundaries
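The encoder-decoder combination in item 3 can be made concrete with shape bookkeeping. The sketch below follows the decoder layout described in the DeepLab v3+ paper: the encoder output at output stride 16 is bilinearly upsampled by 4, concatenated with stride-4 low-level backbone features whose channels are reduced to 48 by a 1 × 1 convolution, refined by 3 × 3 convolutions with 256 filters, and upsampled by 4 again. The function `decoder_shapes`, the 512 × 512 input, the 256-channel encoder output, the final 1 × 1 classifier, and the 21 classes (as in PASCAL VOC) are assumptions for illustration; no real tensors are computed.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)

ENCODER_STRIDE = 16   # encoder output stride (assumed default setting)
LOW_LEVEL_STRIDE = 4  # stride of the low-level backbone features (assumed)


def decoder_shapes(height: int, width: int,
                   num_classes: int = 21) -> List[Tuple[str, Shape]]:
    """Trace (channels, height, width) through a DeepLab v3+-style decoder.

    Only shapes are computed; no tensors or weights are involved.
    """
    if height % ENCODER_STRIDE or width % ENCODER_STRIDE:
        raise ValueError("input size must be divisible by ENCODER_STRIDE")
    factor = ENCODER_STRIDE // LOW_LEVEL_STRIDE  # 4

    stages: List[Tuple[str, Shape]] = []
    enc = (256, height // ENCODER_STRIDE, width // ENCODER_STRIDE)
    stages.append(("encoder (ASPP) output", enc))

    up = (enc[0], enc[1] * factor, enc[2] * factor)
    stages.append((f"bilinear upsample x{factor}", up))

    low = (48, height // LOW_LEVEL_STRIDE, width // LOW_LEVEL_STRIDE)
    stages.append(("low-level features after 1x1 conv", low))

    cat = (up[0] + low[0], up[1], up[2])
    stages.append(("concatenate along channels", cat))

    refined = (256, cat[1], cat[2])
    stages.append(("3x3 convs, 256 filters", refined))

    logits = (num_classes, refined[1], refined[2])
    stages.append(("1x1 conv to class scores (assumed)", logits))

    out = (num_classes, logits[1] * LOW_LEVEL_STRIDE, logits[2] * LOW_LEVEL_STRIDE)
    stages.append((f"bilinear upsample x{LOW_LEVEL_STRIDE}", out))
    return stages


if __name__ == "__main__":
    for name, shape in decoder_shapes(512, 512):
        print(f"{name:38s} {shape}")
```

For a 512 × 512 input, the concatenated tensor has 256 + 48 = 304 channels at 128 × 128, and the final scores come back at 21 × 512 × 512.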
Atrous spatial pyramid pooling
Consider two-dimensional signals. For each location i on the output y and a filter w, atrous convolution is applied over the input feature map x as follows (k runs over the filter positions):
$$y[i] = \sum_{k} x[i + r \cdot k]\, w[k]$$
where the atrous rate r determines the stride with which the input signal is sampled. This is equivalent to convolving the input x with upsampled filters produced by inserting r − 1 zeros between two consecutive filter values along each spatial dimension (hence the name atrous convolution: the French word trous means "holes"). Standard convolution is the special case r = 1, and atrous convolution lets us adaptively modify the filter's field-of-view by changing the rate value.
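The equivalence described above (sampling the input with stride r versus convolving with a zero-inserted filter) can be checked directly. The minimal sketch below, in plain Python with no framework, assumes "valid" output (no padding) and an unflipped filter, as is usual in deep-learning libraries; the helper names `atrous_conv2d`, `upsample_filter`, and `effective_kernel_size` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def effective_kernel_size(k: int, rate: int) -> int:
    """Extent covered by a k-tap filter dilated by `rate`: k + (k - 1)(rate - 1)."""
    return k + (k - 1) * (rate - 1)


def atrous_conv2d(x: Matrix, w: Matrix, rate: int) -> Matrix:
    """Compute y[i][j] = sum_{a,b} x[i + rate*a][j + rate*b] * w[a][b].

    The filter is not flipped (cross-correlation, as in most deep-learning
    frameworks). Only 'valid' positions are returned, i.e. no padding.
    """
    kh, kw = len(w), len(w[0])
    out_h = len(x) - effective_kernel_size(kh, rate) + 1
    out_w = len(x[0]) - effective_kernel_size(kw, rate) + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("input is smaller than the dilated filter")
    return [
        [
            sum(x[i + rate * a][j + rate * b] * w[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def upsample_filter(w: Matrix, rate: int) -> Matrix:
    """Insert rate - 1 zeros between consecutive filter values along each axis."""
    kh, kw = len(w), len(w[0])
    up = [[0.0] * effective_kernel_size(kw, rate)
          for _ in range(effective_kernel_size(kh, rate))]
    for a in range(kh):
        for b in range(kw):
            up[rate * a][rate * b] = w[a][b]
    return up


if __name__ == "__main__":
    x = [[float(7 * r + c) for c in range(7)] for r in range(7)]  # 7x7 ramp
    w = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]   # 3x3 Sobel-style filter
    direct = atrous_conv2d(x, w, rate=2)
    via_zeros = atrous_conv2d(x, upsample_filter(w, 2), rate=1)
    assert direct == via_zeros  # rate-2 sampling == rate-1 conv with zero-inserted filter
    print(direct)  # 3x3 output, every entry -16.0
```

With a 3 × 3 kernel, rates 6, 12 and 18 (the ASPP rates DeepLab v3 uses at output stride 16) give effective extents of 13, 25 and 37 pixels while the number of weights stays at 9, which is how the parallel ASPP branches observe several fields-of-view at once.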