Context Encoding for Semantic Segmentation[CVPR2018]

This reads more like an application of the authors' 2017 encoding network (Deep TEN, CVPR 2017); the experiments section is the strongest part.

1.FCN framework

Good explanation

global receptive fields: conv (with nonlinearities) + downsampling

spatial resolution loss
  • encoder: dilated conv (see the sketch after this list)
    • pro: expands the receptive field without losing spatial resolution
    • con: isolates pixels from the global scene context, which can lead to misclassification
  • decoder: learned upsampling, e.g. DeepLabv3+
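A minimal sketch of the dilated-conv point above (shapes and channel counts are my own): a 3x3 conv with dilation=2 covers an effective 5x5 receptive field while keeping the spatial resolution, which downsampling would shrink.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 65, 65)                                      # N x C x H x W feature map
plain   = nn.Conv2d(64, 64, kernel_size=3, padding=1)               # 3x3 receptive field
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)   # effective 5x5 receptive field

print(plain(x).shape, dilated(x).shape)  # both keep the 65 x 65 spatial resolution
```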
multi-scale objects

multi-resolution pyramid-based representations: SPP-style modules (see the sketch below)
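A rough sketch of a PSPNet-style pyramid pooling module, as one common instance of such multi-resolution representations (module and bin names are my assumptions, not this paper's code): pool the feature map to several grid sizes, project, upsample, and concatenate so every position carries multi-scale context.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        # one branch per pyramid level: adaptive pooling + 1x1 projection
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, in_ch // len(bins), 1, bias=False))
            for b in bins)

    def forward(self, x):
        h, w = x.shape[2:]
        # upsample every pooled branch back to the input resolution and concat
        feats = [x] + [F.interpolate(stage(x), (h, w), mode="bilinear",
                                     align_corners=False) for stage in self.stages]
        return torch.cat(feats, dim=1)
```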
Q: Is capturing contextual information the same as increasing the receptive field size?

2.Architecture


Featuremap Attention

The core contribution of the paper: the dense feature map is passed through an encoding layer to obtain a context embedding; a fully connected layer then predicts channel-wise scores from it, which are used as weights on the feature map (a distinctive form of attention).
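A minimal sketch of this attention step with my own variable names, assuming the context embedding e has already been produced by the encoding layer: an FC + sigmoid predicts per-channel weights that gate the dense feature map.

```python
import torch
import torch.nn as nn

C = 512
fc = nn.Sequential(nn.Linear(C, C), nn.Sigmoid())

x = torch.randn(2, C, 60, 60)    # dense feature map, N x C x H x W
e = torch.randn(2, C)            # context embedding from the encoding layer
gamma = fc(e)                    # channel-wise attention weights in (0, 1)
y = x * gamma.view(2, C, 1, 1)   # scale each channel of the feature map
```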

Semantic Encoding Loss (SE-loss)

In practice this is just a multi-label classification loss; adding a classification-loss branch to a segmentation network improves the results.
eg: Learning Multi-level Region Consistency with Dense Multi-label Networks for Semantic Segmentation[IJCAI2017]
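A hedged sketch of such a multi-label "which classes are present" branch (layer names and sizes are my assumptions, not the paper's code): the context embedding predicts per-class presence logits trained with binary cross-entropy.

```python
import torch
import torch.nn as nn

num_classes = 21
se_head = nn.Linear(512, num_classes)   # predicts per-class presence logits
se_loss = nn.BCEWithLogitsLoss()

e = torch.randn(4, 512)                 # context embeddings for a batch of images
# ground-truth presence vector: 1 if the class appears anywhere in the image
target = torch.zeros(4, num_classes)
target[:, [0, 5, 12]] = 1.0
loss = se_loss(se_head(e), target)
```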

Encoding Layer

The cornerstone of this paper.

Comparison

A comparison between this method and traditional approaches: earlier pipelines used bag-of-words or Fisher vectors, with the dictionary typically obtained by clustering / GMM.

Steps

$r_{ik} = x_i - c_k$

$e_k = \sum_{i=1}^{N} \frac{\exp(-s_k \lVert r_{ik} \rVert^2)}{\sum_{j=1}^{K} \exp(-s_j \lVert r_{ij} \rVert^2)} \, r_{ik}$

e_k is the output for the k-th codeword; s is a learnable smoothing factor. The encoder ultimately outputs a fixed-length representation E = {e_1, …, e_K}, whose size depends on the number of codewords K and is independent of the number of input features N.

The embedding each codeword produces after encoding is a residual (the only difference from hard assignment is whether all pixel-wise features contribute via soft weights or only the nearest codeword is selected).
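A minimal PyTorch sketch of the encoding layer following the formulas above (naming is mine): soft-assign every descriptor to each codeword and aggregate the weighted residuals into K fixed-length outputs.

```python
import torch
import torch.nn as nn

class EncodingLayer(nn.Module):
    def __init__(self, channels, num_codes):
        super().__init__()
        self.codewords = nn.Parameter(torch.randn(num_codes, channels) * 0.1)  # c_k
        self.scale = nn.Parameter(torch.zeros(num_codes))                      # s_k

    def forward(self, x):                        # x: B x N x C descriptors
        r = x.unsqueeze(2) - self.codewords      # residuals r_ik: B x N x K x C
        dist = (r ** 2).sum(-1)                  # ||r_ik||^2:     B x N x K
        w = torch.softmax(-self.scale * dist, dim=2)   # soft assignment over the K codewords
        e = (w.unsqueeze(-1) * r).sum(1)         # aggregate over the N descriptors: B x K x C
        return e
```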

H×W×C =(reshape)=> N×C -(encoding)-> K×C -(fc)-> 1×C
Intuition: there are K visual centers (codewords), each composed of contributions from the C channels to different degrees; given the responses of these K visual centers, one can work backwards to an attention weight for each channel.
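A toy forward pass stitching the sketches above together to show this shape flow (it reuses the EncodingLayer class from the previous block; summing the K codeword outputs into a single C-dim context vector before the FC is my assumption for the K×C -> 1×C step).

```python
import torch
import torch.nn as nn

B, C, H, W, K = 2, 512, 60, 60, 32
x = torch.randn(B, C, H, W)

enc = EncodingLayer(C, K)                       # from the sketch above
fc = nn.Sequential(nn.Linear(C, C), nn.Sigmoid())

feats = x.view(B, C, H * W).permute(0, 2, 1)    # H x W x C  =>  N x C
e = enc(feats)                                  # N x C      ->  K x C
context = e.sum(dim=1)                          # K x C      ->  1 x C per image
gamma = fc(context)                             # channel-wise attention weights
y = x * gamma.view(B, C, 1, 1)                  # gated feature map
print(y.shape)                                  # torch.Size([2, 512, 60, 60])
```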

3. Excerpt from the paper

For a given input image, hand-engineered features are densely extracted using SIFT [38] or filter bank responses [30, 48]. Then a visual vocabulary (dictionary) is often learned and the global feature statistics are described by classic encoders such as Bag-of-Words (BoW) [8, 13, 26, 46], VLAD [25] or Fisher Vector [44]. The classic representations encode global contextual information by capturing feature statistics.