[DL] The Receptive Field of a CNN


This article is based on A Guide to Receptive Field Arithmetic for Convolutional Neural Networks.

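The receptive field is defined as the region in the input space that a particular CNN feature is looking at (i.e., is affected by). In other words, the receptive field of one cell in a convolutional feature map is the region of the original input layer that it sees.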

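[Figure: two ways to visualize CNN feature maps. Left column: the common visualization. Right column: the fixed-size visualization.]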

In the upper-left sub-figure, the input image is a 5 x 5 matrix (blue grid). Zero-padding of size p = 1 (the transparent grid around the input image) is applied to preserve edge information during convolution. A 3 x 3 kernel with stride s = 2 is then convolved over this image to produce a feature map (green grid) of size 3 x 3. In this example, nine features are obtained, and each feature has a receptive field of size 3 x 3 (the area inside the light blue lines). Applying the same convolution to the green grid gives a deeper feature map (orange grid), shown in the bottom-left sub-figure. Each feature in the orange feature map has a 7 x 7 receptive field.
The method above is a common way to visualize a CNN feature map. But if we look only at the feature map (green or orange grid), we cannot directly tell which pixels a feature is looking at or how big that region is. The two sub-figures in the right column present another way to visualize the feature map, in which the size of each feature map is fixed and equal to the size of the input, and each feature is located at the center of its receptive field. In this case, all that remains is to calculate the size of the receptive field mathematically.
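The sizes quoted in the worked example (3 x 3 for the green features, 7 x 7 for the orange ones) can be checked by tracing a feature back to the input pixels it covers. The following is a minimal one-dimensional sketch, not code from the source article; the function name `trace_receptive_field` and the `(kernel, stride, padding)` tuple layout are illustrative assumptions.

```python
def trace_receptive_field(index, layers):
    """Trace one output feature back to the span of input pixels it covers (1D).

    layers: list of (kernel, stride, padding) tuples, first layer first
            (hypothetical layout chosen for this sketch).
    Returns (lo, hi), inclusive pixel indices relative to the unpadded input.
    Indices outside the image fall in zero-padding.
    """
    lo = hi = index
    # Walk from the deepest layer back towards the input.
    for kernel, stride, padding in reversed(layers):
        # Output position o covers inputs o*stride - padding ... + kernel - 1.
        lo = lo * stride - padding
        hi = hi * stride - padding + kernel - 1
    return lo, hi


conv = (3, 2, 1)  # 3 x 3 kernel, stride 2, padding 1, as in the example

green_lo, green_hi = trace_receptive_field(0, [conv])
orange_lo, orange_hi = trace_receptive_field(0, [conv, conv])

print("green width:", green_hi - green_lo + 1)
print("orange width:", orange_hi - orange_lo + 1)
```

Because the convolution is the same along both axes, the width found in one dimension applies to both, which is consistent with the square receptive fields described above.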

Receptive field formulas:
[Figure: the four equations for computing the receptive field.]
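The equations themselves are not reproduced in the text here. The block below is a reconstruction following the source article, consistent with the worked example (a 5 x 5 input with p = 1 and s = 2 gives a 3 x 3 map with r = 3, then r = 7). The symbols n (number of features along one axis) and k (kernel size) are assumed names that this text does not spell out; p, s, j, r, and start are as used in the surrounding description.

```latex
\begin{align}
n_{out} &= \left\lfloor \frac{n_{in} + 2p - k}{s} \right\rfloor + 1 \\
j_{out} &= j_{in} \cdot s \\
r_{out} &= r_{in} + (k - 1) \cdot j_{in} \\
start_{out} &= start_{in} + \left( \frac{k - 1}{2} - p \right) \cdot j_{in}
\end{align}
```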

The first equation calculates the number of output features, in the same way as in the first part of this article.
The second equation calculates the jump (j) in the output feature map. The jump is the distance between two adjacent features, measured in input pixels. For the original input image, the jump equals 1.
The third equation calculates the size of the receptive field (r) of one output feature.
The fourth equation calculates the center position of the receptive field of the first output feature. Here, start is the center coordinate of a pixel.
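The four equations described above can be applied layer by layer. The sketch below is an illustration under stated assumptions, not the source article's code: the function name `conv_out` and the tuple layout `(n, j, r, start)` are hypothetical, the formulas in the comments follow the source article, and the input image is assumed to start with j = 1, r = 1, and start = 0.5 (the center of the first unit-width pixel).

```python
def conv_out(layer_in, kernel, stride, padding):
    """Apply the four receptive field equations to one convolution layer.

    layer_in: (n, j, r, start) for the layer's input, where
        n     = number of features along one axis
        j     = jump, the distance between adjacent features in input pixels
        r     = receptive field size
        start = center coordinate of the first feature
    Returns the same tuple for the layer's output.
    """
    n_in, j_in, r_in, start_in = layer_in
    # Equation 1: number of output features.
    n_out = (n_in + 2 * padding - kernel) // stride + 1
    # Equation 2: jump grows by the stride.
    j_out = j_in * stride
    # Equation 3: receptive field grows by (kernel - 1) input jumps.
    r_out = r_in + (kernel - 1) * j_in
    # Equation 4: center of the first output feature's receptive field.
    start_out = start_in + ((kernel - 1) / 2 - padding) * j_in
    return n_out, j_out, r_out, start_out


# Worked example: 5 x 5 input, then two 3 x 3 convolutions with s = 2, p = 1.
layer = (5, 1, 1, 0.5)  # assumed starting values for the input image
results = {}
for name in ("green", "orange"):
    layer = conv_out(layer, kernel=3, stride=2, padding=1)
    results[name] = layer
    print(name, layer)
```

For the worked example this gives a receptive field of 3 for the green map and 7 for the orange map, in line with the sizes stated earlier.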
