Paper Reading Notes - CTPN: Detecting Text in Natural Image with Connectionist Text Proposal Network

CTPN is largely based on Faster R-CNN. It adapts the RPN, the anchors, and the loss function to the characteristics of text detection (text is mostly laid out horizontally). Key points:

Detecting text in fine-scale proposals;

Recurrent connectionist text proposals;

Side-refinement.

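CTPN (Connectionist Text Proposal Network) is an end-to-end framework:

As in Faster R-CNN, a $3 \times 3$ window slides over the convolutional feature map output by VGG16 (conv5); the anchors have a fixed width.
A bi-directional LSTM (BLSTM) recurrently connects the sequence of windows in each row; the convolutional feature of each window ($3 \times 3 \times C$) is the input to the 256D BLSTM (bi-directional, i.e., two 128D LSTMs).
The RNN layer is followed by a 512D FC layer, which jointly outputs the text/non-text probabilities, the y-axis coordinates, and the side-refinement offsets for the k anchors.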
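To make the data flow above concrete, here is a minimal sketch that only traces sizes through the pipeline. The function name `ctpn_shapes` is made up for illustration, and the defaults assume the standard VGG16 conv5 output (512 channels, total stride 16) and a padded 3x3 window that preserves the feature-map size.

```python
def ctpn_shapes(img_h, img_w, channels=512, stride=16):
    """Trace sizes through the pipeline (hypothetical helper, not from the paper)."""
    # Assumption: conv5 of VGG16 downsamples the input by a total stride of 16.
    feat_h, feat_w = img_h // stride, img_w // stride
    return {
        "feature_map": (feat_h, feat_w, channels),
        "window_feature": 3 * 3 * channels,  # one 3x3xC window, flattened
        "num_sequences": feat_h,             # one sequence per feature-map row
        "sequence_length": feat_w,           # one time step per window in the row
        "blstm_out": 2 * 128,                # two 128D LSTMs, concatenated
        "fc_out": 512,
    }


shapes = ctpn_shapes(608, 800)
print(shapes["feature_map"], shapes["window_feature"], shapes["blstm_out"])
```

Each row of the feature map becomes one sequence for the BLSTM, so a wider image simply yields a longer sequence.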

Detecting Text in Fine-scale Proposals

Compared with the RPN (Region Proposal Network) in Faster R-CNN:

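Left: RPN proposals; right: fine-scale text proposals.

Similarities:

Both are fully convolutional networks, so input images of arbitrary size are allowed;
Differences:

CTPN detects text lines by sliding a small window over the convolutional feature map, and outputs a sequence of fine-scale text proposals (e.g., with a fixed width of 16 pixels). In the figure, the color of each box indicates its text/non-text score; only boxes with positive scores are shown.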

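CTPN uses a vertical anchor mechanism that jointly predicts the text/non-text score and the y-axis location of each fine-scale proposal:

Each text proposal has a fixed width of 16 pixels (measured in the input image);
k vertical anchors are used to predict the y-axis coordinates of each proposal. The k anchors share the same horizontal location and the same fixed width of 16 pixels, but their vertical extents vary over k different heights.

The paper uses k=10, i.e., 10 anchors per proposal, with heights ranging from 11 to 273 pixels (each height is the previous one divided by 0.7).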
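The height range can be checked with a few lines of arithmetic. The sketch below starts at 11 pixels and divides by 0.7 repeatedly, then builds the k boxes for one feature-map column. The function names and the choice to center every anchor on the same `center_y` are assumptions made for illustration.

```python
def anchor_heights(k=10, min_height=11.0, ratio=0.7):
    """Heights obtained by repeatedly dividing the smallest height by `ratio`."""
    return [min_height / ratio ** i for i in range(k)]


def vertical_anchors(col, center_y, stride=16, k=10):
    """(x1, y1, x2, y2) boxes of the k anchors at feature-map column `col`.

    Assumption: all k anchors are centered on the same y position.
    """
    x1 = col * stride
    x2 = x1 + stride  # every anchor is exactly `stride` pixels wide
    return [(x1, center_y - h / 2, x2, center_y + h / 2)
            for h in anchor_heights(k)]


print([round(h) for h in anchor_heights()])
```

The smallest height rounds to 11 and the largest to 273, which matches the range quoted above.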

Only the vertical coordinates y1, y2 are regressed, rather than all four of x1, x2, y1, y2.
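In the paper, the vertical regression targets are expressed relative to the anchor as a center offset $v_c = (c_y - c_y^a) / h^a$ and a log height ratio $v_h = \log(h / h^a)$. A minimal sketch of encoding and decoding under that parameterization follows; the function names are hypothetical, and box height is taken as `y2 - y1` without any +1 pixel convention.

```python
import math


def encode_vertical(y1, y2, anchor_cy, anchor_h):
    """Box (y1, y2) -> (v_c, v_h) relative to an anchor."""
    cy = (y1 + y2) / 2.0
    h = y2 - y1
    return (cy - anchor_cy) / anchor_h, math.log(h / anchor_h)


def decode_vertical(v_c, v_h, anchor_cy, anchor_h):
    """(v_c, v_h) relative to an anchor -> box (y1, y2)."""
    cy = v_c * anchor_h + anchor_cy
    h = math.exp(v_h) * anchor_h
    return cy - h / 2.0, cy + h / 2.0


v_c, v_h = encode_vertical(100.0, 140.0, anchor_cy=118.0, anchor_h=32.0)
print(v_c, v_h, decode_vertical(v_c, v_h, 118.0, 32.0))
```

Because x is fixed by the 16-pixel grid, these two numbers per anchor are all that is needed to place a proposal vertically.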

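Recurrent Connectionist Text Proposals

Text is strongly sequential, so it is worth encoding the sequential context between proposals.

A bi-directional long short-term memory (LSTM) network is used as the RNN layer; its effect: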

Top: CTPN without RNN;

Bottom: CTPN with RNN.

Side-refinement

Once the fine-scale text proposals are obtained, consecutive proposals whose text/non-text score is greater than 0.7 are connected to build text lines.
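The connection step can be sketched for a single row of proposals as follows. This is a deliberate simplification: it only merges runs of adjacent columns whose score exceeds the threshold, whereas the paper's linking rule also considers the distance and vertical overlap between neighboring proposals. The function name `group_text_lines` is hypothetical.

```python
def group_text_lines(scores, threshold=0.7, width=16):
    """Merge runs of consecutive columns with score > threshold.

    Returns a list of (x_start, x_end) pixel spans, one per text line.
    Simplified single-row sketch; ignores vertical overlap between neighbors.
    """
    lines, start = [], None
    for i, s in enumerate(scores):
        if s > threshold:
            if start is None:
                start = i  # a new run begins at this column
        elif start is not None:
            lines.append((start * width, i * width))
            start = None
    if start is not None:  # a run that reaches the last column
        lines.append((start * width, len(scores) * width))
    return lines


print(group_text_lines([0.1, 0.9, 0.8, 0.95, 0.2, 0.75, 0.71, 0.3]))
```

With these scores, columns 1 to 3 and columns 5 to 6 form two separate text lines.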

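After fine-scale proposal detection and RNN connection, the vertical location is accurate.

Horizontally, the image is divided into a sequence of equal-width 16-pixel proposals, which can leave the left and right sides of a text line imprecise, as shown in Figure 4.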
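Side-refinement addresses this by predicting, for the proposals at the ends of a text line, a horizontal offset relative to the anchor. In the paper the offset is defined as $o = (x_{side} - c_x^a) / w^a$, where $c_x^a$ is the anchor's x-center and $w^a = 16$ its width. A minimal sketch, with hypothetical function names:

```python
def encode_side_offset(x_side, anchor_cx, anchor_w=16.0):
    """True side position -> relative offset o."""
    return (x_side - anchor_cx) / anchor_w


def refine_side(offset, anchor_cx, anchor_w=16.0):
    """Predicted offset o -> refined side position in pixels."""
    return offset * anchor_w + anchor_cx


# The anchor at column 6 spans x in [96, 112), so its center is 104.
o = encode_side_offset(99.0, anchor_cx=104.0)
print(o, refine_side(o, anchor_cx=104.0))
```

Without refinement the left edge would snap to the grid at x = 96; with the offset it moves to the true boundary at x = 99.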

Side-refinement yields roughly a 2% improvement in accuracy.

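Model Outputs and Loss Function

The network is end-to-end and jointly predicts three outputs:

Text/non-text scores $s$ - 2k parameters
Vertical coordinates $v = \{v_{c}, v_{h}\}$ - 2k
Side-refinement offsets $o$ - k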
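Adding these up gives 2k + 2k + k = 5k values per window, i.e., 50 for k = 10. The sketch below splits such a flat vector into the three heads. The ordering (scores first, then vertical coordinates, then offsets) and the function name are assumptions for illustration; real implementations typically use separate output layers.

```python
def split_outputs(vec, k=10):
    """Split a flat 5k-vector into (scores, vertical, offsets).

    Assumed layout: [2k scores | 2k vertical coords | k side offsets].
    """
    if len(vec) != 5 * k:
        raise ValueError(f"expected {5 * k} values, got {len(vec)}")
    scores = [tuple(vec[2 * i:2 * i + 2]) for i in range(k)]
    vertical = [tuple(vec[2 * k + 2 * i:2 * k + 2 * i + 2]) for i in range(k)]
    offsets = list(vec[4 * k:])
    return scores, vertical, offsets


scores, vertical, offsets = split_outputs(list(range(50)))
print(len(scores), len(vertical), len(offsets))
```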

Multi-task loss function:
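For reference, the objective in the CTPN paper has the following form (transcribed here, so check the symbols against the paper). $s_i$, $v_j$, and $o_k$ are the predicted text/non-text score, vertical coordinates, and side-refinement offset; starred symbols are the ground truth; $N_s$, $N_v$, $N_o$ are the numbers of anchors used by each term.

$$
L(s_i, v_j, o_k) = \frac{1}{N_s} \sum_i L_s^{cl}(s_i, s_i^*) + \frac{\lambda_1}{N_v} \sum_j L_v^{re}(v_j, v_j^*) + \frac{\lambda_2}{N_o} \sum_k L_o^{re}(o_k, o_k^*)
$$

$L_s^{cl}$ is a softmax classification loss, and $L_v^{re}$ and $L_o^{re}$ are smooth-L1 regression losses. The weights $\lambda_1$ and $\lambda_2$ balance the three tasks; as far as I recall, the paper sets them empirically to 1.0 and 2.0.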

Results


Related

[1] - CSDN blog - CTPN: Detecting Text in Natural Image with Connectionist Text Proposal Network

[2] - CSDN blog - [Paper reproduction] Detecting Text in Natural Image with Connectionist Text Proposal Network
