Interspeech 2020 Paper Reading Notes

Streaming ASR

1.Scout Network

(1)SN
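The paper uses the SN (Scout Network) to detect word boundaries (strictly speaking, label boundaries). The model consists of N self-attention layers, preceded by CNN layers that downsample the input. Because the output for the i-th frame depends only on what precedes it (how is this achieved, presumably through an attention mask?), the SN introduces no latency. The output layer is a linear layer followed by a sigmoid that predicts the boundary probability p_i, and the SN is trained by minimizing the cross-entropy loss.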
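A minimal sketch of the two ideas above, in plain Python. It assumes that the "no latency" property comes from a causal (lower-triangular) attention mask, which is my guess rather than something these notes confirm, and that the CE loss is the usual binary cross-entropy over per-frame 0/1 boundary labels. The function names `causal_mask`, `masked_softmax` and `boundary_bce` are hypothetical and not taken from the paper.

```python
import math

def causal_mask(num_frames):
    # mask[i][j] is True when frame i is allowed to attend to frame j (j <= i).
    return [[j <= i for j in range(num_frames)] for i in range(num_frames)]

def masked_softmax(scores, allowed):
    # Softmax over one row of attention scores; disallowed positions get weight 0.
    peak = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

def boundary_bce(probs, labels, eps=1e-12):
    # Mean binary cross-entropy between predicted p_i and 0/1 boundary labels.
    losses = [
        -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        for p, y in zip(probs, labels)
    ]
    return sum(losses) / len(losses)

mask = causal_mask(3)
# Frame 1 may only look at frames 0 and 1, so the last weight is zero.
weights = masked_softmax([0.2, 1.5, -0.3], mask[1])
loss = boundary_bce([0.9, 0.1, 0.8], [1, 0, 1])
```

With such a mask, the prediction for frame i can be emitted as soon as frame i arrives, which would be consistent with the zero-latency claim.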
(2)Recognition Network training

The paper uses triggered attention as the streaming decoder; MoChA would also work.
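To make the idea of triggered attention concrete, here is a rough sketch of how a boundary could limit what the decoder sees: for each output token, attention is restricted to the encoder frames up to that token's trigger frame plus a fixed look-ahead. The helper `visible_frames`, its parameters and the example boundary values are all assumptions made for illustration; the paper's actual scheme may differ in detail.

```python
def visible_frames(boundaries, lookahead=0, num_frames=None):
    # For each output token, return how many encoder frames the decoder may attend to.
    # boundaries[k] is the 0-based frame index at which token k is triggered.
    limits = []
    for b in boundaries:
        end = b + 1 + lookahead  # frames 0..b plus `lookahead` future frames
        if num_frames is not None:
            end = min(end, num_frames)  # never look past the end of the utterance
        limits.append(end)
    return limits

# Hypothetical boundaries for three tokens in a 14-frame utterance.
limits = visible_frames([3, 7, 12], lookahead=2, num_frames=14)
```

A larger `lookahead` gives the decoder more right context at the cost of more latency, which is the trade-off a streaming decoder has to tune.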

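The debatable point here is that the model is initialized from an offline Transformer. Without this initialization, would the model still converge, and how fast? (In my own earlier experiments, the loss on the dev set was very unstable even when initializing from an offline Transformer.)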
(3)Decoding
This is the most important part; see the original paper for the decoding algorithm.
(4)Experiment
An interesting point is the Scout Network evaluation: the edit distance between the predicted boundaries and the reference boundaries is used as the evaluation metric.
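As an illustration of this kind of metric, the sketch below computes a plain Levenshtein distance between two boundary sequences. It assumes boundaries are encoded as per-frame 0/1 labels and that all edits cost 1; both are my assumptions, and the paper may define the sequences and costs differently. The example sequences are made up.

```python
def edit_distance(ref, hyp):
    # Classic Levenshtein distance with unit cost for insert, delete, substitute.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

# Made-up frame-level labels: 1 marks a boundary frame, 0 a non-boundary frame.
reference = [0, 0, 1, 0, 0, 1, 0, 1]
predicted = [0, 0, 1, 0, 1, 0, 0, 1]
distance = edit_distance(reference, predicted)
```

A distance of zero means the predicted boundaries match the reference exactly; larger values mean more missed, spurious or shifted boundaries.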
Experimental results

2.Knowledge Distillation from Offline to Streaming RNN Transducer

The goal is to train an offline RNN-T that can serve as a good teacher for training a streaming student RNN-T.
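A generic teacher-student loss for this setting can be sketched as a KL divergence between the teacher's and the student's output distributions, summed over the nodes of the transducer output lattice (one distribution per time step t and label position u). This is the common textbook form of distillation and only an assumption about what the paper does; the names `kl_divergence` and `distillation_loss` and the toy lattices are hypothetical.

```python
import math

def kl_divergence(teacher, student, eps=1e-12):
    # KL(teacher || student) for two discrete distributions over the same symbols.
    return sum(
        t * (math.log(t + eps) - math.log(s + eps))
        for t, s in zip(teacher, student)
        if t > 0.0
    )

def distillation_loss(teacher_lattice, student_lattice):
    # Sum the per-node KL over every (t, u) node of the output lattice.
    total = 0.0
    for teacher_row, student_row in zip(teacher_lattice, student_lattice):
        for teacher_node, student_node in zip(teacher_row, student_row):
            total += kl_divergence(teacher_node, student_node)
    return total

# Toy 2x2 lattices with a 3-symbol vocabulary (made-up numbers).
teacher_lattice = [[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
                   [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]]
student_lattice = [[[0.5, 0.3, 0.2], [0.2, 0.6, 0.2]],
                   [[0.4, 0.4, 0.2], [0.3, 0.3, 0.4]]]
kd_loss = distillation_loss(teacher_lattice, student_lattice)
```

In practice such a term would usually be combined with the ordinary RNN-T loss of the student, with a weight that has to be tuned.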
