[Information Technology] [2010.12] Auditory-Based Signal Processing for Robust Speech Recognition

This is the PhD dissertation of Chanwoo Kim, 208 pages in total.


Although automatic speech recognition systems have improved dramatically in recent decades, speech recognition accuracy still degrades significantly in noisy environments. While many algorithms have been developed to deal with this problem, they tend to be more effective in stationary noise, such as white or pink noise, than in the presence of more realistic degradations such as background music, background speech, and reverberation. At the same time, it is widely observed that the human auditory system retains relatively good performance in the same environments. The goal of this thesis is to use mathematical representations motivated by human auditory processing to improve the accuracy of automatic speech recognition systems. In our work we focus on five aspects of auditory processing. We first note that nonlinearities in the representation, and especially the nonlinear threshold effect, appear to play an important role in speech recognition. The second aspect of our work is a reconsideration of the impact of time-frequency resolution, based on the observations that the best estimates of noise attributes are obtained using relatively long observation windows, and that frequency smoothing provides significant improvements in robust recognition. Third, we note that humans are largely insensitive to slowly varying changes in signal components, the kind of changes most likely to arise from noise components of the input. We also consider the effects of temporal masking and the precedence effect on the processing of speech in reverberant environments and in the presence of a single interfering speaker. Finally, we exploit the excellent spatial analysis of incoming signals provided by the human binaural system to develop signal separation systems that use two microphones.
Throughout this work we propose a number of signal processing algorithms that are motivated by these observations and can be realized in a computationally efficient fashion using real-time online processing. We demonstrate that these approaches are effective in improving speech recognition accuracy in various types of noisy and reverberant environments.
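The abstract's time-frequency and slow-variation observations can be illustrated with a short NumPy sketch. It is a hedged illustration of the general ideas, not the thesis's algorithm: it averages a (frames x channels) power array over a longer window of neighboring frames and over neighboring channels, then removes a slowly varying floor tracked by a causal asymmetric first-order filter that rises slowly and falls quickly. The function names (`moving_average`, `asymmetric_floor`), the half-widths (2 frames, 4 channels), the coefficients `lam_up=0.999` and `lam_down=0.5`, and the synthetic input are all assumptions chosen for illustration.

```python
import numpy as np


def moving_average(x, half_width, axis):
    """Mean over a centered window of 2*half_width+1 samples along `axis`.

    The window is truncated at the array edges, so a constant input stays
    constant. The centered window is an illustrative choice.
    """
    x = np.moveaxis(np.asarray(x, dtype=float), axis, 0)
    out = np.empty_like(x)
    n = x.shape[0]
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out[i] = x[lo:hi].mean(axis=0)
    return np.moveaxis(out, 0, axis)


def asymmetric_floor(power, lam_up=0.999, lam_down=0.5):
    """Causally track the slowly varying floor of each channel.

    power: array of shape (num_frames, num_channels).
    When a frame rises above the current estimate, the estimate follows
    slowly (lam_up near 1); when it falls below, it follows quickly.
    Frame t uses only frames 0..t. The coefficient values are hypothetical.
    """
    power = np.asarray(power, dtype=float)
    floor = np.empty_like(power)
    prev = power[0].copy()  # initialize with the first frame (an assumption)
    for t in range(power.shape[0]):
        lam = np.where(power[t] >= prev, lam_up, lam_down)
        prev = lam * prev + (1.0 - lam) * power[t]
        floor[t] = prev
    return floor


# Synthetic (frames x channels) power: exponential fluctuations around a
# stationary noise level of 1, plus a hypothetical speech-like burst.
rng = np.random.default_rng(0)
power = rng.exponential(1.0, size=(300, 40))
power[150:170, 10:20] += 20.0

# Longer window over frames (2 on each side), then smoothing over
# neighboring channels (4 on each side); both half-widths are assumptions.
smoothed = moving_average(moving_average(power, 2, axis=0), 4, axis=1)

# Remove the slowly varying floor and half-wave rectify.
enhanced = np.maximum(smoothed - asymmetric_floor(smoothed), 0.0)

print(power[:100].std(), smoothed[:100].std())  # smoothing lowers the variance
print(enhanced[152:168, 12:18].mean(), enhanced[:100].mean())
```

Because `asymmetric_floor` looks only at the current and past frames, it runs online with constant work per frame; the centered `moving_average` over frames adds a latency of `half_width` frames. The asymmetry lets a short burst pass through almost untouched while a steady background is absorbed into the floor.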

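  1. Introduction
  2. Review of Previous Work
  3. Time-Frequency Resolution
  4. Auditory Nonlinearity
  5. Small-Power Boosting Algorithm
  6. Environmental Compensation Based on Power Distribution Normalization
  7. Onset Enhancement
  8. Power-Normalized Cepstral Coefficients
  9. Compensation Using Two Microphones
  10. Combining Spatial and Temporal Masking
  11. Summary and Conclusions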
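Chapters 4 and 8 concern auditory nonlinearity and power-normalized cepstral coefficients. This summary does not spell out either method, so what follows is only a minimal NumPy sketch of the general idea of compressing channel powers with a power-law nonlinearity instead of a logarithm before taking a DCT. The function names, the exponent 1/15, the 13 retained coefficients, and the `floor` value are illustrative assumptions; the sketch also omits the filterbank and whatever power normalization the chapter title refers to.

```python
import numpy as np


def dct_ii_ortho(x):
    """Orthonormal DCT-II along the last axis (written out to avoid SciPy)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)  # scale the DC row so the transform is orthonormal
    return x @ basis.T


def cepstral_features(channel_power, num_ceps=13, exponent=1 / 15,
                      floor=1e-10, use_log=False):
    """Compress (frames x channels) power, then take a DCT across channels.

    use_log=True gives the familiar log compression; otherwise a power law
    with the given exponent is used. `floor` is a crude lower bound that keeps
    the log finite. All default values are hypothetical.
    """
    p = np.maximum(np.asarray(channel_power, dtype=float), floor)
    compressed = np.log(p) if use_log else p ** exponent
    return dct_ii_ortho(compressed)[..., :num_ceps]


# Near-silent channels: a 100x change in a tiny power value moves the log
# output a lot but barely moves the power-law output.
quiet, quieter = 1e-6, 1e-8
print(np.log(quiet) - np.log(quieter))            # ≈ 4.61
print(quiet ** (1 / 15) - quieter ** (1 / 15))    # ≈ 0.105
```

The comparison at the end shows why the choice of nonlinearity matters for low-power regions, where noise dominates: the log expands small differences near zero, while the power law compresses them. Unlike the log, however, a power law is not scale-invariant (a gain multiplies the compressed values instead of adding a constant), so a practical front end would need some level normalization, which this sketch leaves out.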
