[Information Technology] Pitch-Based Techniques for Robust Speech Recognition

This post presents a thesis from the University of Granada, Spain (author: Juan Andres Morales Cordovilla), 53 pages in total.

This thesis proposes and studies different techniques that use the pitch (understood here as the fundamental frequency of speech) to carry out robust ASR (Automatic Speech Recognition) under noisy conditions. The thesis is not concerned with pitch extraction itself, but with the best way of using pitch for robust speech recognition. We first review the related bibliography and the state of the art in pitch-based techniques for robust ASR. We then propose three pitch-based techniques and compare them with other similar ones. Our three proposals are: the application of asymmetric windows to the autocorrelation of the noisy signal, which tries to provide a spectrum less sensitive to noise; two estimators, named the averaging and sifting estimators, of the autocorrelation of the clean quasi-periodic signal; and a noise estimation technique that can deal with non-stationary noise by employing pitch information and that is used to estimate the reliability masks required by a marginalization MD (Missing Data) recognizer. Additionally, we discuss the performance limits of pitch-based techniques for robust ASR that employ minimal assumptions about the noise. To do so, we identify the basic robust mechanisms these techniques employ for recognizing voiced frames, identify the optimum mechanisms (by means of some equivalences), and obtain the corresponding limit results experimentally by applying MD oracle masks and ideal pitch. One of our conclusions is that our noise estimation technique for MD recognition is close to the limits of pitch-based techniques for robust ASR, although it would require additional information to achieve the performance obtained with MD oracle masks. Finally, we comment on some possibilities for future research (some of them related to speech without pitch) that follow from the ideas developed in this thesis.
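The abstract refers to MD oracle masks without defining them. As a rough illustration only, and not the thesis's own implementation, the sketch below assumes the common definition of an oracle reliability mask: a time-frequency cell is marked reliable when its local SNR, computed from the separately known clean and noise powers, exceeds a threshold. The function name `oracle_mask`, the 0 dB default threshold, and the toy power values are all hypothetical choices made for this example.

```python
import math


def oracle_mask(clean_power, noise_power, snr_threshold_db=0.0):
    """Return a binary reliability mask (1 = reliable, 0 = unreliable).

    clean_power and noise_power are lists of frames; each frame is a list
    of per-channel power values. These inputs are hypothetical: a real
    system would use, for example, filterbank energies.
    """
    mask = []
    for clean_frame, noise_frame in zip(clean_power, noise_power):
        row = []
        for s, n in zip(clean_frame, noise_frame):
            if s <= 0.0:
                reliable = False      # no speech energy in this cell
            elif n <= 0.0:
                reliable = True       # speech present and no noise at all
            else:
                snr_db = 10.0 * math.log10(s / n)
                reliable = snr_db > snr_threshold_db
            row.append(1 if reliable else 0)
        mask.append(row)
    return mask


# Toy example: 2 frames x 3 frequency channels (made-up values).
clean = [[4.0, 0.5, 2.0], [1.0, 3.0, 0.1]]
noise = [[1.0, 1.0, 1.0], [2.0, 0.5, 0.1]]
mask = oracle_mask(clean, noise)
print(mask)  # [[1, 0, 1], [0, 1, 0]]
```

The mask is called an oracle because it needs the clean speech and the noise separately, which are not available in real use. A practical system has to estimate the mask from the noisy signal alone, which is the role the abstract assigns to the pitch-based noise estimation technique.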

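  1. Introduction
  2. Principles of Automatic Speech Recognition
  3. Conventional and Pitch-Based Robust Techniques
  4. Proposed Techniques
  5. Equivalences and Limitations of Pitch-Based Techniques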
