[Information Technology] [1996.04] Speech Recognition in Noisy Environments

This is a doctoral thesis from Carnegie Mellon University, USA (author: Pedro J. Moreno), 130 pages in total.

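When a speech recognition system operates in an adverse acoustic environment, its accuracy degrades severely. In recent years many approaches have been developed to address robustness in speech recognition, including feature-normalization algorithms, microphone arrays, and representations based on human hearing. To date, however, the accuracy gains these algorithms provide have been limited, in part because the mathematical models used to characterize acoustic degradation are inadequate.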

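The thesis first uses Monte Carlo simulation to study why speech recognition systems degrade in noise. From observations of these simulations, we propose a simple yet effective model of how the environment affects the parameters used to characterize a speech recognition system and its input. The proposed model of environmental degradation is applied to two different approaches to environmental compensation: data-driven methods and model-based methods. Data-driven methods learn how a noisy environment affects the characteristics of incoming speech by directly comparing speech recorded in the noisy environment with the same speech recorded under optimal conditions. Model-based methods use a mathematical model of the environment and attempt to estimate the model parameters from samples of the degraded speech.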
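The kind of Monte Carlo study described above can be illustrated with a minimal sketch. It assumes a scalar log-spectral model in which clean speech x and noise n are each Gaussian and combine as y = log(exp(x) + exp(n)); this relation, the function name `simulate_noisy_log_energy`, and all parameter values are assumptions made for illustration and are not taken from the thesis.

```python
import math
import random
import statistics


def simulate_noisy_log_energy(mu_x, sigma_x, mu_n, sigma_n, trials=20000, seed=0):
    """Estimate the mean and standard deviation of noisy log-energy by sampling.

    Assumed model (illustrative only): clean log-energy x and noise log-energy n
    are independent Gaussians, and powers add, so y = log(exp(x) + exp(n)).
    """
    rng = random.Random(seed)
    noisy = []
    for _ in range(trials):
        x = rng.gauss(mu_x, sigma_x)  # clean speech sample (log domain)
        n = rng.gauss(mu_n, sigma_n)  # noise sample (log domain)
        noisy.append(math.log(math.exp(x) + math.exp(n)))
    mean_y = sum(noisy) / len(noisy)
    std_y = statistics.pstdev(noisy)
    return mean_y, std_y


if __name__ == "__main__":
    mean_y, std_y = simulate_noisy_log_energy(5.0, 3.0, 5.0, 0.5)
    print("clean: mean=5.0 std=3.0")
    print(f"noisy: mean={mean_y:.2f} std={std_y:.2f}")
```

Under this assumed model the noisy distribution has a higher mean and a smaller spread than the clean one, because the noise masks the low-energy part of the speech distribution. A recognizer trained on clean statistics then sees mismatched input, which is the kind of effect an environment model has to capture.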

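In this thesis we argue that a careful mathematical formulation of environmental degradation improves recognition accuracy for both data-driven and model-based compensation procedures. The representation we develop for data-driven compensation can be applied either to incoming feature vectors or to the stored statistical models used by the speech recognition system. These two data-driven approaches are called RATZ and STAR, respectively. Finally, we introduce a new model-based compensation approach based on vector Taylor series, called the VTS algorithm. The proposed compensation algorithms are evaluated in a series of experiments that measure recognition accuracy on speech from the ARPA Wall Street Journal database corrupted by additive noise artificially injected at various signal-to-noise ratios (SNRs). For any particular SNR, the upper bound on the recognition accuracy that a practical compensation algorithm can provide is the accuracy of a system trained on noisy data at that SNR. The RATZ, VTS, and STAR algorithms reach this bound at global SNRs as low as 15, 10, and 5 dB, respectively. The experimental results also show that the recognition error rates obtained with the proposed algorithms are significantly better than those achievable with prior techniques. A small number of experimental results further indicate that the accuracy improvements extend to degraded speech recorded in natural environments. The thesis also gives a general formulation of the environment compensation problem and its solution via vector Taylor series, and we show how combining vector Taylor series with a maximum likelihood formulation significantly improves recognition accuracy.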
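Injecting additive noise at a chosen SNR, as in the evaluation described above, can be sketched as follows. This is a minimal illustration, not the thesis's procedure: it assumes that "global SNR" means the ratio of average speech power to average noise power over the whole utterance, and the helper names `mean_power`, `mix_at_snr`, and `measured_snr_db` are hypothetical.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared samples) of a signal."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the target global SNR.

    Assumption: SNR is computed from average power over the full utterance.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be silent")
    # Noise power needed so that 10*log10(speech_power / noise_power) == snr_db.
    target_noise_power = mean_power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixed):
    """Recover the SNR in dB from the clean signal and the noisy mixture."""
    residual = [m - s for s, m in zip(speech, mixed)]
    return 10.0 * math.log10(mean_power(speech) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    # A synthetic tone stands in for speech; white Gaussian noise is the corruption.
    speech = [math.sin(2.0 * math.pi * 440.0 * t / 16000.0) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    for snr in (15, 10, 5):
        mixed = mix_at_snr(speech, noise, snr)
        print(snr, round(measured_snr_db(speech, mixed), 3))
```

Scaling the noise rather than the speech keeps the clean signal unchanged across conditions, so accuracy differences between SNR levels can be attributed to the noise level alone.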

The accuracy of speech recognition systems degrades severely when the systems are operated in adverse acoustical environments. In recent years many approaches have been developed to address the problem of robust speech recognition, using feature-normalization algorithms, microphone arrays, representations based on human hearing, and other approaches. Nevertheless, to date the improvement in recognition accuracy afforded by such algorithms has been limited, in part because of inadequacies in the mathematical models used to characterize the acoustical degradation. This thesis begins with a study of the reasons why speech recognition systems degrade in noise, using Monte Carlo simulation techniques. From observations about these simulations we propose a simple and yet effective model of how the environment affects the parameters used to characterize speech recognition systems and their input. The proposed model of environment degradation is applied to two different approaches to environmental compensation, data-driven methods and model-based methods. Data-driven methods learn how a noisy environment affects the characteristics of incoming speech from direct comparisons of speech recorded in the noisy environment with the same speech recorded under optimal conditions. Model-based methods use a mathematical model of the environment and attempt to use samples of the degraded speech to estimate the parameters of the model. In this thesis we argue that a careful mathematical formulation of environmental degradation improves recognition accuracy for both data-driven and model-based compensation procedures. The representation we develop for data-driven compensation approaches can be applied both to incoming feature vectors and to the stored statistical models used by speech recognition systems. These two approaches to data-driven compensation are referred to as RATZ and STAR, respectively. Finally, we introduce a new approach to model-based compensation with solution based on vector Taylor series, referred to as the VTS algorithms. The proposed compensation algorithms are evaluated in a series of experiments measuring recognition accuracy for speech from the ARPA Wall Street Journal database that is corrupted by additive noise that is artificially injected at various signal-to-noise ratios (SNRs). For any particular SNR, the upper bound on recognition accuracy provided by practical compensation algorithms is the recognition accuracy of a system trained with noisy data at that SNR. The RATZ, VTS, and STAR algorithms achieve this bound at global SNRs as low as 15, 10, and 5 dB, respectively. The experimental results also demonstrate that the recognition error rate obtained using the algorithms proposed in this thesis is significantly better than what could be achieved using the previous state of the art. We include a small number of experimental results that indicate that the improvements in recognition accuracy provided by our approaches extend to degraded speech recorded in natural environments as well. We also introduce a generic formulation of the environment compensation problem and its solution via vector Taylor series. We show how the use of vector Taylor series in combination with a Maximum Likelihood formulation produces dramatic improvements in recognition accuracy.
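To give a rough sense of what a Taylor series approximation of an environment model looks like, here is a scalar sketch. It assumes an additive-noise-only relation in the log-spectral domain and ignores any channel term; the symbols x (clean speech), n (noise), y (noisy speech), and the expansion point (x_0, n_0) are illustrative choices, not necessarily the thesis's notation.

```latex
% Assumed scalar, additive-noise-only model in the log-spectral domain
y = x + g(x, n), \qquad g(x, n) = \ln\!\left(1 + e^{\,n - x}\right)

% First-order Taylor expansion around the point (x_0, n_0)
y \approx x_0 + g(x_0, n_0) + a\,(x - x_0) + (1 - a)\,(n - n_0),
\qquad a = \frac{1}{1 + e^{\,n_0 - x_0}}
```

Here a is the partial derivative of y with respect to x at the expansion point, and 1 - a is the partial derivative with respect to n. Because the approximation is linear in x and n, a Gaussian model of clean speech maps to an approximately Gaussian model of noisy speech, which is what makes a maximum likelihood estimate of the noise parameters tractable under this kind of model.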

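1 Introduction
2 The SPHINX-II Speech Recognition System
3 Previous Work on Environmental Compensation
4 Effects of the Environment on the Distributions of Clean Speech
5 A Unified Framework for Data-Driven Environmental Compensation Algorithms
6 The RATZ Family of Algorithms
7 The STAR Family of Algorithms
8 A Vector Taylor Series Approach to Robust Speech Recognition
9 Summary and Conclusions
Appendix A Comparison of Data Compensation and Distribution Compensation
Appendix B Solutions for the SNR-RATZ Correction Factors
Appendix C Solutions for the Clean Speech Distribution Parameters Using SNR-RATZ
Appendix D EM Solutions for the n and q Parameters of the VTS Algorithm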
