Author: El-Boraay, Amira Shafik El-Said / Title: Digital Speech Recognition

Title

Digital Speech Recognition

Author

El-Boraay, Amira Shafik El-Said.

Subject

Automatic speech recognition. Signal processing.

Publication date

2010.

Number of pages

152 p.

Index

Only 14 pages are available for public view

Abstract

Speech signals convey not only the literal message being spoken but also information about the identity of the speaker. Speaker identification relies on features that accurately capture speaker-discriminating information. Linear Predictive Coding (LPC) and Mel-Frequency Cepstral Coefficients (MFCCs) are the conventional methods for feature extraction. LPC is based on the basic principles of sound production. These conventional methods have been shown to perform well for feature extraction in speaker identification, but their performance degrades in the presence of noise or channel impairments.
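As an illustration of the conventional MFCC pipeline mentioned above, the following is a minimal NumPy sketch: framing, Hamming windowing, power spectrum, triangular mel filterbank, logarithm, and a DCT-II. The function names and all parameter values (400-sample frames, 160-sample hop, 512-point FFT, 26 filters, 13 coefficients) are common textbook defaults assumed here for illustration; they are not taken from the thesis, and the thesis's own implementation may differ.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale approximation (assumed; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct2_matrix(n_out, n_in):
    # Unnormalized DCT-II basis: row k holds cos(pi * k * (n + 0.5) / n_in).
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_in)

def mfcc(signal, sr, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame, then mel filterbank energies.
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    # Log compression (small offset avoids log(0)), then DCT-II.
    log_energies = np.log(energies + 1e-10)
    return log_energies @ dct2_matrix(n_ceps, n_filters).T
```

Calling `mfcc(x, sr=16000)` on a one-dimensional NumPy array `x` returns one row of 13 coefficients per frame.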
To improve the performance of automatic speaker identification systems, this thesis proposes some modifications to the application of the MFCCs method. Discrete transforms such as the Discrete Cosine Transform (DCT), the Discrete Sine Transform (DST), and the Discrete Wavelet Transform (DWT) are investigated as candidate transforms for robust feature extraction. The effect of speech enhancement techniques such as spectral subtraction, Wiener filtering, adaptive Wiener filtering, and wavelet denoising on speaker identification is also studied.
The problem of speaker identification from telephone-like degraded speech is also studied, with deconvolution as a preprocessing step. Three approaches to speech deconvolution are presented in this thesis, and their effect on speaker identification with the MFCCs method is studied. These approaches are the Linear Minimum Mean Square Error (LMMSE) deconvolution technique, the inverse filter deconvolution technique, and the regularized deconvolution technique.
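To make the difference between inverse filter deconvolution and regularized deconvolution concrete, here is a minimal frequency-domain sketch. It assumes a known channel impulse response `h`, circular convolution, and a Tikhonov-style regularizer with a hypothetical weight `lam`; the function names, the toy channel, and the noise level are illustrative assumptions, and the formulations used in the thesis may differ.

```python
import numpy as np

def inverse_filter_deconv(y, h, eps=1e-12):
    # Plain inverse filter: divide the spectrum of y by the channel response.
    # eps only guards against an exact zero in H; it is not a regularizer.
    n = len(y)
    H = np.fft.rfft(h, n)
    return np.fft.irfft(np.fft.rfft(y) / (H + eps), n)

def regularized_deconv(y, h, lam=1e-2):
    # Tikhonov-style regularized inverse: conj(H) / (|H|^2 + lam).
    # lam limits noise amplification where |H| is small (assumed value).
    n = len(y)
    H = np.fft.rfft(h, n)
    X = np.conj(H) * np.fft.rfft(y) / (np.abs(H) ** 2 + lam)
    return np.fft.irfft(X, n)

# Toy example: a two-tap channel with a deep spectral null near Nyquist.
rng = np.random.default_rng(0)
n = 1024
x = rng.standard_normal(n)                      # stand-in for a speech frame
h = np.array([0.5, 0.499])                      # hypothetical channel
y_clean = np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h, n), n)
y_noisy = y_clean + 0.05 * rng.standard_normal(n)

err_inv = np.mean((inverse_filter_deconv(y_noisy, h) - x) ** 2)
err_reg = np.mean((regularized_deconv(y_noisy, h) - x) ** 2)
print(f"inverse filter MSE: {err_inv:.4f}, regularized MSE: {err_reg:.4f}")
```

Without noise the inverse filter recovers the input almost exactly. With noise, it amplifies the noise strongly at frequencies where the channel response is close to zero, which is the situation the regularization term is meant to control.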
The objective of speaker identification systems is to achieve security. To increase the level of security, audio watermarking can be used. In this thesis, an audio watermarking method that embeds encrypted images into audio signals is used, and the effect of watermark embedding on the performance of the speaker identification system in the presence of degradations is studied.