Abstract Automatic Speech Recognition (ASR) has proven to be a useful tool in many everyday applications. These include interactive man-machine interfaces, aids for the disabled, automatic telephone assistance, and Multimedia Information Retrieval (MMIR) systems. Current state-of-the-art ASR systems have achieved considerable success for English and for other languages such as French, Dutch, and Italian. For Arabic, however, little research exists in this field. The present work aims to develop an automatic system for recognizing spoken Arabic that can be used in a multimedia information retrieval environment, as a first step toward converting spoken Arabic words to text. To this end, an Arabic speech database was developed first. A number of quality control rules and precautions were followed in collecting it. The database was obtained by recording speech material from Arabic radio and TV news broadcasts read by different newsreaders from different Arab countries. The recorded speech was segmented manually into frames 24 ms long, each containing one Arabic phoneme. Applying appropriate signal analysis and pattern recognition techniques made it possible to identify and clarify important features of the recordings. Two approaches to feature extraction were used: first, analyzing the speech segment as a single frame; second, dividing the speech segment into three subframes with 50% overlap and extracting speech features from each subframe. Three feature extraction techniques were implemented: linear predictive coding coefficients, cepstrum coefficients calculated using the discrete Fourier transform, and features based on the wavelet transform. For the recognizer, left-to-right Hidden Markov Models (HMMs) with 3, 5, and 7 states and neural network classifiers (feed-forward and recurrent) were used.
In an attempt to improve recognition accuracy, a data fusion approach was used, which improved the accuracy greatly. In addition, a multistage feed-forward neural network was used as a recognizer. It is concluded that the results of the multistage feed-forward neural network are promising: it gives the highest recognition accuracy, reaching 83.3%.
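The second feature extraction approach described above (three subframes with 50% overlap, with DFT-based cepstrum coefficients taken from each) can be illustrated with a minimal sketch. It is not the implementation used in this work. The 16 kHz sampling rate, the Hamming window, the choice of 12 coefficients per subframe, and the function names `split_subframes`, `real_cepstrum`, and `extract_features` are all assumptions made for illustration; the abstract does not specify them.

```python
import cmath
import math


def split_subframes(frame, count=3):
    """Split a frame into `count` equal subframes with 50% overlap."""
    # With 50% overlap the hop is half the subframe length,
    # so len(frame) = hop * (count + 1).
    hop = len(frame) // (count + 1)
    size = 2 * hop
    return [frame[i * hop:i * hop + size] for i in range(count)]


def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for short frames)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(x, n_coeffs=12):
    """First `n_coeffs` real cepstrum values: IDFT of the log magnitude spectrum."""
    n = len(x)
    # Hamming window (an assumption; reduces spectral leakage).
    windowed = [
        s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(x)
    ]
    # Small offset avoids log(0) for empty spectral bins.
    log_mag = [math.log(abs(c) + 1e-12) for c in dft(windowed)]
    # Only the first n_coeffs quefrency bins of the inverse DFT are needed.
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)).real / n
        for q in range(n_coeffs)
    ]


def extract_features(frame, n_coeffs=12):
    """Concatenate the cepstrum coefficients of the three overlapping subframes."""
    features = []
    for sub in split_subframes(frame):
        features.extend(real_cepstrum(sub, n_coeffs))
    return features


# Synthetic 24 ms frame at an assumed 16 kHz sampling rate (384 samples).
SAMPLE_RATE = 16000
frame = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(384)]
features = extract_features(frame)
```

Under these assumptions each 24 ms frame yields three 12 ms subframes starting at 0, 6, and 12 ms, and the resulting feature vector has 3 × 12 = 36 entries. Such a vector would then be passed to the HMM or neural network recognizer.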