Title
Building Algorithms and Databases for Arabic Linguistic Analysis
Author
Abd Rabo, Hitham Mohamed Abo Bakr.
Subject
Algorithms.
Publication date
2009.
Number of pages
194 p.
Index
Only 14 of 217 pages are available for public view.

Abstract

The approach requires an Arabic lexicon and a large corpus of fully diacritized text for training in order to detect diacritics. One of the main contributions of this dissertation is the distinction between internal and case-ending diacritization: the former requires morphological analysis, while the latter depends on syntactic analysis. We solved the Arabic internal diacritization problem using three different techniques, each with its own strengths and weaknesses, and combined them to optimize the performance of our diacritizer and, to a large extent, remove ambiguities. Case-ending diacritization is treated as a post-processing step after internal diacritization. We built a novel statistical approach based on the Support Vector Machine (SVM) learning algorithm that detects case-ending diacritic signs using a combination of morphological and syntactic features. The final result is a fully diacritized Arabic statement.
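
The split between internal and case-ending diacritics can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The function name `split_diacritics` is hypothetical, and the sketch makes a simplifying assumption: the case-ending mark is whatever diacritic sits on the last letter of the word, which does not hold for every Arabic word form (for example, words with attached pronoun suffixes).

```python
# Arabic diacritic marks (tashkeel): fathatan through sukun, U+064B..U+0652.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Return (bare_letters, internal_marks, case_ending_marks).

    Hypothetical helper: assumes the case ending is carried by the
    final letter, which is a simplification.
    """
    # Pair each base letter with the diacritics that follow it.
    groups = []
    for ch in word:
        if ch in DIACRITICS:
            if groups:  # a mark with no preceding letter is dropped
                groups[-1][1] += ch
        else:
            groups.append([ch, ""])

    bare = "".join(letter for letter, _ in groups)
    if not groups:
        return bare, [], ""

    # Marks on all letters but the last are "internal" (morphology);
    # marks on the last letter are the "case ending" (syntax).
    internal = [marks for _, marks in groups[:-1]]
    case_ending = groups[-1][1]
    return bare, internal, case_ending


# "kitabun" (a book, nominative indefinite), written with escapes:
# kaf + kasra, ta + fatha, alif, ba + dammatan
word = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"
bare, internal, case_ending = split_diacritics(word)
print(bare, internal, repr(case_ending))
```

In this example the kasra and fatha are internal marks that a morphological analyzer would have to predict, while the final dammatan is the case ending, which depends on the word's syntactic role in the sentence.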