Title
A novel orthographic query expansion methodology for searching Arabic OCR-degraded text
Publisher
Mostafa Ezzat Abdulaty Kamel
Author
Mostafa Ezzat Abdulaty Kamel
Publication date
2015
Number of pages
124 leaves
Index
Only 14 pages are available for public viewing

Abstract

The major challenge in retrieving OCR-degraded text is the distortion of characters caused by recognition errors. The distortion is unpredictable, and users who search OCR-degraded text are unaware of it, so the query they use to find a specific document rarely matches the OCR-ed text, especially for low-quality images. To increase the effectiveness of OCR-degraded text retrieval, a new model based on three stages is proposed. The first stage is a word-based OCR training model, trained on about 200,000 words to predict the form that the OCR application will generate for each word after recognition. The second stage is a character-based model, which works only on words not found in the training dataset: it takes the clean word from the user's search query and generates the corresponding degraded word based on each character and its preceding and succeeding characters in the same word. The final stage generates the degraded search query based on the training dataset: it takes the clean user query and produces the degraded search query using the word-based and character-based models.
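
The three stages described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: all function names, the `#` boundary symbol, and the Latin toy data are hypothetical. It further assumes that the character-level model is trained only on clean/degraded pairs of equal length (so characters align one to one), that it keeps only the most frequent OCR output for each (preceding, current, succeeding) context, and that the expanded query keeps the clean word alongside its predicted degraded forms.

```python
from collections import Counter, defaultdict

BOUNDARY = "#"  # hypothetical padding symbol marking the start and end of a word


def train_word_model(pairs):
    """Stage 1: map each clean word to the OCR outputs seen for it, most frequent first."""
    counts = defaultdict(Counter)
    for clean, degraded in pairs:
        counts[clean][degraded] += 1
    return {word: [d for d, _ in c.most_common()] for word, c in counts.items()}


def train_char_model(pairs):
    """Stage 2 training: most frequent OCR output per (previous, current, next) context.

    Simplifying assumption: only equal-length pairs are used, so that
    clean and degraded characters align one to one.
    """
    counts = defaultdict(Counter)
    for clean, degraded in pairs:
        if len(clean) != len(degraded):
            continue
        padded = BOUNDARY + clean + BOUNDARY
        for i, observed in enumerate(degraded):
            counts[padded[i:i + 3]][observed] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def degrade_word(word, char_model):
    """Stage 2: predict the degraded form of a word absent from the training data."""
    padded = BOUNDARY + word + BOUNDARY
    # Unseen contexts fall back to the clean character itself.
    return "".join(char_model.get(padded[i:i + 3], word[i]) for i in range(len(word)))


def expand_query(query, word_model, char_model):
    """Stage 3: build the degraded query, one list of alternatives per query word."""
    expanded = []
    for word in query.split():
        # Use the word-based model when the word was seen in training,
        # otherwise fall back to the character-based model.
        variants = word_model.get(word) or [degrade_word(word, char_model)]
        expanded.append([word] + [v for v in variants if v != word])
    return expanded


# Made-up (clean, OCR output) pairs, for illustration only.
pairs = [("cat", "cat"), ("cat", "eat"), ("cat", "eat"), ("can", "ean")]
word_model = train_word_model(pairs)
char_model = train_char_model(pairs)
print(expand_query("cat cab", word_model, char_model))
# [['cat', 'eat'], ['cab', 'eab']]
```

In this toy run, "cat" is found in the word-based table, while "cab" is unseen and is therefore passed to the character-based model, which has learned that a word-initial "c" followed by "a" is most often recognized as "e".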