Title
Information retrieval system for automatic categorization of Wikipedia articles
Publisher
Nesma Abd Elhakim Refaei Ali
Author
Nesma Abdelhakim Refaei Ali
Thesis committee
Researcher / Nesma Abdelhakim Refaei Ali
Supervisor / Elsayed E. Hemayed
Supervisor / Riham Mansour
Examiner / Mohsen Rashwan
Examiner / Reda Abdelwahab
Publication date
2016
Number of pages
79 p.
Language
English
Degree
Master's
Specialization
Media and Technology
Date of approval
9/3/2016
Awarding institution
Cairo University - Faculty of Engineering - Computer Engineering

Abstract

Wikipedia has built a categorization system that assigns each of its articles a set of categories to facilitate navigation between related pages. So far, categorization has been done manually, which makes it a confusing, tiring, and time-consuming task. In this thesis, we propose a system for automatically categorizing newly created Wikipedia articles. The proposed system uses an information retrieval approach to find relevant Wikipedia articles based on the input article's body, headings, and hyperlinks to other Wikipedia articles. It then ranks the categories associated with these relevant articles according to the articles' relevance scores. In addition, we use a second important signal, the co-occurrence between candidate categories, to help rank the categories. Finally, the top k ranked categories are returned as the topics of the input article. Compared with a basic text-only search, our system achieved relative improvements of 17.7% in F-measure and 20.2% in Mean Total Reciprocal Rank. It also improved accuracy over a state-of-the-art technique by at least 10.2% on that technique's datasets. Lastly, it was evaluated on a benchmark dataset from the LSHTC competition, where it improved accuracy over the competition's k-NN baseline by 8.1%.
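The ranking stage described above can be illustrated with a small sketch. It is a hypothetical illustration under stated assumptions, not the thesis's implementation: the abstract does not give the scoring formulas, so summing the retrieved articles' scores per category, the pairwise co-occurrence boost, the weight `alpha`, the input formats, and the function names `rank_categories` and `f_measure` are all assumptions. The search step is assumed to have already returned scored articles together with their categories.

```python
from collections import defaultdict
from itertools import combinations


def rank_categories(retrieved, cooccurrence, k=3, alpha=0.5):
    """Rank candidate categories for a new article (hypothetical sketch).

    retrieved    -- list of (relevance_score, categories) pairs, one per
                    article returned by the search step (assumed format).
    cooccurrence -- dict mapping frozenset({cat_a, cat_b}) to a weight for
                    how often the two categories appear together
                    (assumed; e.g. a normalized co-tagging count).
    k            -- number of categories to return.
    alpha        -- assumed weight of the co-occurrence signal.
    """
    # Signal 1: each category accumulates the relevance scores of the
    # retrieved articles that are tagged with it.
    relevance = defaultdict(float)
    for score, categories in retrieved:
        for cat in set(categories):
            relevance[cat] += score

    # Signal 2: a candidate is boosted when it co-occurs with other
    # candidates, in proportion to how relevant those candidates are.
    candidates = sorted(relevance)
    support = defaultdict(float)
    for a, b in combinations(candidates, 2):
        weight = cooccurrence.get(frozenset((a, b)), 0.0)
        support[a] += weight * relevance[b]
        support[b] += weight * relevance[a]

    # Combine both signals and keep the top k (ties broken alphabetically).
    final = {cat: relevance[cat] + alpha * support[cat] for cat in candidates}
    return sorted(candidates, key=lambda cat: (-final[cat], cat))[:k]


def f_measure(predicted, gold):
    """Harmonic mean of precision and recall for one article's categories."""
    hits = len(set(predicted) & set(gold))
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy search results for a hypothetical new article.
    retrieved = [
        (0.6, ["Machine learning", "Artificial intelligence"]),
        (0.8, ["Machine learning", "Statistics"]),
        (0.4, ["Cairo University alumni"]),
    ]
    cooccurrence = {frozenset(("Machine learning", "Artificial intelligence")): 0.8}

    # Relevance signal only.
    print(rank_categories(retrieved, {}, k=2))
    # prints ['Machine learning', 'Statistics']

    # Relevance plus co-occurrence.
    top2 = rank_categories(retrieved, cooccurrence, k=2)
    print(top2)
    # prints ['Machine learning', 'Artificial intelligence']

    gold = ["Machine learning", "Artificial intelligence", "Information retrieval"]
    print(round(f_measure(top2, gold), 2))
    # prints 0.8
```

Summing article scores lets a category backed by several moderately relevant articles outrank one attached to a single strong match. The co-occurrence term then promotes candidates that tend to appear alongside other strong candidates, which is why "Artificial intelligence" overtakes "Statistics" in the toy run once the co-occurrence weights are supplied.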