Title
Intelligent Analysis of Textual Content for Spam Detection
Author
Ibrahim, Mokhtar Ashour
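Committee
Researcher / Mokhtar Ashour Ibrahim Khodeir (مختار عاشور ابراهيم خضير)
Supervisor / Mohamed Watheq Ali Kamel El-Kharashi (محمد واثق على كامل الخراشى)
Supervisor / Sherif Ramzy Salama (شريف رمزي سلامة)
Examiner / Mohsen Abdel Razek Ali Rashwan (محسن عبدالرزاق على رشوان)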
Publication date
2019
Number of pages
83 p.
Language
English
Degree
Master's
Specialization
Electrical and Electronic Engineering
Date of approval
1/1/2019
Place of approval
Ain Shams University - Faculty of Engineering - Department of Computer and Systems Engineering

Abstract

Twitter's popularity has made it an important and instantaneous source of news and trending events around the world. It has also attracted the attention of spammers, who post malicious content embedded in tweets and in their profile pages. Spammers use varied and evolving techniques to evade traditional security mechanisms, which creates the need for robust solutions that adapt to these techniques. In this thesis, we focus on exploring different natural language processing methods to detect spam from the textual content of tweets.
One of the models that we propose in this thesis is the character n-gram model, which has the advantage of being robust to spamming techniques that depend on word manipulation. Another set of models we explore are word embedding models built with popular word embedding techniques. Finally, we study the character embedding model, which is built using deep learning techniques.
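To illustrate why character n-grams can tolerate word manipulation, the following is a minimal sketch and not the thesis implementation. The helper names (`char_ngrams`, `word_unigrams`, `jaccard`), the choice of n = 3, and the two example strings are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_unigrams(text):
    """Count whitespace-separated word tokens in the lowercased text."""
    return Counter(text.lower().split())


def jaccard(a, b):
    """Jaccard overlap between the feature sets (keys) of two Counters."""
    keys_a, keys_b = set(a), set(b)
    if not keys_a and not keys_b:
        return 1.0
    return len(keys_a & keys_b) / len(keys_a | keys_b)


# Hypothetical example: the second string obfuscates two words with digits.
plain = "free money now"
obfuscated = "fr3e m0ney now"

word_overlap = jaccard(word_unigrams(plain), word_unigrams(obfuscated))
char_overlap = jaccard(char_ngrams(plain), char_ngrams(obfuscated))

print(f"word-level overlap: {word_overlap:.2f}")
print(f"char-level overlap: {char_overlap:.2f}")
```

In this toy case the obfuscated words share no word-level feature with their originals, while several character trigrams survive, so the character-level overlap is higher than the word-level one.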
Using publicly available datasets, we evaluate the performance of multiple machine learning classifiers with the proposed models. Our experiments show that some of our character n-gram models achieve an F-measure of nearly 80%, an improvement over approaches that use classical word n-grams built from tweet tokens. We also show that our technique can detect spam tweets with low latency, which is crucial in a real-time environment like Twitter.
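For readers unfamiliar with the F-measure reported above, here is a small sketch of how it is commonly computed from classification counts. The abstract does not state which variant is used, so the default of beta = 1 (the balanced F1 score) is an assumption, and the counts in the usage line are hypothetical, not results from the thesis.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from true positives, false positives and false negatives.

    beta = 1.0 gives the balanced F1 score (an assumption here; the
    abstract does not say which variant was reported).
    """
    if tp == 0:
        return 0.0  # no correct spam predictions, so precision and recall are 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only.
score = f_measure(tp=6, fp=2, fn=4)
print(f"F-measure: {score:.3f}")
```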