Title
Deducing Decision Rules using the Categorization of the Numerical Features
Author
Elhilbawi, Hanan Mokhtar Abdelaziz Fahmy
Thesis Committee
Researcher / حنان مختار عبد العزيز فهمي الهلباوي
Supervisor / هانى محمد كمال مهدى
Supervisor / سيف الدين محمد الدولتلى
Examiner / عبدالبديع محمد سالم
Publication Date
2021
Number of Pages
78 p.
Language
English
Degree
Master's
Specialization
Electrical and Electronic Engineering
Approval Date
1/1/2021
Place of Approval
Ain Shams University - Faculty of Engineering - Electrical Engineering (Computers)
Abstract

Data pre-processing is one of the most crucial stages in data analysis. While research in the data science field has focused mainly on developing novel machine learning techniques, less attention has been given to data pre-processing. Discretizing continuous attributes is an essential pre-processing step in data mining, and many discretization techniques with different characteristics have been proposed. However, there is no clear guidance for choosing a suitable discretization technique for a given type of dataset.
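To make the idea of discretization concrete, the following is a minimal Python sketch of two common unsupervised techniques, equal-width and equal-frequency binning, which use no class labels. The abstract does not say which methods the thesis evaluated, so the function names (`equal_width_edges`, `equal_frequency_edges`, `discretize`) and the sample readings are hypothetical and chosen only for illustration. Both methods take the number of bins `k` as a user-chosen parameter.

```python
from bisect import bisect_right


def equal_width_edges(values, k):
    """Interior cut points splitting [min, max] into k bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Interior cut points giving each of the k bins roughly len(values) / k values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def discretize(values, edges):
    """Replace each continuous value by its bin index 0 .. len(edges).

    A value equal to a cut point is assigned to the bin above it.
    """
    return [bisect_right(edges, v) for v in values]


if __name__ == "__main__":
    # Hypothetical, right-skewed readings with one outlier (not thesis data).
    readings = [60, 62, 64, 66, 70, 140]
    print(discretize(readings, equal_width_edges(readings, 3)))      # [0, 0, 0, 0, 0, 2]
    print(discretize(readings, equal_frequency_edges(readings, 3)))  # [0, 0, 1, 1, 2, 2]
```

With this skewed sample, the single outlier stretches the equal-width range so that five of the six readings fall into the first bin, while equal-frequency binning spreads them evenly across the three bins. This sensitivity to the data distribution is one reason the choice of discretization technique depends on the dataset.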
In this thesis, a taxonomy of discretization techniques is proposed based on whether class information is available and on the relationships between attributes in the analyzed dataset. The thesis also examines the importance of discretization as a pre-processing step, demonstrating how it helps achieve better classification performance than using continuous attributes directly. Multiple parametric and non-parametric discretization methods, combined with a number of machine learning classifiers, were applied to the problem of predicting Intensive Care Unit (ICU) mortality. The results show the value of discretizing the input attributes for this problem: discretized data achieved a classification accuracy of 89.19% and an F1 score of 0.38, whereas continuous attributes achieved an accuracy of 86.19% and an F1 score of 0.08. These results indicate that discretizing continuous attributes before applying machine learning models can substantially improve performance.
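The taxonomy separates discretization methods by whether they use class information. As a hedged illustration of a supervised method, the sketch below finds the single cut point on one attribute that minimizes the weighted class entropy of the two resulting intervals. This is one step of entropy-based discretization; the recursive scheme of Fayyad and Irani repeats it on each interval and adds a minimum-description-length stopping rule, which is omitted here. It is not necessarily one of the methods evaluated in the thesis, and the function names (`entropy`, `best_entropy_cut`) and the sample values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Find the cut point on one attribute that minimizes weighted class entropy.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    Returns (cut, weighted_entropy); cut is None if no candidate lowers the
    entropy of the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, entropy(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a cut
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Weight each side's entropy by the fraction of samples it holds.
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut, best_score


if __name__ == "__main__":
    # Hypothetical attribute values and binary outcome labels (1 = died).
    values = [1.0, 1.5, 2.0, 2.5, 4.0, 5.0]
    labels = [0, 0, 0, 0, 1, 1]
    print(best_entropy_cut(values, labels))  # (3.25, 0.0)
```

Because the cut is chosen using the labels, it lands exactly where the outcome changes (between 2.5 and 4.0), producing two pure intervals; a label-blind binning method offers no such guarantee.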