Author: El-Hasnony, Ibrahim Mohamed El-Sayed./ Title: Data mining in medical applications /

Title

Data mining in medical applications /

Author

El-Hasnony, Ibrahim Mohamed El-Sayed.

Committee

Researcher / Ibrahim Mohamed El-Sayed El-Hasnony

Supervisor / Ahmed Abou-Elfetouh Saleh

Supervisor / Hazem Mokhtar El-Bakry

Examiner / Alaa El-Din Mohamed Riad

Examiner / Mahmoud Mohamed Abdellatif

Subject

Data mining. Knowledge discovery. Data classification. Feature selection.

Publication date

2015.

Number of pages

112 p.

Language

English

Degree

Master's

Specialization

Information Systems

Approval date

1/1/2015

Awarding institution

Mansoura University - Faculty of Computers and Information - Information Systems

Abstract

Massive amounts of data are generated every day. The medical field, in which patient-related data, past diagnoses, and treatment costs are recorded, is one of the most challenging examples of massive data. Medical data sets suffer from many problems, such as complexity, noise, large data volume, incompleteness, and redundancy. The automatic prediction and detection of diseases such as breast cancer is an important and challenging problem in medical applications, and automating it improves decision making and patient treatment. Efficient and effective analysis of medical data therefore requires intelligent systems. Data preprocessing and data mining are two important stages of knowledge discovery. Data mining, the analysis step of knowledge discovery, provides many techniques that can be applied efficiently to medical data analysis.
This thesis presents three studies. The first is a comparative study of five feature reduction algorithms: principal component analysis (PCA), rough set attribute reduction (RSAR), gain ratio, correlation feature selection (CFS), and fuzzy rough feature selection (FRFS). It examines how much each algorithm improves classification accuracy. The results show that FRFS and CFS outperform the other techniques.
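The following minimal Python sketch makes the CFS criterion concrete. CFS scores a subset of k features by its merit, k·mean|r_cf| / sqrt(k + k(k−1)·mean|r_ff|). Here r_cf is the feature-class correlation and r_ff is the feature-feature correlation, so a subset is rewarded for predicting the class and penalized for internal redundancy. The sketch rests on two assumptions. It uses Pearson correlation as the correlation measure, and it uses a simple greedy forward search in place of the best-first search usually paired with CFS. The function names and toy data are illustrative only and do not come from the thesis.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(columns, target, subset):
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = mean(abs(pearson(columns[f], target)) for f in subset)
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = mean(abs(pearson(columns[a], columns[b])) for a, b in pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(columns, target):
    """Greedy forward search: add the feature that most raises the merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        best_f = None
        for f in remaining:
            score = cfs_merit(columns, target, selected + [f])
            if score > best:
                best_f, best = f, score
        if best_f is None:
            break
        selected.append(best_f)
        remaining.remove(best_f)
    return selected, best


# Hypothetical toy data: one feature column per list, binary class labels.
target = [0, 0, 0, 1, 1, 1]
columns = [
    [0.1, 0.2, 0.0, 0.9, 1.0, 0.8],  # strongly class-correlated
    [0.3, 0.2, 0.1, 0.7, 1.0, 0.7],  # largely redundant with column 0
    [1, 0, 1, 0, 1, 0],              # weakly class-correlated
]
selected, merit = cfs_forward(columns, target)
print(selected, merit)  # expected selection: column 0 only
```

Column 1 is individually predictive, but it is rejected because it correlates strongly with column 0 and adding it lowers the merit. This redundancy penalty is what distinguishes CFS from ranking features one at a time.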
The second study develops an effective hybrid preprocessing stage. This stage combines the K-means clustering algorithm with either CFS or FRFS and produces a new data set by merging the reduced clustered data. The new data set is fed into different classification algorithms to test how the new preprocessing step affects accuracy. According to the results, clustering before reduction in the preprocessing phase improved classification accuracy compared with both the original data set and the data set that was only reduced.
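The abstract does not say exactly how the reduced clustered data are merged. The following minimal Python sketch therefore assumes one plausible reading:

1. Run K-means on the full data set.
2. Apply a feature selector to each cluster separately.
3. Keep the union of the features selected in any cluster.
4. Stack all rows, cluster by cluster, projected onto that union.

The selector here is a hypothetical correlation-threshold stand-in, not CFS or FRFS. The K-means initialization (farthest-point seeding) is also a simplification chosen for determinism. All function names, parameters, and data are illustrative.

```python
import math


def kmeans(rows, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-point seeding (a simplification)."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(math.dist(r, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(r, centroids[c])) for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def corr(x, y):
    """Pearson correlation; 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def select_features(rows, y, threshold=0.5):
    """Hypothetical stand-in for CFS/FRFS: keep features with |r| >= threshold."""
    return {j for j, col in enumerate(zip(*rows)) if abs(corr(col, y)) >= threshold}


def hybrid_preprocess(rows, y, k=2, threshold=0.5):
    """Cluster, reduce each cluster, then merge on the union of kept features."""
    labels = kmeans(rows, k)
    clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    keep = set()
    for idx in clusters:
        if len(idx) > 1:  # a correlation needs at least two rows
            keep |= select_features([rows[i] for i in idx], [y[i] for i in idx], threshold)
    keep = sorted(keep)
    merged_X = [[rows[i][j] for j in keep] for idx in clusters for i in idx]
    merged_y = [y[i] for idx in clusters for i in idx]
    return merged_X, merged_y, keep


# Hypothetical toy data: feature 0 separates two groups, feature 1 tracks the class.
rows = [
    [0, 1, 5], [0, 0, 3], [1, 1, 3], [1, 0, 5],
    [10, 1, 5], [10, 0, 3], [11, 1, 3], [11, 0, 5],
]
y = [1, 0, 1, 0, 1, 0, 1, 0]
X_new, y_new, kept = hybrid_preprocess(rows, y)
print(kept)  # [1]
```

The resulting `X_new` and `y_new` would then be handed to any classifier. Taking the union of per-cluster selections is only one merging policy. Keeping each cluster's own feature subset would also be consistent with the abstract.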
Finally, the third study proposes an intelligent classifier for breast cancer data that builds on the results of the second study. In the preprocessing phase, the K-means clustering algorithm and FRFS feature reduction are used to produce a new data set for each cluster. These data sets are then merged to form the final reduced data set. The discernibility k-nearest neighbor (D-KNN) classification algorithm is then used to build the classification model on the final merged data set. The model achieved an accuracy of up to 98.9% on the breast cancer data. Compared with previous studies on the same data set, it also gave better results. The details and limitations of the new approach are discussed, and future work is suggested.
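The abstract does not define the discernibility weighting used inside D-KNN. The following sketch therefore uses ordinary majority-vote k-nearest neighbor as a plainly named substitute, not the thesis's D-KNN. It also shows one way to estimate classification accuracy, the metric behind the reported 98.9% figure. Leave-one-out evaluation is an assumption here, since the thesis's evaluation protocol is not stated in the abstract. The one-feature data, the class labels, and the function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Plain majority-vote k-NN; a stand-in for D-KNN, not the thesis's method."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # Counter keeps insertion order, so ties go to the tied class seen first
    # among the distance-sorted neighbours.
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy: classify each row using all the other rows."""
    hits = 0
    for i in range(len(X)):
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        hits += pred == y[i]
    return hits / len(X)


# Hypothetical merged data set with a single retained feature.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = ["benign"] * 3 + ["malignant"] * 3
print(knn_predict(X, y, [0.95]))  # malignant
print(loo_accuracy(X, y))         # 1.0
```

A discernibility-based variant would change how neighbors are weighted or chosen inside `knn_predict`. The surrounding evaluation loop would stay the same.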