Author: Abdel-Hamid, Nahla Bishri Abdel-Momen. Title: A proposed framework for imbalanced medical big data mining


Title

A proposed framework for imbalanced medical big data mining

Author

Abdel-Hamid, Nahla Bishri Abdel-Momen.

Thesis committee

Researcher / Nahla Bishri Abdel-Momen Abdel-Hamid

Supervisor / Hesham Arafat Ali

Supervisor / Ali Ibrahim El-Desouky

Supervisor / Sally Mohamed Elghamrawy

Subject

Internet. Databases as Topic. Medical Informatics - Methods. Data Mining - Methods.

Publication date

2019.

Number of pages

Online resource (111 pages)

Language

English

Degree

Doctorate (Ph.D.)

Specialization

Engineering (Miscellaneous)

Approval date

1/1/2019

Awarding institution

Mansoura University, Faculty of Engineering, Computers and Systems


Abstract

Classification of imbalanced big data has attracted extensive attention from researchers over the last decade. Standard classification methods perform poorly at identifying minority-class samples. Several approaches have been introduced to solve the class imbalance problem in big data and improve classification generalization. However, most of these approaches neglect the effect of borderline samples on classification performance, even though these high-impact samples are prone to misclassification. This thesis therefore has two main objectives. The first is to propose a Spark Based Mining Framework (SBMF) that harnesses the computational power of Spark to use all features of a big dataset, without sacrificing part of the data as conventional sequential processing methods do. Spark's distributed computation allows full use of the information hidden in all dataset features and, consequently, contributes to improved classification performance. The framework addresses the imbalanced data problem; it consists of five main layers and two main modules, which contain several novel algorithms and techniques designed to overcome class imbalance. Class imbalance is not the only challenge in big data mining: the growing size of the datasets to be mined, the number of different data sources (Internet of Things, sensors and clinical data, and mobile and laptop terminals), and the variety of data formats pose another major challenge. The energy system and intelligent preprocessing are important for high-speed applications such as data analysis and classification, and big data mining methods and tools play a vital role in big data analysis solutions. The second objective is a Spark-Based Whale Optimization algorithm (SB-WO) for promoting the classification of imbalanced big data.
SB-WO consists of a whale optimizer and a Random Forest classifier; the whale optimizer is designed to adapt the parameter settings of the Random Forest classifier. Comparing the results of the proposed framework with those of other sampling techniques confirms its validity.
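The abstract stresses that most imbalance-handling approaches overlook borderline samples, but this record does not say how SBMF identifies them. As one well-known illustration, not necessarily the author's method, the Borderline-SMOTE criterion flags a minority sample as borderline ("DANGER") when at least half, but not all, of its k nearest neighbours belong to another class. The following is a minimal brute-force sketch in plain Python, assuming numeric feature tuples; the function name `borderline_minority` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def borderline_minority(
    X: Sequence[Point], y: Sequence[int], minority: int = 1, k: int = 5
) -> List[int]:
    """Return indices of minority samples in the Borderline-SMOTE DANGER set.

    Hypothetical helper: a minority sample is borderline when at least half,
    but not all, of its k nearest neighbours belong to another class
    (k/2 <= m' < k). Minority samples surrounded only by other classes
    (m' == k) are treated as noise and skipped.
    """
    danger = []
    for i, xi in enumerate(X):
        if y[i] != minority:
            continue
        # Brute-force k-NN: O(n^2) overall, acceptable only for a small illustration.
        neighbours = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        m_prime = sum(1 for _, j in neighbours if y[j] != minority)
        if k / 2 <= m_prime < k:
            danger.append(i)
    return danger


# Toy data: a majority cluster near the origin (label 0), a minority cluster
# near (5, 5) (label 1), one minority sample near the class border (index 8)
# and one minority sample deep inside the majority cluster (index 9).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2),
     (5, 5), (5, 6), (6, 5), (3, 3.2), (0.5, 0.5)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print(borderline_minority(X, y, minority=1, k=3))  # → [8]
```

With k = 3, sample 8 has two majority neighbours out of three and is flagged, while sample 9 has only majority neighbours and is treated as noise rather than as a border sample. On a real big dataset the neighbour search would need an approximate or distributed method instead of this quadratic loop.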
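The record does not give SB-WO's update rules or state which Random Forest parameters it tunes. As a hedged illustration, the sketch below implements the standard Whale Optimization Algorithm of Mirjalili and Lewis as plain, single-machine Python. It uses the encircling, random-search and spiral (bubble-net) moves, with the coefficient `a` decreasing linearly from 2 towards 0. The objective `toy_rf_error`, the choice of number of trees and maximum depth as the tuned parameters, and all bounds are hypothetical. In a Spark setting the objective would instead train and score a distributed Random Forest; Spark MLlib's `RandomForestClassifier`, for example, exposes `numTrees` and `maxDepth`.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def whale_optimize(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_whales: int = 20,
    n_iter: int = 100,
    b: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Minimise `objective` with the Whale Optimization Algorithm (WOA).

    Returns (best_position, best_score, best_score_history).
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x: List[float]) -> List[float]:
        # Keep every coordinate inside its [lo, hi] search bound.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    scores = [objective(w) for w in whales]
    best_i = min(range(n_whales), key=scores.__getitem__)
    best, best_score = whales[best_i][:], scores[best_i]
    history: List[float] = []

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for i in range(n_whales):
            x = whales[i]
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore around a random whale.
                target = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [target[d] - A * abs(C * target[d] - x[d]) for d in range(dim)]
            else:
                # Spiral (bubble-net) move towards the best whale.
                l = rng.uniform(-1.0, 1.0)
                spiral = math.exp(b * l) * math.cos(2 * math.pi * l)
                new = [abs(best[d] - x[d]) * spiral + best[d] for d in range(dim)]
            whales[i] = clip(new)
            scores[i] = objective(whales[i])
            if scores[i] < best_score:
                best, best_score = whales[i][:], scores[i]
        history.append(best_score)
    return best, best_score, history


def decode(position: List[float]) -> Tuple[int, int]:
    # Hypothetical mapping of a continuous position to (number of trees, max depth).
    return int(round(position[0])), int(round(position[1]))


def toy_rf_error(position: List[float]) -> float:
    # Hypothetical stand-in for a Random Forest validation error,
    # with its minimum at 150 trees and depth 12.
    n_trees, max_depth = decode(position)
    return ((n_trees - 150) / 100) ** 2 + ((max_depth - 12) / 10) ** 2


best, score, history = whale_optimize(toy_rf_error, bounds=[(10, 300), (2, 30)], seed=42)
print(decode(best), round(score, 4))
```

The sketch keeps the best-so-far position rather than the current population's best, so the returned score never increases between iterations. That is convenient when each objective evaluation is an expensive classifier training run.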