Author: Ahmed, Hager Saleh Mohammed./ Title: Efficient Predictive Analytics using Big Data Platforms /

Title

Efficient Predictive Analytics using Big Data Platforms /

Author

Ahmed, Hager Saleh Mohammed.

Contributors

Researcher / Hager Saleh Mohammed Ahmed

Supervisor / عبدالمجيد أمين علي

Supervisor / ايمان ممدوح يونس

Subject

Computational intelligence. Big data. Artificial intelligence. Database management. Computer communication systems. Computer science.

Publication date

2021.

Number of pages

141 p.

Language

English

Degree

Doctorate

Specialization

Information Systems

Approval date

27/5/2021

Granting institution

Minia University - Faculty of Computers and Information - Information Systems

Abstract

Predictive data analytics in healthcare has become a promising research direction due to the popularity of real-time monitoring and tracking systems. Healthcare systems are a typical example of tracking systems that provide high-quality medical care services for people. An enormous amount of healthcare streaming data is generated continuously from different sources such as Twitter and sensors. Because streaming data is produced at such a high rate, it is challenging to consume, process, and interpret such large amounts of data in real time and to take timely measures in the event of an emergency. Consequently, this thesis aims to build an efficient real-time prediction system that can handle healthcare-based streaming data and indicate the current state of a patient's health. This is done through four contributions towards a healthcare-based prediction system.

First, a real-time system for predicting heart disease from medical streaming tweets was developed. It consists of three components: Building Offline Model, Processing Pipeline, and Online Prediction. The main goal of the first component is to apply different types of machine learning (ML) algorithms to historical heart disease data to find the model with the highest accuracy. In the second component, the proposed system reads tweets and pushes them into a Kafka topic. Then, the Spark Streaming API reads the tweets as a stream from the Kafka topic, extracts health attributes, and sends them to the best-performing model to predict heart disease in real time. The results of the first contribution showed that random forest (RF) achieved the highest accuracy, at 94.9%.

Second, a real-time system for predicting systolic blood pressure (SBP) was proposed. It consists of two components: developing an offline model and an online prediction pipeline.
The aim of the first component is to apply different deep learning (DL) models to the historical SBP time-series dataset to find the best-performing model. In the second component, a simulated sensor generates time-series SBP readings and pushes them into a Kafka topic. The Spark Streaming API reads five minutes of SBP readings from the Kafka topic and sends them to the best-performing model to predict near-future SBP in real time. The results of the second contribution showed that the bidirectional LSTM (BI-LSTM) model with three hidden layers achieved the smallest RMSE, at 2.84.

Third, a real-time system for predicting sentiment about the coronavirus from Twitter streaming data was developed. It consists of two components: developing an offline sentiment analysis model and modeling an online prediction pipeline. The aim of the first component is to apply ML algorithms to the dataset's historical tweets to find the best-performing model. In the second component, the Spark Streaming API reads tweets from a Kafka topic and processes them; it then sends the processed tweets to the best-performing model to predict their sentiment about the coronavirus pandemic in real time. The results of the third contribution showed that the RF classifier using the uni-gram feature extraction method with 3000 matrices recorded the best performance.

Finally, ML based on Apache Spark was used to predict chronic diseases.
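
The online prediction pattern described for the SBP system (a stream of readings, a fixed-size window passed to a pre-trained model, and RMSE as the quality measure) can be illustrated with a minimal Python sketch. It is not the thesis implementation: a plain list stands in for the Kafka topic and the Spark Streaming consumer, `mean_predictor` is a hypothetical placeholder for the trained BI-LSTM, the five-reading window assumes one reading per minute, and the SBP values are invented for illustration.

```python
import math
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Tuple


def sliding_windows(
    stream: Iterable[float], size: int
) -> Iterator[Tuple[List[float], float]]:
    """Yield (window, next_value): the last `size` readings and the one that follows."""
    window: Deque[float] = deque(maxlen=size)
    for reading in stream:
        if len(window) == size:
            yield list(window), reading
        window.append(reading)  # the oldest reading drops out automatically


def mean_predictor(window: List[float]) -> float:
    """Hypothetical stand-in for the trained model: predicts the window mean."""
    return sum(window) / len(window)


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean squared error between two equal-length, non-empty sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def evaluate_stream(
    stream: Iterable[float], size: int, predict: Callable[[List[float]], float]
) -> float:
    """Predict each next reading from the preceding window and score with RMSE."""
    actual: List[float] = []
    predicted: List[float] = []
    for window, target in sliding_windows(stream, size):
        predicted.append(predict(window))
        actual.append(target)
    return rmse(actual, predicted)


# Synthetic SBP readings in mmHg, invented for illustration only.
readings = [120.0, 122.0, 121.0, 125.0, 124.0, 126.0, 123.0]
score = evaluate_stream(readings, size=5, predict=mean_predictor)
print(f"RMSE of the placeholder predictor: {score:.2f}")
```

In a real deployment the list would be replaced by a consumer reading from the Kafka topic, and `mean_predictor` by a call to the model selected in the offline phase; the windowing and scoring logic would stay the same in outline.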