Author: Elbattah, Mahmoud Ibrahim./ Title: Computational Intelligence Techniques for Big Data Analytics /

Title

Computational Intelligence Techniques for Big Data Analytics /

Author

Elbattah, Mahmoud Ibrahim.

Preparation committee

Researcher / Mahmoud Ibrahim Elbattah

Supervisor / Abdel-Badeeh Mohamed Salem

Supervisor / Mohamed Ismail Roushdy

Supervisor / Mostafa Aref

Publication date

2017.

Number of pages

150 p. :

Language

English

Degree

Doctorate

Specialisation

Computer Science (miscellaneous)

Date of approval

1/1/2017

Place of approval

Ain Shams University - Faculty of Computer and Information Sciences - Computer Science

Only 14 of 149 pages are available for public view
Abstract

The world of Big Data continues to expand and to reshape the landscape of decision making and analytics. Datasets are growing rapidly in size and complexity, and there is a pressing need for solutions that harness this deluge of data to produce useful insights. This study addresses the tasks of storing, querying, analysing and visualising Big Data from a graph-based perspective. Throughout the study, datasets extracted from the Freebase knowledge base are used. Initially, a web-based data visualisation tool, named FreebaseViz, was developed for visually exploring the schema of Freebase data. The visualisation design is built upon node-link network layouts, which facilitate exploring connectivity, visual search and analysis, and visualising patterns underlying the schema graph. FreebaseViz is designed to let users interact with the schema visualisations, filter and drill into lower levels of detail, and highlight subsets of the schema graph. In addition, a graph database-oriented approach is adopted to make the visualisation more queryable through graph-based query operations.
Subsequently, the study applied a graph-driven methodology to the analysis and visualisation of the complex Freebase schema. Specifically, the methodology used Freebase schema objects to construct a directed weighted graph. A modularity-based analysis was then performed on this schema graph in order to detect communities underlying Freebase data. The detected communities were in turn used to reveal unobserved or implicit domain relationships.
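As an illustration of the modularity-based analysis described above, the following minimal sketch computes modularity for a directed weighted graph and merges communities greedily while modularity improves. It is not the thesis's implementation: the abstract does not name the detection algorithm, so the greedy agglomeration here is an assumption, and the type names and edge weights in `schema_edges` are made up for the example.

```python
from collections import defaultdict


def directed_modularity(edges, community_of):
    """Modularity of a partition of a directed weighted graph.

    edges: list of (source, target, weight)
    community_of: dict mapping node -> community label
    Q = sum over communities c of  L_c / m - (K_out_c * K_in_c) / m^2
    where m is the total edge weight, L_c the weight inside c, and
    K_out_c / K_in_c the total out- and in-weight of the nodes in c.
    """
    m = sum(w for _, _, w in edges)
    out_w, in_w, intra = defaultdict(float), defaultdict(float), defaultdict(float)
    for u, v, w in edges:
        cu, cv = community_of[u], community_of[v]
        out_w[cu] += w
        in_w[cv] += w
        if cu == cv:
            intra[cu] += w
    labels = set(community_of.values())
    return sum(intra[c] / m - out_w[c] * in_w[c] / (m * m) for c in labels)


def greedy_communities(edges):
    """Start with one community per node; repeatedly apply the best merge."""
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    community_of = {n: n for n in nodes}
    best_q = directed_modularity(edges, community_of)
    while True:
        labels = sorted(set(community_of.values()))
        best_merge = None
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                # Trial partition with community b folded into community a.
                trial = {n: (a if c == b else c) for n, c in community_of.items()}
                q = directed_modularity(edges, trial)
                if q > best_q + 1e-12:
                    best_q, best_merge = q, trial
        if best_merge is None:  # no merge improves modularity any further
            return community_of, best_q
        community_of = best_merge


# Hypothetical schema graph: nodes are schema types, weights are link counts.
schema_edges = [
    ("film.film", "film.actor", 3), ("film.actor", "film.film", 3),
    ("film.film", "film.director", 2), ("film.director", "film.film", 2),
    ("film.actor", "film.director", 1),
    ("music.artist", "music.album", 3), ("music.album", "music.artist", 3),
    ("music.album", "music.track", 2), ("music.track", "music.album", 2),
    ("music.artist", "music.track", 1),
    ("film.film", "music.track", 1),  # weak cross-domain link
]

communities, q = greedy_communities(schema_edges)
for node, label in sorted(communities.items()):
    print(node, "->", label)
```

The exhaustive pairwise search is only practical for small schema graphs; it is used here because it keeps the modularity criterion itself easy to see.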
For storing and querying large-scale datasets, a graph database-oriented approach is proposed, which treats Freebase data as a large graph. The approach aims to address the limitations encountered with traditional relational models. The scalability and query efficiency of the approach were verified through empirical experiments on a subset of Freebase data comprising a large-scale graph of more than 500K nodes and 2M edges.
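To make the contrast with relational models concrete, the sketch below stores triples in an adjacency index and answers a multi-hop query by traversal rather than by repeated self-joins. It is an in-memory illustration only: the abstract does not name the graph database used, and the class `TripleGraph`, its methods, and the sample predicates are all hypothetical.

```python
from collections import defaultdict, deque


class TripleGraph:
    """Toy adjacency-indexed store for (subject, predicate, object) triples."""

    def __init__(self):
        self._out = defaultdict(list)  # subject -> [(predicate, object), ...]

    def add(self, subject, predicate, obj):
        self._out[subject].append((predicate, obj))

    def neighbours(self, node, predicate=None):
        """Objects directly linked from node, optionally filtered by predicate."""
        return [o for p, o in self._out.get(node, [])
                if predicate is None or p == predicate]

    def within_hops(self, start, max_hops):
        """All nodes reachable from start in at most max_hops outgoing edges."""
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue  # do not expand beyond the hop limit
            for nxt in self.neighbours(node):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
        seen.discard(start)
        return seen


# Illustrative data; the predicate names are not Freebase identifiers.
graph = TripleGraph()
graph.add("Inception", "directed_by", "Christopher Nolan")
graph.add("Christopher Nolan", "born_in", "London")
graph.add("London", "contained_by", "England")

print(graph.neighbours("Inception", "directed_by"))
print(sorted(graph.within_hops("Inception", 2)))
```

Each hop is a lookup in the adjacency index, so the cost of a traversal depends on the neighbourhood visited rather than on the total table size, which is the property that motivates graph stores for this kind of query.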
Furthermore, the study addresses the problem of entity clustering within large-scale knowledge graphs, with application to the Freebase knowledge base. In particular, the clustering task is approached from a purely graph-driven perspective: entities are clustered based on their structural similarity within the knowledge graph. In this manner, entities were clustered in an unsupervised fashion by matching their link-based structure rather than their relational attributes.
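One way to picture clustering by link-based structure is sketched below. Each entity is described by the set of predicates on its incoming and outgoing edges, and entities whose signatures overlap strongly are grouped together. The similarity measure (Jaccard), the threshold, and the single-link grouping are assumptions made for this example; the abstract does not state which measure or clustering algorithm the thesis used, and the sample triples are illustrative.

```python
from collections import defaultdict


def link_signatures(triples):
    """Map each entity to the set of (direction, predicate) pairs it takes part in."""
    signature = defaultdict(set)
    for s, p, o in triples:
        signature[s].add(("out", p))
        signature[o].add(("in", p))
    return dict(signature)


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_structure(triples, threshold=0.5):
    """Group entities whose link signatures have Jaccard similarity >= threshold."""
    signature = link_signatures(triples)
    entities = sorted(signature)
    parent = {e: e for e in entities}  # union-find forest

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            if jaccard(signature[a], signature[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for e in entities:
        clusters[find(e)].add(e)
    return sorted(clusters.values(), key=sorted)


# Illustrative triples; the predicate names are not Freebase identifiers.
triples = [
    ("Inception", "directed_by", "Nolan"),
    ("Inception", "starring", "DiCaprio"),
    ("Titanic", "directed_by", "Cameron"),
    ("Titanic", "starring", "DiCaprio"),
    ("Nolan", "born_in", "London"),
    ("Cameron", "born_in", "Kapuskasing"),
]

for cluster in cluster_by_structure(triples):
    print(sorted(cluster))
```

No attribute values are consulted: films, directors and places end up in separate groups purely because of how they are linked. The pairwise comparison is quadratic in the number of entities, so a large-scale setting would need a more scalable scheme than this sketch.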
Finally, the study aimed to develop an approach for estimating the consistency of knowledge base triples. The proposed approach uses machine learning to learn the graph-based patterns of the triples. Specifically, the study investigated the feasibility of training a model to learn triple patterns in terms of subject, predicate and object. The method was validated on a relatively large-scale subset of Freebase data. The dataset comprised about 10M triples: 6M true patterns and 4M randomly generated false patterns. The study used the Microsoft Azure cloud platform to conduct the large-scale machine learning experiments efficiently. On top of the Azure platform, an Apache Spark cluster was deployed to provide a distributed computing environment. The classifier model demonstrated relatively high accuracy. Overall, the study sought to demonstrate the suitability of graph-based methods for Big Data scenarios in terms of storage, querying, visualisation, and predictive analytics.
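The labelled dataset described above, true triples plus randomly generated false ones, could be assembled along the following lines. This is a sketch under stated assumptions: the abstract does not say how the false patterns were generated, so replacing the object of a true triple with a random entity is an assumption, and the function names and sample triples are hypothetical. The classifier itself, which the thesis trained on Apache Spark, is not reproduced here.

```python
import random


def corrupt_triples(true_triples, n_false, seed=0):
    """Generate up to n_false false triples by swapping in a random object.

    Assumption: a corrupted triple counts as false if it is absent from the
    set of known true triples.
    """
    rng = random.Random(seed)
    known = set(true_triples)
    positives = sorted(known)
    entities = sorted({e for s, _, o in positives for e in (s, o)})
    false_triples = set()
    attempts = 0
    # Cap the attempts so the loop terminates even on tiny graphs.
    while len(false_triples) < n_false and attempts < 100 * n_false:
        attempts += 1
        s, p, _ = rng.choice(positives)
        candidate = (s, p, rng.choice(entities))
        if candidate not in known:
            false_triples.add(candidate)
    return sorted(false_triples)


def labelled_dataset(true_triples, n_false, seed=0):
    """Return [(triple, label)] with label 1 for true and 0 for generated false."""
    rows = [(t, 1) for t in sorted(set(true_triples))]
    rows += [(t, 0) for t in corrupt_triples(true_triples, n_false, seed)]
    return rows


# Illustrative triples; the predicate names are not Freebase identifiers.
true_triples = [
    ("Inception", "directed_by", "Nolan"),
    ("Titanic", "directed_by", "Cameron"),
    ("Nolan", "born_in", "London"),
    ("Cameron", "born_in", "Kapuskasing"),
]

for triple, label in labelled_dataset(true_triples, n_false=3):
    print(label, triple)
```

Each row's subject, predicate and object would then be encoded as features for a binary classifier. Fixing the seed keeps the generated negatives reproducible across runs.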