Title
An alternative data mining approach for big data /
Author
Marzouk, Aya Shehata Mahmoud.
Contributors
Researcher / Aya Shehata Mahmoud
Supervisor / Hegazy Mohamed
Supervisor / Mervat Mahdy
Supervisor / Ahmed Mahoud
Subject
Data Mining.
Publication date
2020.
Number of pages
152 p.
Language
Arabic
Degree
Doctorate
Specialization
Business, Management and Accounting (miscellaneous)
Approval date
1/1/2020
Place of approval
Benha University - Faculty of Commerce - Statistics
Index
Only 14 pages are available for public view


Abstract

The continuous increase in the volume of data captured by organizations, including social media, government, industry, and science, has triggered the data explosion era and given rise to the concept of Big Data (BD), which has become the most influential force in daily life. Organizations hold data on everything, from what consumers like to how they react; what matters, however, is not how much data we have but what we can get out of it. The challenge is not so much the availability of this huge volume of data as its analysis and handling.
The main reason the field of Data Mining (DM) has attracted so much attention is the availability of huge data collections combined with the difficulty of turning them into useful information and knowledge for managerial decision making.
Data Mining is the process of analyzing large data sets (Big Data) from different perspectives and uncovering correlations and patterns in order to summarize them into useful information. Nowadays, it is blended with many fields, such as Artificial Intelligence, Statistics, Data Science (DS), and Machine Learning (ML).
In recent years, following the explosion of Big Data, many studies started using the term “Data Science”, referring to an emerging area of work concerned with the collection, preparation, analysis, visualization, management, and preservation of large collections of information.
Data science includes a family of disciplines, one of the most important of which is statistics.
Big Data provides an opportunity for large-scale analysis, which in turn opens up major opportunities to advance the quality of life or to solve the mysteries of the world.
As the Big Data field developed, data scientists encountered various challenges and issues in statistical computing and statistical learning, as well as difficulties in data storage, analysis, and visualization, all stemming from the sheer size of the datasets.
Moreover, the large amount of redundancy in the data creates a great challenge for analysis. Very large, high-dimensional data sets carry more noise and more redundant data. To manipulate such data efficiently, we propose a new technique for dimensionality reduction.
High-dimensional data are inherently difficult to analyze and computationally intensive to process, because irrelevant and redundant features severely degrade the efficiency and accuracy of data analysis and mining. Hence, dimensionality reduction for high-dimensional data is extremely important in many real-world applications.
Fundamental research on these technical issues must be supported and encouraged in order to realize the benefits of Big Data.
Some studies argue that handling and using this huge volume of data intelligently could become a new pillar of economics as well as of scientific research.
Therefore, we introduce a composite model that deals with the challenges of Big Data in two ways: it improves data quality to overcome the veracity problem, and it applies a dimension reduction technique together with the k-means clustering technique to overcome the volume problem.
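To make the shape of such a composite model concrete, the following is a minimal sketch and not the method developed in the dissertation. It assumes three stand-in steps: exact-duplicate removal as the quality step, a simple correlation filter (dropping constant and nearly collinear features) as the dimension reduction step, and plain Lloyd's k-means as the clustering step. All function names, the 0.95 threshold, and the toy records are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import mean, pstdev


def drop_duplicates(rows):
    """Quality step (assumed): keep the first copy of each exact-duplicate record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique


def pearson(xs, ys):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def select_features(rows, threshold=0.95):
    """Reduction step (assumed): return indices of columns that are neither
    constant nor nearly collinear (|r| >= threshold) with an already kept column."""
    columns = list(zip(*rows))
    kept = []
    for j, col in enumerate(columns):
        if pstdev(col) == 0:
            continue  # a constant column carries no information
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means with Euclidean distance; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [mean(dim) for dim in zip(*members)]
    return labels, centroids


# Toy records: column 1 is twice column 0 (redundant), column 2 is constant.
rows = [
    [1.0, 2.0, 5.0, 3.0],
    [1.2, 2.4, 5.0, 1.0],
    [0.8, 1.6, 5.0, 2.0],
    [1.0, 2.0, 5.0, 3.0],  # exact duplicate of the first record
    [9.0, 18.0, 5.0, 2.0],
    [9.3, 18.6, 5.0, 3.0],
    [8.7, 17.4, 5.0, 1.0],
]

clean = drop_duplicates(rows)                        # quality
kept = select_features(clean)                        # dimension reduction
reduced = [[row[j] for j in kept] for row in clean]
labels, centroids = kmeans(reduced, k=2)             # clustering
print(kept, labels)
```

The ordering is the point of the sketch: cleaning first keeps duplicated records from biasing the correlations and the centroids, and reducing the features before clustering shrinks the cost of every distance computation. A real Big Data setting would need distributed or streaming versions of each step.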
Problem Definition
Information can be retrieved from complex data sets in which it lies hidden; to extract this new information, the interrelationships among the data have to be discovered.
Browsing through a large data set is difficult and time consuming. Certain protocols have to be followed, and a proper algorithm is needed to reduce and classify the data and to find suitable patterns in them.
Bigger data are not always better data. Because of their size, high dimensionality, noise, and incompleteness, researchers face many challenges in handling and analyzing them.
Massive data sets may be human-sourced, process-mediated, or machine-generated. Decision makers need access to smaller and more specific pieces of data drawn from those large sets.
They use data mining to uncover the pieces of information that will inform leadership and help it take suitable decisions. Efficient and effective techniques need to be developed to analyze the data and discover the valuable knowledge hidden within them.
Big Data has forced researchers to extend existing DM techniques to cope with the evolving nature of data, and to develop new analytic techniques.
The research question that now arises is how to develop a high-performance platform that analyzes Big Data efficiently, and how to design an appropriate mining algorithm that reduces the dimensions of Big Data so that useful information can be extracted from it, and that is effective in removing duplicates, increasing learning accuracy, and improving decision-making processes.
The Aim of the Work
The main objectives of this dissertation are to:
1. Introduce a new definition of Big Data Mining.
2. Introduce techniques of Data Mining and Big Data Mining that can extract and discover information from such large data.
3. Introduce a new six-stage Big Data Quality technique that addresses the high dimensionality and great volume of Big Data.
4. Introduce a composite index of data quality.
5. Introduce a composite Data Mining model to deal with Big Data problems.
6. Produce a reduced, high-quality data set.
To meet these aims, the dissertation is organized as follows:
Chapter Two: Gives a brief review of the DM concept, definitions, techniques, algorithms, models, and applications.
Chapter Three: Introduces the definitions and dimensions of Big Data, including recently added dimensions, together with a wide review of the Big Data techniques and methods that have been used to extract useful information and patterns.
Chapter Four: Introduces the concept of Big Data Quality, its definition and dimensions, and ways of measuring the quality of such data.
Chapter Five: Presents the proposed approach and its application to real data.
Chapter Six: Presents a summary, conclusions, and suggestions for further work.