Author: EL-Ami, Mohye EL Deen Esmail Mosa./ Title: Machine learning approache for knowledge acquisition from database /

Thesis committee

Researcher / Mohye EL Deen Esmail Mosa EL-Ami

Supervisor / Raafat El-Kammar

Examiner / Atta El-Alfy

Examiner / Mohamed Ibrahim El-Shaarawy

Subject

Introduction to knowledge discovery. Dimensionality reduction technique.

Publication date

2002.

Number of pages

149 p.

Language

English

Degree

Doctorate (Ph.D.)

Specialization

Electrical and Electronic Engineering

Approval date

1/1/2002

Awarding institution

Benha University - Faculty of Engineering at Shoubra - Department of electric

Abstract

The current information age is characterized by an extraordinary expansion of the data being generated and stored about all kinds of human endeavors. An increasing proportion of these data is recorded in databases so that computers can access it easily. The availability of very large volumes of such data has created the problem of how to extract useful, task-oriented knowledge from them. Data analysis techniques traditionally used for such tasks include numerical taxonomy, multidimensional analysis, multivariate statistical methods, stochastic models, time series analysis and nonlinear estimation techniques. These techniques facilitate useful data interpretations and can help to generate important insights into the processes behind the data. These interpretations and insights are the ultimate knowledge sought; however, such knowledge is not created by the tools themselves but has to be derived by human data analysts. Researchers have turned to ideas and methods developed in the field of machine learning to overcome this limitation. This field is a natural source of ideas for this purpose, since it is concerned with developing computational models for acquiring knowledge from facts and background knowledge. These models have led to the emergence of a new research area called data mining and knowledge discovery. The objective of the research presented in this thesis is to develop new machine learning algorithms for knowledge acquisition from a database. Three algorithms are developed to meet this aim: dimensionality reduction of a database; rule extraction from a database using a trained artificial neural network (ANN) and a genetic algorithm (GA); and an inductive learning algorithm for discovering comprehensible knowledge from a database.
The first algorithm is the dimensionality reduction of a database. A central problem in machine learning is to extract the representative set of attributes that constructs a classification model for a particular task. Selecting such a set reduces the cost of attribute measurement, increases classifier efficiency and allows higher classification accuracy. Conventional databases may also include memo (text) attributes. The new algorithm proposed in this thesis aims to extract the most relevant attribute set from a given database. It performs two simultaneous stages. The first stage is rough pruning, which includes the following steps:
- Building a domain field dictionary (DFD) with the help of a domain expert.
- Using the DFD to:
  - detect the memo attributes inside the database;
  - abstract the memo attributes to their minimum-level value(s).
- Measuring the probability of the value(s) inside each attribute (information measure).
- Dropping the less informative attributes from the given database.
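The rough-pruning steps can be sketched in Python as below. The sketch assumes that the information measure is the Shannon entropy of an attribute's value probabilities and that the DFD is a simple keyword-to-term mapping; abstract_memo, rough_prune, the min_bits threshold and the sample records are hypothetical illustrations, not the thesis's implementation.

```python
import math
from collections import Counter


def abstract_memo(text, dfd):
    """Map a free-text memo value to the first DFD term whose keyword occurs in it (assumed DFD format)."""
    lowered = text.lower()
    for keyword, term in dfd.items():
        if keyword in lowered:
            return term
    return "other"


def attribute_entropy(values):
    """Shannon entropy (bits) of the empirical value probabilities of one attribute."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def rough_prune(records, attributes, min_bits):
    """Keep the attributes whose entropy reaches min_bits; also return every score."""
    scores = {a: attribute_entropy([r[a] for r in records]) for a in attributes}
    return [a for a in attributes if scores[a] >= min_bits], scores


# Hypothetical DFD and records, for illustration only.
dfd = {"fever": "fever", "headache": "headache"}
raw = [
    {"outlook": "sunny", "site": "A", "note": "Mild headache reported"},
    {"outlook": "rain", "site": "A", "note": "severe HEADACHE"},
    {"outlook": "sunny", "site": "A", "note": "no complaints"},
    {"outlook": "overcast", "site": "A", "note": "fever and headache"},
]
# Abstract the memo attribute to DFD terms before measuring information.
records = [{**r, "note": abstract_memo(r["note"], dfd)} for r in raw]
kept, scores = rough_prune(records, ["outlook", "site", "note"], min_bits=0.5)
# kept == ["outlook", "note"]: "site" has a single value and carries no information
```

Abstracting the memo text first matters: the four raw notes are all distinct strings, whereas their DFD terms repeat, so the entropy reflects the domain meaning rather than free-text variation.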
The second stage is fine pruning. This stage is concerned with determining the set of relevant attributes through:
- The calculation of an evaluation function for each possible attribute set.
  - This function depends on the correlation and conditional probability computed attribute-to-attribute and attribute-to-target.
- The extraction of the set which corresponds to the maximum value of the evaluation function. This set includes the most relevant attributes in a given database.
In addition, the algorithm includes a proposed technique for reducing the search space linearly.
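The fine-pruning search can be illustrated with a stand-in evaluation function. The sketch below substitutes the merit function of correlation-based feature selection (CFS), using symmetric uncertainty (computed from joint and marginal value probabilities) as the attribute-to-attribute and attribute-to-target correlation; it is not necessarily the evaluation function used in the thesis. It scores every non-empty attribute set exhaustively, so it does not include the linear search-space reduction mentioned above. All names and data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(column):
    """Shannon entropy (bits) of the empirical value distribution of a column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def symmetric_uncertainty(x, y):
    """2 * I(X;Y) / (H(X) + H(Y)), in [0, 1], from joint and marginal probabilities."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def merit(subset, columns, target):
    """CFS merit: rewards attribute-to-target correlation, penalizes attribute-to-attribute correlation."""
    k = len(subset)
    r_cf = sum(symmetric_uncertainty(columns[a], target) for a in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(symmetric_uncertainty(columns[a], columns[b]) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def best_subset(columns, target):
    """Score every non-empty attribute set; return the first one with maximal merit."""
    names = list(columns)
    candidates = (s for k in range(1, len(names) + 1) for s in combinations(names, k))
    return max(candidates, key=lambda s: merit(s, columns, target))


# Hypothetical data: "a" predicts the target, "b" duplicates "a", "c" is independent noise.
columns = {"a": ["p", "p", "q", "q"], "b": ["p", "p", "q", "q"], "c": ["u", "v", "u", "v"]}
target = ["yes", "yes", "no", "no"]
best = best_subset(columns, target)
```

Here attribute b duplicates a, so adding it does not raise the merit, and the search keeps the single attribute a; ties are broken in favor of the smaller set because smaller sets are enumerated first.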
The second algorithm is rule extraction from a database using a trained artificial neural network (ANN) and a genetic algorithm (GA). It can be summarized as follows:
- Encoding the values of each attribute in the given database as a fixed-length binary string.
- Dividing the encoded database into the input attribute vectors and the corresponding output class vectors.
- Training the ANN on the input and output vectors and keeping the resulting groups of weights:
  - The first weight group, (WG1)ij, contains the weights between input node i and hidden node j.
  - The second weight group, (WG2)jk, contains the weights between hidden node j and output node k.
- Constructing the objective function wk, which expresses the final output of each output node k in terms of the input attribute values and the two weight groups.
- Maximizing this objective function to obtain the optimal input attribute values under the constraint that all input attribute values are binary. This yields a nonlinear integer optimization problem.
- Using the GA to find the optimal values (chromosome) that maximize the output function wk and satisfy the imposed constraints.
- The proposed algorithm uses only a part of the total weights; therefore, it reduces the search space and the computational time required to extract the rules.
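The following Python sketch illustrates the objective-function and GA steps on a toy problem. The schema, the weights WG1 and WG2 (hand-set placeholders standing in for a trained network, not results from the thesis), the sigmoid activation and the GA settings are all assumptions, and the thesis's use of only part of the weights is not reproduced. To keep every chromosome a valid binary encoding, each gene here stores the index of one attribute value and is decoded to a one-hot block, so the binary constraint holds by construction rather than through a penalty term.

```python
import math
import random

# Hypothetical schema: each attribute and its possible values (one-hot encoded).
SCHEMA = {"outlook": ["sunny", "overcast", "rain"], "windy": ["no", "yes"]}

# Placeholder weights standing in for a trained network: 5 inputs, 2 hidden nodes, 1 output node.
WG1 = [[2.0, -1.0],   # WG1[i][j]: weight from input node i to hidden node j
       [1.5, 0.5],
       [-2.0, 1.0],
       [0.5, -1.5],
       [-1.0, 2.0]]
WG2 = [[3.0], [-2.0]]  # WG2[j][k]: weight from hidden node j to output node k


def encode(genes):
    """Decode per-attribute value indices into the fixed-length binary input string."""
    bits = []
    for attr, idx in zip(SCHEMA, genes):
        block = [0] * len(SCHEMA[attr])
        block[idx] = 1
        bits.extend(block)
    return bits


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def output_k(bits, k=0):
    """Objective wk: output of node k as a function of the binary inputs, WG1 and WG2."""
    hidden = [sigmoid(sum(bits[i] * WG1[i][j] for i in range(len(bits))))
              for j in range(len(WG2))]
    return sigmoid(sum(hidden[j] * WG2[j][k] for j in range(len(hidden))))


def ga_maximize(k=0, pop_size=20, generations=40, p_mut=0.1, seed=0):
    """Simple GA: elitism, parents from the fitter half, one-point crossover, mutation.

    Assumes at least two attributes (needed for the one-point crossover cut).
    """
    rng = random.Random(seed)
    sizes = [len(values) for values in SCHEMA.values()]

    def fitness(genes):
        return output_k(encode(genes), k)

    pop = [[rng.randrange(n) for n in sizes] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:2]  # keep the two best chromosomes unchanged
        while len(next_pop) < pop_size:
            a, b = rng.sample(pop[: pop_size // 2], 2)
            cut = rng.randrange(1, len(sizes))
            child = a[:cut] + b[cut:]
            # Mutation redraws a gene as another valid value index.
            child = [rng.randrange(n) if rng.random() < p_mut else g
                     for g, n in zip(child, sizes)]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)


best, best_fit = ga_maximize()
rule = {attr: SCHEMA[attr][i] for attr, i in zip(SCHEMA, best)}
print("IF", " AND ".join(f"{a} = {v}" for a, v in rule.items()), "THEN class 0", round(best_fit, 3))
```

The decoded best chromosome is read as the antecedent of a rule for output node k. On a space this small, brute force would do; the GA only pays off when the number of attributes and values makes enumeration impractical.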
The third algorithm is an inductive learning algorithm for discovering comprehensible knowledge from a database. It handles continuous attribute values in a given database using a fuzzification process.

This process transforms the continuous values into linguistic terms using a suitable membership function. These terms reduce the number of values in the continuous attributes and thereby minimize the search space. The algorithm controls rule induction through three levels: the confidence level (Pc), the supporting level (DBc) and the search level (level1). The confidence level is the conditional probability of the conjunction value(s) given a specific class; conjunction value(s) are extracted when Pc = 100%. The supporting level is the percentage of records in the given database that are covered by a rule. The extracted rules are refined according to a specific threshold supporting level (DBth). The search level is the number of antecedents in each extracted rule; its depth prevents the algorithm from extracting rules whose number of antecedents exceeds the number of attributes minus one.
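A minimal Python sketch of fuzzification followed by level-wise rule induction is given below, under stated assumptions: three triangular membership functions (low, medium, high) over a known value range, a toy weather-style dataset, and Pc read as the fraction of records matching a conjunction that belong to one class, that is P(class | conjunction). That reading of Pc, the skipping of conjunctions already covered by a more general rule, and all function names are assumptions for illustration, not the thesis's definitions.

```python
from itertools import combinations


def triangular(x, a, b, c):
    """Triangular membership with peak 1 at b, falling linearly to 0 at a and c (a == b or b == c gives a shoulder)."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def fuzzify(x, lo, hi):
    """Replace a continuous value by the linguistic term with the highest membership."""
    x = min(max(x, lo), hi)
    mid = (lo + hi) / 2
    terms = {"low": (lo, lo, mid), "medium": (lo, mid, hi), "high": (mid, hi, hi)}
    return max(terms, key=lambda t: triangular(x, *terms[t]))


def induce_rules(records, attrs, target, db_th):
    """Level-wise induction of rules with Pc = 100% and support DBc >= db_th."""
    n = len(records)
    rules = []
    for level in range(1, len(attrs)):  # search level: at most len(attrs) - 1 antecedents
        for subset in combinations(attrs, level):
            for conj in sorted({tuple(r[a] for a in subset) for r in records}):
                antecedent = dict(zip(subset, conj))
                if any(prev.items() <= antecedent.items() for prev, _, _ in rules):
                    continue  # a more general rule already covers this conjunction
                covered = [r for r in records if all(r[a] == v for a, v in antecedent.items())]
                classes = {r[target] for r in covered}
                support = len(covered) / n  # DBc: fraction of records covered by the rule
                if len(classes) == 1 and support >= db_th:  # Pc = 100% and DBc >= DBth
                    rules.append((antecedent, classes.pop(), support))
    return rules


# Hypothetical data: (temperature, humidity, windy, play).
raw = [(30, "high", "yes", "no"), (28, "normal", "no", "yes"), (12, "high", "no", "yes"),
       (14, "normal", "yes", "yes"), (21, "high", "yes", "no"), (20, "normal", "no", "yes")]
records = [{"temp": fuzzify(t, 10, 30), "humidity": h, "windy": w, "play": p}
           for t, h, w, p in raw]
rules = induce_rules(records, ["temp", "humidity", "windy"], "play", db_th=0.3)
for antecedent, cls, support in rules:
    print(antecedent, "->", cls, f"DBc={support:.2f}")
```

With three attributes the search stops at two antecedents. Level-2 conjunctions that are pure but cover only one record fall below the 0.3 threshold and are refined away, while humidity = high AND windy = yes survives with two covered records.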