Title
HYBRID GENETIC ALGORITHM-DECISION TREE METHOD FOR KNOWLEDGE DISCOVERY
Publisher
Abeer Mahmoud Mahmoud
Author
Mahmoud, Abeer Mahmoud
Supervisory committee
Supervisor / Abdel-Badeeh Mohamed Salem
Supervisor / Khaled Ahmed Nagaty
Publication date
2004
Number of pages
xiii, 146 p.
Language
English
Degree
Master's
Specialization
Computer Science Applications
Approval date
1/1/2004
Place of approval
Egyptian Universities Libraries Consortium - computer
Index
Only 14 pages are available for public view


Abstract

Knowledge discovery is a multidisciplinary field that draws on databases, visualization, statistics, machine learning and expert systems. The knowledge discovery process consists of six stages: data selection, cleaning, enrichment, coding, data mining and reporting. The data mining stage is the process of discovering useful patterns in large data sets. Various mining techniques are used for different purposes, such as query tools, statistical techniques, online analytical processing (OLAP), case-based learning, decision trees, association rules, neural networks and genetic algorithms. Data mining addresses a range of models, or tasks, such as clustering, regression, summarization and classification.
Classification is an important data mining task with a wide range of applications, one of which is medical diagnosis. The goal of classification is to build a model that assigns class labels to a database of testing records, where the values of the predictor attributes are known but the value of the class label is unknown. A variety of classification algorithms have been used in the literature. These algorithms can be divided into four main categories: decision-tree-based, neural-network-based, statistics-based and Bayesian-learning-based classification algorithms.
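As an illustration of the classification task described above, the following minimal Python sketch builds a one-level decision tree (a "decision stump") from labeled training records and then assigns class labels to testing records. It is not C4.5 and not the thesis's classifier; the attribute names (`fever`, `cough`), the class labels and the records are hypothetical, and the split criterion used here is plain information gain.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def group_labels(records, labels, attr):
    """Group the class labels by the value each record takes for one attribute."""
    groups = defaultdict(list)
    for rec, lab in zip(records, labels):
        groups[rec[attr]].append(lab)
    return groups

def information_gain(records, labels, attr):
    """Entropy reduction obtained by splitting the records on one attribute."""
    groups = group_labels(records, labels, attr)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def train_stump(records, labels):
    """Pick the attribute with the highest gain; map each value to its majority class."""
    best = max(records[0], key=lambda a: information_gain(records, labels, a))
    groups = group_labels(records, labels, best)
    rules = {value: Counter(g).most_common(1)[0][0] for value, g in groups.items()}
    default = Counter(labels).most_common(1)[0][0]  # used for unseen attribute values
    return best, rules, default

def classify(model, record):
    """Assign a class label to a record whose label is unknown."""
    attr, rules, default = model
    return rules.get(record[attr], default)

# Hypothetical training records, made up for illustration only.
train = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "no", "cough": "yes"},
    {"fever": "no", "cough": "no"},
]
train_labels = ["sick", "sick", "healthy", "healthy"]

model = train_stump(train, train_labels)

# Testing records: predictor attributes known, class label unknown.
test = [{"fever": "yes", "cough": "no"}, {"fever": "no", "cough": "yes"}]
predicted = [classify(model, r) for r in test]
```

The classification rate reported for a classifier is then the fraction of testing records whose predicted label matches the true label.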
The main objective of this study was to explore a new method that integrates genetic algorithms and decision tree approaches for the data mining classification task.
Decision tree learning is one of the most widely used and practical methods for the data mining classification task. It is a method for approximating discrete-valued functions that is robust to noisy data and capable of learning disjunctive expressions. Genetic algorithms provide an approach to learning that is based loosely on simulated evolution. The search for an appropriate hypothesis begins with a population of initial hypotheses. Members of the current population give rise to the next generation by means of operations such as evaluation by fitness, selection, mutation and crossover.
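The generational loop described above can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the GARFC4.5 method: hypotheses are encoded as bit strings, the fitness function simply counts 1-bits (a real classifier would score something like classification accuracy), and the names `tournament`, `crossover`, `mutate` and `evolve`, along with all parameter values, are hypothetical choices.

```python
import random

def fitness(bits):
    """Toy fitness (an assumption): the number of 1-bits in the hypothesis."""
    return sum(bits)

def tournament(population, rng, k=3):
    """Selection: return the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)

def crossover(a, b, rng):
    """Single-point crossover: prefix of one parent, suffix of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rng, rate=0.02):
    """Mutation: flip each bit independently with a small probability."""
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(pop_size=30, length=20, generations=40, seed=0):
    rng = random.Random(seed)
    # Initial population of random hypotheses.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        # Elitism: carry the current best hypothesis over unchanged.
        next_gen = [best]
        while len(next_gen) < pop_size:
            child = crossover(tournament(population, rng), tournament(population, rng), rng)
            next_gen.append(mutate(child, rng))
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
```

Because the best individual is copied into each new generation, the best fitness recorded in `history` never decreases from one generation to the next.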
This research demonstrates the usefulness of applying the genetic algorithm approach to improve classification rates over the well-known decision tree algorithm C4.5 (Quinlan, 1993). The study presents a new approach for developing two classifiers based on C4.5. The first classifier (RFC4.5) uses the database access method of the RainForest framework and replaces the C4.5 pruning algorithm with a simple pruning algorithm. The second classifier (GARFC4.5) adds the genetic algorithm approach. The two classifiers were applied to a large (20 MB) medical database of thrombosis diseases. The results show that the RFC4.5 classifier with the simple pruning algorithm improves the classification rate from 81% for traditional C4.5 to 93%. Adding the genetic algorithm approach, the GARFC4.5 classifier raises the classification accuracy to 94%, compared with 81% for traditional C4.5 and 93% for RFC4.5.
Moreover, the study applies the developed GARFC4.5 classifier to a second database, for breast cancer disease, characterized by numerical attributes. The classifier was also compared with thirty-three classification algorithms, based on different learning methodologies, published by Lim et al. (2000). The results show that the GARFC4.5 classifier gives reasonable classification rates compared with those algorithms.