Author: Amin,Huda Amin Maghawry / Title: An Enhanced Clustering Algorithm For Gene Expression Data

Title

An Enhanced Clustering Algorithm For Gene Expression Data

Publisher

Ain Shams University

Author

Amin,Huda Amin Maghawry

Thesis committee

Supervisor / Amin,Huda Amin Maghawry

Supervisor / Khalifa,Mohamed Essam

Supervisor / Taysir Hassan Abdel Hamid Soliman

Supervisor / Huda Amin Maghawry Amin

Subject

Data

Publication date

2008

Number of pages

128p

Language

English

Degree

Master's

Specialization

Computer Networks and Communications

Approval date

1/1/2008

Place of approval

Egyptian Universities Library Consortium - Information Sciences

Abstract

DNA microarray technology has emerged as an effective technique for simultaneously monitoring and measuring the expression levels of a huge number of genes under different conditions, at different developmental stages, or in various tissues. Gene expression levels provide important data for the biological interpretation of genes and their functions. Such information has a great impact on many applications that support research in fields such as medicine and agriculture.
Clustering, as a data mining technique, is widely used in analyzing gene expression data. It is used to discover functional relationships between genes and their involvement in biological processes, based on the assumption that genes with similar expression patterns correspond to a common biological function. Such analysis allows biologists to study, and develop a complete understanding of, the functions of an entire set of genes for a species.
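Grouping genes by similar expression patterns requires a similarity or distance measure between expression profiles. The abstract does not say which measure the thesis uses; as a minimal sketch, assuming Pearson correlation (a common choice for expression data), the hypothetical helpers `pearson` and `correlation_distance` below treat two genes as close when their profiles rise and fall together, regardless of scale.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # A flat profile has no defined correlation; treat it as uncorrelated.
        return 0.0
    return cov / (sx * sy)


def correlation_distance(x, y):
    """1 - r: near 0 for identical patterns, near 2 for exactly opposite ones."""
    return 1.0 - pearson(x, y)


# Hypothetical toy profiles (expression measured under four conditions).
g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [2.0, 4.0, 6.0, 8.0]  # same pattern as g1, doubled scale
g3 = [4.0, 3.0, 2.0, 1.0]  # opposite trend to g1
print(round(correlation_distance(g1, g3), 6))  # 2.0
```

With this measure, a gene whose profile is a scaled copy of another is at a distance close to 0, so a clustering algorithm would tend to group g1 with g2 and separate it from g3.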
Some clustering algorithms restrict each object to belong to exactly one cluster. In the analysis of gene expression data, however, a single gene can belong to multiple clusters, since genes can participate in different genomic functions and are frequently involved in a variety of biological processes. Therefore, this thesis proposes a Hierarchical Overlapped Divisive Algorithm (HODA), an enhancement of the hierarchical divisive algorithm, to identify the overlap among clusters at different levels of similarity. Genes within a given level belong to the clusters of that level with a certain membership degree. As a result, researchers can analyze the membership matrix between genes and clusters at each level and study the overlap among clusters by applying an appropriate overlapping threshold.
The performance of the proposed algorithm was analyzed and compared with that of the divisive algorithm, using gene expression data of varying sizes and dimensions. Since HODA is a newly proposed algorithm, its clustering results were also evaluated against those of classical clustering algorithms. The algorithms were compared over a range of numbers of clusters, using two gene expression data sets, one related to the yeast cell cycle and the other to rat central nervous system development, and two kinds of validation measures: internal validation measures, which assess the goodness of fit between the input data and the resulting clusters without the need for external information, and external validation measures, which validate a clustering result by comparing it with a given standard. The validation results varied, as no single clustering algorithm achieved the best results on all validation measures.
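External validation measures compare a clustering result with a given standard partition. The abstract does not name the specific measures used; as one illustration of an external measure, the sketch below computes the Rand index, the fraction of gene pairs on which a clustering and a reference partition agree. The function name `rand_index` is hypothetical, and this measure applies to hard (non-overlapping) partitions.

```python
from itertools import combinations


def rand_index(predicted, reference):
    """Fraction of gene pairs on which two hard partitions agree.

    A pair agrees if both partitions put the two genes in the same cluster,
    or both put them in different clusters. Labels are arbitrary ids.
    """
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(predicted)), 2))
    if not pairs:
        return 1.0  # fewer than two genes: nothing to disagree on
    agree = sum(
        (predicted[i] == predicted[j]) == (reference[i] == reference[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy labels for four genes: a clustering result vs. a reference standard.
print(rand_index([0, 0, 1, 1], [0, 1, 1, 1]))  # 0.5
```

A value of 1.0 means the two partitions group every pair of genes identically; since labels are arbitrary, `[0, 0, 1, 1]` and `["a", "a", "b", "b"]` score 1.0.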
For some validation measures and numbers of clusters, the proposed algorithm gave better clustering results than the classical algorithms; these cases are illustrated in detail throughout the thesis. The proposed algorithm can therefore be used as an alternative clustering algorithm in some cases, owing to the information it provides about the overlap among clusters at different levels of similarity.
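The membership matrix and overlapping threshold can be illustrated for a single level of the hierarchy. The abstract does not give HODA's definition of membership degree, so this sketch assumes a fuzzy c-means style formula (fuzzifier m = 2, so memberships are proportional to the inverse squared Euclidean distance to each cluster centroid); `membership_matrix` and `overlapping_clusters` are hypothetical names. A gene is assigned to every cluster whose membership degree reaches the threshold, so it can appear in more than one cluster.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def membership_matrix(genes, centroids):
    """Rows = genes, columns = clusters at one level; each row sums to 1.

    Assumption for illustration: fuzzy c-means memberships with m = 2,
    i.e. proportional to 1 / d^2 (not necessarily HODA's own definition).
    """
    matrix = []
    for g in genes:
        d = [euclidean(g, c) for c in centroids]
        zeros = [j for j, dj in enumerate(d) if dj == 0.0]
        if zeros:
            # Gene coincides with one or more centroids: share membership among them.
            row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(d))]
        else:
            inv = [1.0 / dj ** 2 for dj in d]
            total = sum(inv)
            row = [v / total for v in inv]
        matrix.append(row)
    return matrix


def overlapping_clusters(matrix, threshold):
    """For each gene, the indices of every cluster with membership >= threshold."""
    return [[j for j, u in enumerate(row) if u >= threshold] for row in matrix]


# Toy level with two cluster centroids and three genes (2 conditions each).
centroids = [[0.0, 0.0], [4.0, 0.0]]
genes = [[1.0, 0.0], [2.0, 0.0], [3.9, 0.0]]
m = membership_matrix(genes, centroids)
print(overlapping_clusters(m, threshold=0.3))  # [[0], [0, 1], [1]]
```

With threshold 0.3, the gene midway between the two centroids has membership 0.5 in each and therefore falls into both clusters, while the other two genes each belong to a single cluster. Raising the threshold above 0.5 would remove that overlap.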