Title
An Enhanced Clustering Algorithm For Gene Expression Data
Publisher
Ain Shams University
Author
Amin,Huda Amin Maghawry
Preparation committee
Supervisor / Amin,Huda Amin Maghawry
Supervisor / Khalifa,Mohamed Essam
Supervisor / Taysir Hassan Abdel Hamid Soliman
Supervisor / Huda Amin Maghawry Amin
Subject
Data
Publication date
2008
Number of pages
128p
Language
English
Degree
Master's
Specialization
Computer Networks and Communications
Approval date
1/1/2008
Place of approval
Egyptian Universities Libraries Consortium - Information Sciences
Index
Only 14 pages are available for public view

Abstract

DNA microarray technology has emerged as an effective technique for monitoring and measuring a large number of gene expression levels simultaneously under different conditions, at different developmental stages, or in various tissues. Gene expression levels provide important data required for the biological interpretation of genes and their functions. Such information has a great impact on many applications that serve research in fields such as medicine and agriculture.
Clustering, as a data mining technique, is widely used in analyzing gene expression data. It is used to discover the functional relationships between genes and their involvement in biological processes, based on the assumption that genes with similar expression patterns share a common biological function. Such analysis allows biologists to study and develop a complete understanding of the functions of an entire set of genes for a species.
Some clustering algorithms require each object to belong to exactly one cluster. In gene expression data, however, a single gene can belong to multiple clusters, since genes can participate in different genomic functions and are frequently involved in a variety of biological processes. Therefore, in this thesis, a Hierarchical Overlapped Divisive Algorithm (HODA) is proposed as an enhancement to the hierarchical divisive algorithm, to identify the overlap among clusters at different levels of similarity. Genes within a given level belong to the clusters in that level with a certain membership degree. As a result, researchers can analyze the membership matrix between genes and clusters at each level and study the overlap among clusters by applying an appropriate overlapping threshold. The performance of the proposed algorithm was analyzed and compared with that of the divisive algorithm, using gene expression data of varying sizes and dimensions. As it is a newly proposed algorithm, its clustering results were also evaluated against those of classical clustering algorithms. The algorithms were compared over a range of numbers of clusters, using two gene expression data sets, one related to the yeast cell cycle and the other to rat central nervous system development, and different validation measures: internal validation measures, which represent the goodness of fit between the input data and the resulting clusters without the need for external information, and external validation measures, which validate a clustering result by comparing it to a given standard. The validation results varied, as no single clustering algorithm gave the best clustering results under all validation measures.
With some measures and some numbers of clusters, the proposed algorithm gave better clustering results than the classical algorithms; these cases are illustrated in detail throughout the thesis. The proposed algorithm can therefore be used as an alternative clustering algorithm in some cases, owing to the overlap information it provides among clusters at different levels of similarity.
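
As an illustration of how a membership matrix and an overlapping threshold might be used at one level of the hierarchy, the following is a minimal sketch and not the thesis's HODA implementation. The gene names, membership values, threshold, and both helper functions are assumptions made for illustration; the abstract does not specify how membership degrees are computed.

```python
# Hypothetical membership matrix for a single level of the hierarchy:
# keys are genes, list positions are clusters, values are membership degrees.
# All names and numbers here are illustrative and not taken from the thesis.
membership = {
    "gene_a": [0.90, 0.10, 0.00],
    "gene_b": [0.45, 0.50, 0.05],
    "gene_c": [0.05, 0.35, 0.60],
}


def clusters_above_threshold(membership, threshold):
    """Map each gene to the indices of clusters whose degree meets the threshold."""
    return {
        gene: [k for k, degree in enumerate(degrees) if degree >= threshold]
        for gene, degrees in membership.items()
    }


def overlapping_genes(assignments):
    """Return the genes that were assigned to more than one cluster."""
    return sorted(gene for gene, clusters in assignments.items() if len(clusters) > 1)


# The threshold value is an assumed example; a suitable value depends on the data.
assignments = clusters_above_threshold(membership, threshold=0.3)
print(assignments)
print(overlapping_genes(assignments))
```

Raising the threshold leaves fewer genes shared between clusters, while lowering it exposes more overlap, which is one way the per-level membership matrix could be explored.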