Title
Gene Signatures Prediction of Genetic Diseases
Author
Barakat, Hassan Sayed Ramadan.
Preparation committee
Researcher / حسن سيد رمضان بركات
Supervisor / خالد البهنسي
Supervisor / محمد العليمي
Supervisor / هدي آمين مغاوري
Publication date
2022.
Number of pages
101 p.
Language
English
Degree
Master's
Specialization
Information Systems
Approval date
1/1/2022
Place of approval
Ain Shams University - Faculty of Computer and Information Sciences - Information Systems
Index
Only 14 of the 101 pages are available for public view

Abstract

Over the last few years, several standard clustering approaches have been proposed for analyzing gene expression data. Identifying breast cancer subtypes consistently, however, remains difficult.
DBSCAN-BICLIC, a modified version of the BICLIC biclustering algorithm, is proposed to discover the signature genes of breast cancer subtypes. It provides an efficient way to cluster seeds properly. Experiments on 2509 breast cancer cases were evaluated using clinical data. The analysis divided the breast cancer conditions into 22 groups, each with its own set of clinical data.
Conditions with similar clinical criteria are grouped together. Notably, DBSCAN-BICLIC discovered the biomarkers for each group of conditions.
As a result, each subtype of breast cancer has its own set of signature genes. DBSCAN-BICLIC was applied to 10 groups, and the top five signature genes identified for each group are presented. In total, DBSCAN-BICLIC identified 40 signature genes across the 10 groups, 32 of which have been validated in the literature as signature genes. These 32 genes are highly effective prognostic genes for breast cancer.
Despite its promising results, DBSCAN-BICLIC cannot be applied automatically to an arbitrary biological dataset. The obstacle is the epsilon parameter of the DBSCAN clustering algorithm, which most researchers choose randomly. The objective was therefore to propose a heuristic approach for finding the optimal epsilon for DBSCAN. The approach runs DBSCAN repeatedly, computing a different epsilon value on each run, until it finds the optimal one. The clusters produced by each run are evaluated, and the optimal epsilon is the one that yields the best evaluation scores. The proposed approach uses the root mean square standard deviation (RMSSTD) and R-squared (RS) to evaluate clusters. It was run on three benchmark datasets of different dimensionality, and the Silhouette index was used to validate its clustering results. The proposed approach successfully found the optimal epsilon for all three datasets.
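
The epsilon search described above can be illustrated with a minimal sketch in plain Python. It is not the thesis implementation: the abstract does not say how candidate epsilon values are generated or how RMSSTD and RS are combined. The sketch therefore assumes a caller-supplied list of candidates, a ranking of highest RS first with lowest RMSSTD as tie-breaker, and a guard that skips runs with fewer than two clusters or too much noise. The names `find_epsilon`, `rmsstd_and_rs`, `max_noise`, and `min_pts` are hypothetical, and the Silhouette validation step is omitted.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors_of(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = neighbors_of(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors_of(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


def sum_sq(group):
    """Sum of squared deviations of a group of points from its centroid."""
    dims = len(group[0])
    centroid = [sum(p[d] for p in group) / len(group) for d in range(dims)]
    return sum((p[d] - centroid[d]) ** 2 for p in group for d in range(dims))


def rmsstd_and_rs(points, labels):
    """RMSSTD (lower is better) and RS (higher is better) over non-noise points."""
    clusters = {}
    for p, label in zip(points, labels):
        if label != -1:
            clusters.setdefault(label, []).append(p)
    if not clusters:
        raise ValueError("no clustered points to evaluate")
    clustered = [p for group in clusters.values() for p in group]
    within = sum(sum_sq(group) for group in clusters.values())
    total = sum_sq(clustered)
    dof = len(points[0]) * sum(len(group) - 1 for group in clusters.values())
    rmsstd = math.sqrt(within / dof) if dof > 0 else 0.0
    rs = (total - within) / total if total > 0 else 0.0
    return rmsstd, rs


def find_epsilon(points, candidates, min_pts, max_noise=0.2):
    """Run DBSCAN once per candidate epsilon and keep the best-scoring run."""
    best = None
    for eps in candidates:
        labels = dbscan(points, eps, min_pts)
        n_clusters = len(set(labels) - {-1})
        noise = labels.count(-1) / len(labels)
        # Assumed guard: skip degenerate runs (under two clusters or mostly noise).
        if n_clusters < 2 or noise > max_noise:
            continue
        rmsstd, rs = rmsstd_and_rs(points, labels)
        score = (rs, -rmsstd)  # assumed ranking: highest RS, then lowest RMSSTD
        if best is None or score > best[0]:
            best = (score, eps, labels)
    if best is None:
        return None
    return best[1], best[2]
```

Calling `find_epsilon(points, candidates, min_pts)` returns the chosen epsilon together with its cluster labels, or `None` when no candidate passes the guard. The guard matters because very small epsilon values tend to produce tiny, tight clusters that score well on both indices without being meaningful.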