Title
A Novel Approach to Mining Biosequence Data
Publisher
Ain Shams University. Faculty of Computer & Information Sciences. Department of Information Systems
Author
Soliman, Taysir Hassan Abdel Hamid
Publication date
2003
Number of pages
280 p.
Index
Only 14 pages are available for public viewing

Abstract

Real-life transactional databases usually contain both item information and hierarchical information in the form of a taxonomy. Mining can traverse the hierarchy in different ways, using generalized or specialized approaches. Classical frequent pattern mining algorithms, such as Apriori and FP-growth, can produce a huge number of redundant rules if they are applied at the primitive level, so a dedicated algorithm is needed to mine multi-level concept hierarchies. This thesis tackles two main issues: proposing a hybrid approach to mining concept-level hierarchies and applying this approach in the biological domain. The Adaptive-H-Struct algorithm is proposed to attack the problem of mining concept-level hierarchies. Adaptive-H-Struct is a pattern-growth method, which avoids the candidate generate-and-test approach of Apriori. It produces frequent patterns to be used for rule generation. The efficiency of the algorithm is guaranteed by the high flexibility of FP-growth. To assess scalability, an extensive performance study was conducted on synthetic data, which showed that Adaptive-H-Struct is efficient and scalable. It outperforms the previously proposed algorithms for mining generalized association rules: Cumulate, Prutax, and Ready-and-GO. Adaptive-H-Struct proved to be 38 times faster (on average) than Cumulate across all experiments of the performance study. It is also 237 times faster than Prutax and 48 times faster than Ready-and-GO. However, the data itself should contain correlations.
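
To make the idea of mining over a taxonomy concrete, the following is a minimal brute-force sketch in Python. It is not Adaptive-H-Struct, nor any of the algorithms named in the abstract. It only illustrates the general notion of generalized frequent itemsets: each transaction is extended with the ancestors of its items, and itemsets that pair an item with its own ancestor are skipped as redundant. The taxonomy, the item names, and the function names are all hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy (assumption, not from the thesis): child -> parent.
TAXONOMY = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
}


def ancestors(item, taxonomy):
    """Return every ancestor of `item`, nearest first."""
    result = []
    while item in taxonomy:
        item = taxonomy[item]
        result.append(item)
    return result


def extend(transaction, taxonomy):
    """Add the ancestors of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, taxonomy))
    return extended


def generalized_frequent_itemsets(transactions, taxonomy, min_count, max_size=2):
    """Brute-force count of itemsets over ancestor-extended transactions."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(extend(transaction, taxonomy))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                # An itemset holding an item and its own ancestor is redundant.
                if any(a in combo for i in combo for a in ancestors(i, taxonomy)):
                    continue
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    transactions = [{"jacket", "shirt"}, {"ski pants"}, {"jacket"}, {"shirt"}]
    for itemset, n in sorted(
        generalized_frequent_itemsets(transactions, TAXONOMY, min_count=2).items()
    ):
        print(itemset, n)
```

In this toy data, "ski pants" falls below the support threshold on its own, yet its parent "outerwear" is frequent because it also collects the "jacket" transactions. Enumerating every combination this way grows quickly with transaction size, which is the kind of cost that pattern-growth methods aim to avoid.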