Author: Hassaan, Mosab Abdel-Hameed Mohamed./ Title: Using Indexing for efficient mining and query processing in graph databases /

Search In this Thesis

العنوان

Using Indexing for efficient mining and query processing in graph databases /

المؤلف

Hassaan, Mosab Abdel-Hameed Mohamed.

هيئة الاعداد

باحث / مصعب عبد الحميد محمد حسان

مشرف / ماهر شديد زايد

مناقش / كرم عبد الغني عبد الرحمن

مناقش / ماهر شديد زايد

الموضوع

Mathematics.

تاريخ النشر

2013.

عدد الصفحات

140 p. :

اللغة

الإنجليزية

الدرجة

الدكتوراه

التخصص

Computer Science (miscellaneous)

تاريخ الإجازة

1/1/2013

مكان الإجازة

جامعة بنها - كلية الاداب - علوم الحاسب

الفهرس

Only 14 pages are availabe for public view

from

155

from

155

Abstract

Graph is a powerful modeling tool for representing and understanding objects and their relationships. In recent years, we have observed a rapid increase in the volume of graph data. The widespread of applications using graph data call for efficient techniques to manage and analyze the huge amount of available data, a very challenging problem due to the limitation of processing graph queries using existing database technology.
Subgraph Search, an important problem in graph data management, is studied in this thesis. The objective of subgraph search is to retrieve data graphs that contain a query graph. Processing the query by a sequential scan on graph database to check whether the query is a subgraph of each data graph is infeasible, since subgraph isomorphism testing is known as an NP-complete problem [8]. An important framework for tackling
this problem is the filtering and verification approach, where, in this approach, indexes are used to quickly filter out data graphs that are not possible in the result and produce candidate graphs. Then the candidate graphs are verified, that is, whether the query graph is a subgraph of each candidate, by a subgraph isomorphism algorithm. The efficiency of this approach depends on the filtering power of each indexing methodology and how fast it produces candidate graphs.
While this algorithmic framework worked well for small query graphs, the verification phase became a bottleneck when the query graph size increased or all data graphs contain the query. Thus, another way to reduce the query answering time is to develop an efficient subgraph isomorphism checking algorithm. Motivated by this, a two novel and efficient algorithms Fast-ON and Fast-P are proposed in this thesis for testing subgraph isomorphism. The two algorithms are based on Ullman algorithm [22], apply vertex-at-a-time-manner and path-at-a-time-manner respectively, and use effective optimizations to reduce the search space as much as possible. Comparing to the well-known algorithms, namely, Ullman algorithm [22], Vflib [7], and QuickSI [20], Fast-ON and Fast-P achieve up to 1-4 orders of magnitude speed-up for dense datasets and sparse datasets, respectively. We extend Fast-ON and Fast-P to test graph isomorphism problem. The two extended algorithms are called Fast-ON-ISO and Fast-P-ISO, respectively. Experimental evaluation shows that Fast-ON-ISO is suitable for graph isomorphism problem.
Another objective of our research is to develop a new feature-based indexing method. Feature-based indexing [9, 26, 5, 28, 33, 20] is a well-known graph indexing mechanism that works by extracting subgraphs of data graphs as indexing features, and using them to prune data graphs not having the features that are contained by the query. Several feature-based indexing methods have been introduced in recent years. These methods have to index too many features or select some of them in order to get an index with good pruning capabilities. None of these directions can give an effective solution to all graph indexing issues. In this thesis, we propose an efficient indexing approach which improves upon current feature-based methods, neither by the costly feature selection nor by explicitly indexing a multitude of features. We achieve this by compressing the pruning of multiple features into one feature with neighborhood information encoded. Neighborhood is further used to prune unmatched feature embeddings between the query and data graphs, thus cutting down the search space of subgraph matching, which significantly reduce the verification cost. We implement the approach by exhaustively enumerating small paths as features and we adopted Fast-P algorithm for the verification step to benefit from the embeddings pruning method. Our approach is called PathIndex. A comprehensive set of experiments verifies that query processing using PathIndex is up to orders of magnitude more efficient than state-of-the-art indexes and it is also more scalable