Abstract
In the last few years, outlier detection has become an important problem in many industrial and financial applications. Outliers are usually caused by system faults, sudden or unexpected changes in existing behavior, and human error. The problem is further complicated by the fact that, in many cases, outliers have to be detected in data streams that arrive at an enormous pace. Despite the enormous amount of data collected in many scientific and commercial applications, outliers (anomalies) remain quite rare. Outlier detection has recently gained a lot of attention in many domains, ranging from video surveillance and intrusion detection to fraudulent transactions, web usage logs, and direct marketing. In this thesis, an outlier detection algorithm for data streams is presented. The basic idea of the proposed algorithm is to partition the incoming data stream into chunks and store these chunks one by one in a fixed-width grid structure for further processing. Each chunk is processed using a combination of the fixed-width grid structure and the Incremental LOF algorithm. By efficiently pruning safe regions, the proposed algorithm needs to operate only over the candidate regions to find outliers. Experimental results demonstrate the accuracy, efficiency, and scalability of the proposed algorithm.
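The pipeline described above (chunking, a fixed-width grid, pruning of safe regions, and LOF scoring of the remaining candidates) can be illustrated with a minimal sketch. It is not the thesis implementation, and several details are assumptions made only for illustration: a cell is treated as "safe" when it holds at least `dense` points, a plain batch LOF with exactly `k` neighbours stands in for the Incremental LOF algorithm, and all function names, parameters, and default values (`chunk_size`, `width`, `dense`, `k`, `threshold`) are hypothetical.

```python
import math
from collections import defaultdict
from itertools import islice


def chunked(stream, size):
    """Split an iterable stream into consecutive lists of at most `size` items."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def cell_of(point, width):
    """Map a point to the index of its fixed-width grid cell."""
    return tuple(math.floor(x / width) for x in point)


def k_nearest(i, points, k):
    """Return (distance, index) pairs of the k points closest to points[i]."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points))
        if j != i
    )
    return dists[:k]


def lof(i, points, k):
    """Batch Local Outlier Factor of points[i] (simplified: exactly k neighbours)."""

    def k_distance(j):
        return k_nearest(j, points, k)[-1][0]

    def lrd(j):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_distance(o), d) for d, o in k_nearest(j, points, k)]
        return len(reach) / max(sum(reach), 1e-12)  # guard against duplicates

    neighbours = k_nearest(i, points, k)
    return sum(lrd(o) for _, o in neighbours) / (len(neighbours) * lrd(i))


def detect_outliers(stream, chunk_size=50, width=1.0, dense=5, k=3, threshold=1.5):
    """Process the stream chunk by chunk; score only points in sparse cells."""
    points = []                 # all points seen so far (unbounded in this sketch)
    counts = defaultdict(int)   # grid cell -> number of points stored in it
    outliers = []
    for chunk in chunked(stream, chunk_size):
        start = len(points)
        for p in chunk:
            points.append(tuple(p))
            counts[cell_of(p, width)] += 1
        if len(points) <= k:
            continue  # not enough data yet to form k-neighbourhoods
        for i in range(start, len(points)):
            if counts[cell_of(points[i], width)] >= dense:
                continue  # assumed "safe region": dense cell, pruned without scoring
            if lof(i, points, k) > threshold:
                outliers.append(points[i])
    return outliers


if __name__ == "__main__":
    cluster = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
    print(detect_outliers(cluster + [(10.0, 10.0)], chunk_size=10))
```

In this sketch the pruning step is what saves work: the 25 clustered points fall into one dense cell and are never scored, so the quadratic-cost LOF computation runs only for the isolated point. A real streaming version would also need to bound memory, for example by summarizing or expiring old cells, which this sketch does not attempt.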