Author: Abd Elhameed, Saad Abd Elhameed Saad. / Title: A Privacy Preservation Publishing Approach in Big Data Streams

Title

A Privacy Preservation Publishing Approach in Big Data Streams

Author

Abd Elhameed, Saad Abd Elhameed Saad.

Preparation committee

Researcher / Saad Abd Elhameed Saad Abd Elhameed

Supervisor / Mohamed Essam Khalifa

Supervisor / Sherin Mohamed Mahmoud Moussa

Publication date

2018

Number of pages

133 p.

Language

English

Degree

Master's

Specialization

Computer Science

Approval date

1/1/2017

Place of approval

Ain Shams University - Faculty of Computer and Information Sciences - Information Systems

Index

Only 14 pages are available for public view

Abstract

With the recent rapid evolution of telecommunication and computing technologies, organizations collect and use large amounts of data about individuals. These data come from diverse sources and are often high-dimensional. Most of them are stored in tabular format and can include sensitive content.
Organizations sometimes need to share the gathered data for business analysis, decision making, or scientific research, and the data may contain sensitive information about individuals. Because of the associated privacy concerns, the data cannot be published to third parties in their original form. Preserving data privacy is therefore essential: it allows the data to be published while guaranteeing that individuals’ identities cannot be discovered, and their sensitive information cannot be disclosed, by an intruder through the published data. The data to be published can be static or dynamic (from data streams), and can include a Single Sensitive Attribute (SSA) or Multiple Sensitive Attributes (MSA). In this context, Privacy-Preserving Tabular Data Publishing (PPTDP) has drawn considerable attention from the research community, and different anonymization approaches have been proposed to preserve the privacy of individuals’ tabular data.
This thesis presents a comparative study that analyzes and evaluates the main data anonymization approaches introduced in PPTDP. The study covers three broad research areas: SSA, MSA, and data streams. A detailed critique highlights the strengths and weaknesses of each approach, supported by detailed comparison tables. The study also investigates the deployment of data anonymization approaches in cloud and Internet of Things (IoT) environments. A research gap analysis captures the current state of the art in this field and highlights future directions. In addition, we address privacy-preserving static data publishing by proposing an Enhanced Additive Noise (EAN) approach for publishing microdata with an SSA. The EAN approach enforces a newly proposed privacy constraint on the value of the Sensitive Attribute (SA) in the published data, whereas the original values of the other attributes are published to better preserve data utility and the attributes’ distributions. Hence, the proposed approach maintains better utility of the published data, allowing more accurate mining and analytical results, while providing more robust protection against privacy attacks.
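As a rough illustration of the general idea of additive noise applied to a sensitive attribute only, the following Python sketch perturbs one numeric field and copies every other attribute unchanged. It is not the EAN approach itself: the abstract does not define EAN's privacy constraint, so the zero-mean Gaussian noise, the function name `add_noise_to_sensitive`, the `scale` parameter, and the sample records are all assumptions made for this example.

```python
import random


def add_noise_to_sensitive(records, sensitive_key, scale, seed=None):
    """Return copies of `records` with zero-mean Gaussian noise added to one numeric field.

    Hypothetical helper: plain additive noise, not the thesis's EAN constraint.
    """
    rng = random.Random(seed)
    published = []
    for rec in records:
        out = dict(rec)  # all other attributes are copied unchanged
        out[sensitive_key] = rec[sensitive_key] + rng.gauss(0.0, scale)
        published.append(out)
    return published


# Made-up sample microdata with a single sensitive attribute ("salary").
records = [
    {"age": 34, "zip": "11566", "salary": 5200.0},
    {"age": 41, "zip": "11511", "salary": 7300.0},
    {"age": 29, "zip": "11566", "salary": 4100.0},
]
published = add_noise_to_sensitive(records, "salary", scale=250.0, seed=7)
```

In this sketch a larger `scale` hides individual values more strongly but distorts aggregate statistics more, which is the privacy and utility trade-off the abstract refers to.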
Data streams, meanwhile, have become a widely adopted data representation in many real-world domains and applications. Such streams may likewise need to be published for scientific research, mining, or analysis. However, they may contain person-specific data that is sensitive. When streams are shared, these sensitive data should be protected against privacy disclosure attacks to preserve individuals’ privacy. Preserving the privacy of data streams while maintaining their utility is therefore a real challenge. Consequently, this thesis investigates privacy-preserving data stream publishing, an area in which some research studies have started to consider different ways of publishing such streams. However, the investigated approaches consider data streams with an SSA only, and they do not protect the published streams from all possible privacy attacks. Thus, this thesis proposes the Restricted Sensitive Attributes-based Sequential Anonymization (RSA-SA) approach for privacy-preserving data stream publishing, in which stream tuples are anonymized sequentially. Two new privacy restrictions, Semantic-diversity and Sensitivity-diversity, are introduced to restrict the published values of the Sensitive Attributes (SAs). The RSA-SA approach can thereby protect the sensitive values of the published data streams against the related privacy attacks: attribute disclosure, skewness, similarity, and sensitivity attacks. In addition, the RSA-SA approach handles data streams that have either single or multiple sensitive attributes with minimal information loss and delay. The data utility of the published streams is therefore maintained, providing more accurate mining and analytical results while sustaining robustness against privacy attacks.
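To make the idea of anonymizing stream tuples sequentially more concrete, here is a minimal Python sketch that buffers incoming tuples by a generalized quasi-identifier and releases a group only once it holds at least k tuples and at least l distinct values for every sensitive attribute. This is ordinary k-anonymity combined with distinct l-diversity, used as a stand-in: the abstract does not define Semantic-diversity or Sensitivity-diversity, so the sketch does not implement RSA-SA. The class `StreamAnonymizer`, the helper `generalize_age`, the parameters `k` and `l`, and the sample tuples are all hypothetical.

```python
from collections import defaultdict


def generalize_age(age, width=10):
    """Map an exact age to a range label such as '30-39' (hypothetical generalization)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


class StreamAnonymizer:
    """Buffer tuples per generalized group; release a group when it meets k and l."""

    def __init__(self, k, l, sensitive_keys):
        self.k = k  # minimum number of tuples per released group
        self.l = l  # minimum distinct values per sensitive attribute
        self.sensitive_keys = sensitive_keys
        self.buffers = defaultdict(list)

    def push(self, tup):
        """Add one tuple; return the released group, or [] if nothing is ready yet."""
        group = generalize_age(tup["age"])
        buf = self.buffers[group]
        buf.append(tup)
        if len(buf) < self.k:
            return []
        for key in self.sensitive_keys:
            if len({t[key] for t in buf}) < self.l:
                return []  # not diverse enough on this sensitive attribute yet
        released = [
            {"age": group, **{key: t[key] for key in self.sensitive_keys}}
            for t in buf
        ]
        del self.buffers[group]
        return released


# Made-up stream with two sensitive attributes.
anonymizer = StreamAnonymizer(k=3, l=2, sensitive_keys=["disease", "salary_band"])
stream = [
    {"age": 31, "disease": "flu", "salary_band": "low"},
    {"age": 35, "disease": "flu", "salary_band": "low"},
    {"age": 38, "disease": "asthma", "salary_band": "low"},
    {"age": 33, "disease": "flu", "salary_band": "high"},
]
outputs = [anonymizer.push(t) for t in stream]
```

Here the first three tuples stay buffered because `salary_band` has only one distinct value; the fourth tuple satisfies both conditions and the whole group is released with the age generalized. Waiting for a group to fill is what introduces delay, and coarser generalization is what introduces information loss, the two costs the abstract says should be kept small.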