Title
Enhancing XML Compression Techniques /
Author
Ahmed, Aya EL-Sayed Ahmed EL-Sayed Hamed.
Preparation committee
Researcher / آية السيد احمد السيد حامد أحمد
Supervisor / إكرام فتحي عبدالجواد
Supervisor / سمير الدسوقي السيد الموجي
Examiner / آية السيد احمد السيد حامد أحمد
Subject
Compression Techniques.
Publication date
2013.
Number of pages
85 p. :
Language
English
Degree
Master's
Specialization
Computer Science Applications
Approval date
1/12/2013
Place of approval
Egyptian Universities Libraries Consortium - Computer Sciences
Index
Only 14 pages are available for public view

Abstract

XML (eXtensible Markup Language) is now regarded as the standard language for exchanging data over the internet and other networks. It is widely used for exchanging, manipulating, and archiving data in Enterprise Application Integration (EAI). It is also used to migrate from legacy systems without changing entire applications.
During the implementation of the E-Freight and E-Tariff projects at the Egyptian Customs Authority, which use XML-based messaging systems, it was found that a very large number of small and medium-sized XML messages are exchanged daily. These messages may cause network bottlenecks and require a huge amount of disk space for long-term archiving. XML is verbose, so an XML document can be ten or more times larger than the same data represented in other formats. Although an XML document is essentially a text file that can be compressed with traditional compression tools, in recent years several XML-specific compression tools have been developed that exploit the exposed structure of XML documents during compression to improve the compression ratio. XMill is the premier XML-conscious compression tool. Although XMill compresses about twice as well as GZip, it is less effective on the small XML messages used in messaging systems.
In this thesis, a new algorithm, “XHuffman”, is proposed to compress the XML structure of messages obtained from different messaging systems and brokers. It aims to minimize the bandwidth needed to exchange a massive number of small and medium-sized XML messages over networks and to save the storage needed to archive them long term. The XHuffman algorithm encodes and decodes XML elements instead of characters. XSLT (Extensible Stylesheet Language Transformations) is used in the implementation phase to generate the Huffman code. Experiments showed that the XHuffman tool achieves a better compression ratio than the XMill and GZip compression methods on this type of message.
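The idea of coding XML elements rather than characters can be illustrated with a minimal sketch. It is not the thesis's XHuffman implementation (which the abstract says uses XSLT). It is a plain Python illustration that assumes the symbols are element names and that their frequencies are counted within a single message. The function name `element_huffman_codes` and the sample message are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count
import xml.etree.ElementTree as ET


def element_huffman_codes(xml_text):
    """Build a Huffman code table keyed by element name, not by character.

    Illustrative only: assumes frequencies are counted per message.
    """
    root = ET.fromstring(xml_text)
    freq = Counter(elem.tag for elem in root.iter())
    if len(freq) == 1:
        # A single distinct element still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(weight, next(tiebreak), {tag: ""}) for tag, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {tag: "0" + code for tag, code in left.items()}
        merged.update({tag: "1" + code for tag, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical sample message, not taken from the thesis.
SAMPLE = (
    "<manifest>"
    "<item><code>A1</code><qty>2</qty></item>"
    "<item><code>B7</code><qty>5</qty></item>"
    "<item><code>C3</code><qty>1</qty></item>"
    "</manifest>"
)

codes = element_huffman_codes(SAMPLE)
for tag, code in sorted(codes.items(), key=lambda kv: (len(kv[1]), kv[0])):
    print(tag, code)
```

Each element name maps to a short prefix-free bit string, so a tag such as `<manifest>` costs a few bits instead of several bytes. A full compressor would also need to store the code table and the text content, which this sketch leaves out.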
In addition, the thesis highlights the benefits of a pre-processing method called “Bundle”, which reduces the number of messages in day-to-day business processes and decreases network bottlenecks. In the Bundle process, messages with the same structure are grouped together into one or more packages prior to compression. As a result, the compression ratio of the bundled message is comparable to that of sending each message compressed separately over the channel. Conversely, the post-processing step “Split” can be used after decompressing the package so that the messages can be processed efficiently.
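The Bundle and Split steps can be sketched as follows. This is an assumption-laden illustration, not the thesis's tool: gzip stands in for the compressor, the wrapper element name `bundle` is a hypothetical choice, and the sample messages are made up. The byte counts it prints describe gzip on this sample data only and are not results from the thesis.

```python
import gzip
import xml.etree.ElementTree as ET


def bundle(messages, wrapper="bundle"):
    """Group same-structure XML messages under one wrapper element."""
    package = ET.Element(wrapper)  # wrapper name is a hypothetical choice
    for text in messages:
        package.append(ET.fromstring(text))
    return ET.tostring(package, encoding="unicode")


def split(package_text):
    """Recover the individual messages from a decompressed package."""
    return [
        ET.tostring(child, encoding="unicode")
        for child in ET.fromstring(package_text)
    ]


# Made-up messages that share one structure.
messages = [
    f"<declaration><id>{i}</id><port>Alexandria</port>"
    f"<status>cleared</status></declaration>"
    for i in range(50)
]

# gzip is used here only as a stand-in compressor.
separate = sum(len(gzip.compress(m.encode("utf-8"))) for m in messages)
package = gzip.compress(bundle(messages).encode("utf-8"))

print("compressed separately:", separate, "bytes")
print("compressed as one bundle:", len(package), "bytes")

# Split after decompression returns the original messages.
restored = split(gzip.decompress(package).decode("utf-8"))
```

Bundling also replaces many transmissions with one, which is the reduction in message count the abstract refers to. The Split step restores the individual messages for downstream processing.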