Development of WordBased Text Compression Algorithm for Indonesian Language Document
Development of WordBased Text Compression Algorithm for Indonesian Language Document
Author : ARDILES SINAGA; HERTOG NUGROHO; ADIWIJAYA Published on : ICoICT 2015
Abstract
“Information technology is growing very rapidly, in particular for data handling. Data is a valuable asset for everyone, especially for larger companies with branches in
several places. Data transmission from headquarters to branch offices make the company must provides good tools to do it. These companies also need tools that can be used to compress data to reduce their size. The main idea of the word-based encoding is to extract each word of the source text, then it is checked whether containing capital letters or not. After that, it is checked if there is a symbol or number. The particle will be separated from the basic word using stemming algorithm. Symbols, numbers and affixes will be indexed in the basic dictionary. The basic word will also be checked whether it exists in the basic dictionary or not. If there is not a match, then the word will be stored in the supplement dictionary. The experiment was conducted on the text file with the size from about 10K bytes up to 500K bytes with 16-bits length codewords. The result shows that the compression ratio of the proposed method is comparable with the previous ones, while its processing time is much better than the Reversed Sequence of Characters on LZW method.”