Development of Word-Based Text Compression Algorithm for Indonesian Language Document


 

Authors		: ARDILES SINAGA; HERTOG NUGROHO; ADIWIJAYA
Published on	: ICoICT 2015

 

Abstract

Information technology is growing very rapidly, particularly in data handling. Data is a valuable asset for everyone, especially for larger companies with branches in several places. Because data must be transmitted from headquarters to branch offices, such companies must provide good tools for the task. They also need tools that compress data to reduce its size. The main idea of word-based encoding is to extract each word of the source text and check whether it contains capital letters. The word is then checked for symbols or numbers. Particles are separated from the base word using a stemming algorithm. Symbols, numbers and affixes are indexed in the basic dictionary. The base word is also looked up in the basic dictionary; if there is no match, the word is stored in the supplement dictionary. The experiment was conducted on text files ranging in size from about 10 KB to 500 KB, with 16-bit codewords. The results show that the compression ratio of the proposed method is comparable to that of previous methods, while its processing time is much better than that of the Reversed Sequence of Characters on LZW method.
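The encoding steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The dictionary contents, the `<CAP>` capitalization flag, the particle list (`lah`, `kah`, `pun`, `nya`), the naive suffix stripping that stands in for a real stemming algorithm, and the function names `tokenize`, `split_particle`, `encode` and `decode` are all assumptions made for illustration. Only the general flow (capital check, symbol and number handling, particle separation, basic and supplement dictionaries, 16-bit codewords) comes from the abstract.

```python
import re
import struct

# Hypothetical basic dictionary: a capitalization flag, symbols,
# particles (marked with a leading "-") and a few base words.
BASIC = ["<CAP>", " ", ".", ",", "-lah", "-kah", "-pun", "-nya",
         "buku", "baca", "saya"]
PARTICLES = ("lah", "kah", "pun", "nya")  # assumed list, not from the paper


def tokenize(text):
    # Letter runs, digit runs, or any other single character.
    return re.findall(r"[A-Za-z]+|\d+|[^A-Za-z\d]", text)


def split_particle(word, known):
    # Naive stand-in for stemming: strip a particle only when the
    # remaining base word is already in the basic dictionary.
    for p in PARTICLES:
        if word.endswith(p) and word[:-len(p)] in known:
            return [word[:-len(p)], "-" + p]
    return [word]


def encode(text):
    index = {tok: i for i, tok in enumerate(BASIC)}
    supplement = []
    codes = []

    def emit(tok):
        if tok not in index:  # no match: store in the supplement dictionary
            index[tok] = len(BASIC) + len(supplement)
            supplement.append(tok)
        codes.append(index[tok])

    known = set(BASIC)
    for tok in tokenize(text):
        if tok.isalpha() and tok[:1].isupper() and tok[1:].islower():
            emit("<CAP>")  # record the capital letter as a flag
            tok = tok.lower()
        if tok.isalpha() and tok.islower():
            for part in split_particle(tok, known):
                emit(part)
        else:
            emit(tok)  # symbols, numbers, mixed-case words stored as-is

    if len(index) > 0x10000:
        raise ValueError("more entries than 16-bit codewords can address")
    # Each codeword is packed as an unsigned 16-bit little-endian integer.
    return struct.pack(f"<{len(codes)}H", *codes), supplement


def decode(data, supplement):
    table = BASIC + supplement
    out = []
    capitalize_next = False
    for code in struct.unpack(f"<{len(data) // 2}H", data):
        tok = table[code]
        if tok == "<CAP>":
            capitalize_next = True
            continue
        if len(tok) > 1 and tok.startswith("-"):
            tok = tok[1:]  # particle: re-attach to the preceding base word
        if capitalize_next:
            tok = tok.capitalize()
            capitalize_next = False
        out.append(tok)
    return "".join(out)


text = "Saya baca bukunya."
data, supplement = encode(text)
print(len(text.encode()), "bytes ->", len(data), "bytes")
print(decode(data, supplement) == text)
```

In this sketch the supplement dictionary has to be stored or transmitted alongside the codewords, so any saving on a real document depends on how many words fall outside the basic dictionary. How the paper handles prefixes, the dictionary layout and the storage of the supplement dictionary is not stated in the abstract and is not modeled here.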
