Comparative Study of Smoothing Techniques on Indonesian and English Language Models



Author		: ISMAIL
Published in	: Journal of Theoretical and Applied Information Technology (JATIT), Little Lion Scientific, Islamabad, Pakistan

 

Abstract

Indonesian is an Austronesian language. It differs from English, which is one of the isolating languages. For Indonesian, no study of the effect of smoothing on its language model has been reported. Although, from a mathematical point of view, a language model has no direct dependency on any specific language, Whittaker [1] showed that smoothing effects differ between Russian and English. In this paper, we study various smoothing techniques in language models for Indonesian and compare their effects with those for English. Our experiments show that smoothing yields a larger perplexity reduction for the statistical Indonesian language model than for the English one. We report our results as cross-entropy differences among the various techniques relative to Katz smoothing.
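
To make the evaluation measure concrete, the following is a minimal sketch of how cross-entropy, perplexity, and a cross-entropy difference relative to a baseline are commonly computed. It is not the paper's implementation: the function names and the per-token probabilities are hypothetical, and the baseline list merely stands in for a Katz-smoothed model.

```python
import math

def cross_entropy(token_probs):
    """Average negative log2 probability (bits per token) on test data."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    return -sum(math.log2(p) for p in token_probs) / len(token_probs)

def perplexity(token_probs):
    """Perplexity is 2 raised to the cross-entropy in bits."""
    return 2.0 ** cross_entropy(token_probs)

# Hypothetical probabilities that two models assign to the same four test tokens.
# These are illustrative values only, not results from the paper.
baseline_probs = [0.25, 0.125, 0.5, 0.125]   # stand-in for the Katz baseline
candidate_probs = [0.25, 0.25, 0.5, 0.25]    # stand-in for another smoothing method

h_base = cross_entropy(baseline_probs)
h_cand = cross_entropy(candidate_probs)

# A negative difference means the candidate needs fewer bits per token
# than the baseline, i.e. it has lower perplexity.
delta = h_cand - h_base
print(f"baseline:  H = {h_base:.2f} bits, PP = {perplexity(baseline_probs):.2f}")
print(f"candidate: H = {h_cand:.2f} bits, PP = {perplexity(candidate_probs):.2f}")
print(f"difference relative to baseline: {delta:+.2f} bits")
```

Because perplexity is an exponential function of cross-entropy, a fixed difference in bits corresponds to a fixed ratio of perplexities, which is one reason results are often reported as cross-entropy differences against a common baseline.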
