Handling Imbalanced Data in Customer Churn Prediction using Combined Sampling and Weighted Random Forest
Authors: VERONIKHA EFFENDY; ADIWIJAYA; Z K ABDURAHMAN BAIZAL
Published in: ICoICT 2014 (Universitas Telkom - Bandung, Indonesia)
Abstract
Customer churn is a major problem in the telecommunications industry because it affects a company's revenue. When churn occurs, the proportion of records that describe churning customers is usually low. Unfortunately, these churn records are precisely the ones that need to be predicted early. The scarcity of churn data leads to the problem of imbalanced data, which makes it difficult to develop a good prediction model. This research applied a combination of sampling techniques and Weighted Random Forest (WRF) to improve the customer churn prediction model on a sample dataset from the telecommunications industry in Indonesia. WRF is claimed to produce prediction models that perform well on imbalanced data. However, this research found that the performance of the prediction model developed by WRF on this dataset was still quite low. Sampling techniques were applied to overcome this problem, specifically a combination of simple undersampling and SMOTE. The results showed that combined sampling with WRF produced a prediction model that performed better than WRF alone.
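The combined sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the 95/5 class split, the target sizes, and the neighbour count `k` are all assumptions chosen for illustration. The sketch randomly undersamples the majority (non-churn) class and then oversamples the minority (churn) class with SMOTE-style interpolation between a minority record and one of its nearest minority neighbours.

```python
import math
import random


def undersample(majority, n_keep, rng):
    """Simple random undersampling: keep n_keep majority rows, drawn without replacement."""
    return rng.sample(majority, n_keep)


def smote(minority, n_new, k, rng):
    """SMOTE-style oversampling: create n_new synthetic rows by interpolating
    between a minority row and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (Euclidean distance), excluding base itself
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def combined_sampling(majority, minority, n_majority, n_minority, k=5, seed=0):
    """Shrink the majority class to n_majority rows and grow the minority
    class to n_minority rows (hypothetical helper, not from the paper)."""
    rng = random.Random(seed)
    kept = undersample(majority, n_majority, rng)
    new = smote(minority, n_minority - len(minority), k, rng)
    return kept, minority + new


# Toy two-feature data (assumed, not the paper's dataset): 95 non-churn, 5 churn records
data_rng = random.Random(42)
non_churn = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(95)]
churn = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(5)]

kept, balanced_churn = combined_sampling(non_churn, churn, n_majority=30, n_minority=30)
print(len(kept), len(balanced_churn))  # 30 30
```

The resampled data would then be passed to a class-weighted random forest. scikit-learn's `RandomForestClassifier` accepts a `class_weight` argument, which is one way to approximate that step, though it may not match the WRF formulation used in the paper.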