Customer Retention in Telecom: A Data-Balanced Machine Learning Perspective

Sneha Arvind Deshmukh

Department of Information Technology, Manipal University Jaipur, Rajasthan, India


Abstract

Churn prediction is the process of identifying customers who are likely to stop using a service. Churn is a problem not only in the telecom industry: banks, insurers, gaming companies, and internet service providers face the same challenge. This study focuses on churn prediction in the telecom industry, with two aims: to determine the best classification model and to reduce the number of attributes in the dataset. Machine learning models such as Random Forest, K-Nearest Neighbor, Decision Tree, Support Vector Machine, Logistic Regression, Bagging Classifier, Extreme Gradient Boosting, Stochastic Gradient Descent Classifier, and Gaussian Naive Bayes were used. To handle imbalanced data and tune hyperparameters, techniques such as SMOTE, ENN, under-sampling, over-sampling, and k-fold cross-validation were used. The results show that the Random Forest classifier performed best at predicting customer churn in the telecom sector, with an accuracy of 90.30% using all attributes and 90.90% using the reduced-attribute dataset. This suggests that the reduced-attribute dataset may be useful for churn prediction tasks in a variety of industries, offering useful information to companies trying to reduce customer attrition. The results are validated by comparison with four previously published studies.
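To illustrate how class balancing and k-fold cross-validation fit together, the following is a minimal sketch using only the Python standard library. It shows plain random over-sampling, not SMOTE or ENN, and it is not the pipeline used in this study. The function names `random_oversample` and `k_fold_indices`, the toy data, and the choice of five folds are all assumptions made for illustration. The point it demonstrates is that re-sampling is applied to the training folds only, so that each held-out fold keeps the original class distribution.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


# Hypothetical toy data: one feature per customer, label 1 = churned.
rows = [[i] for i in range(10)]
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

for fold, (train_idx, test_idx) in enumerate(k_fold_indices(len(rows), k=5)):
    train_rows = [rows[i] for i in train_idx]
    train_labels = [labels[i] for i in train_idx]
    # Balance the training portion only; the test fold is left untouched.
    bal_rows, bal_labels = random_oversample(train_rows, train_labels, seed=fold)
    test_labels = [labels[i] for i in test_idx]
    print(fold, dict(Counter(bal_labels)), dict(Counter(test_labels)))
```

In practice a classifier would be fitted on `bal_rows` and `bal_labels` and scored on the untouched test fold. Balancing before splitting would let duplicated minority rows leak into the test fold and inflate the reported accuracy.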