🤖

Telco Customer Churn Prediction

An End-to-End Machine Learning Solution with a Flask Web App for Real-Time Churn Prediction & Analytics

📅 Duration
November 2024 – December 2024
👤 Project Type
Individual ML Project
🎯 Model Performance
79.7% Recall | 84.1% AUC-ROC
🚀 Deployment
Production-Ready Flask Web App

🎯 Project Overview

In the competitive telecommunications industry, customer churn is a critical challenge. Acquiring a new customer is commonly estimated to cost 5 to 25 times more than retaining an existing one, which makes churn prediction a high-value business problem. To address this, I developed an end-to-end machine learning solution that predicts which customers are likely to churn and provides actionable insights to drive proactive retention strategies.

The final deliverable is a production-ready Flask web application featuring a modern UI, real-time prediction capabilities, and an interactive analytics dashboard.

⚙️ Solution Architecture & Methodology

The project followed a structured, end-to-end machine learning workflow, from data exploration to a deployed web application.

Data Analysis & Feature Engineering:

  • Conducted in-depth Exploratory Data Analysis (EDA) using a Jupyter Notebook to identify key churn drivers, such as contract type, tenure, and internet service.
  • Developed a robust preprocessing pipeline with StandardScaler for numerical features and OneHotEncoder for categorical data.
  • Addressed significant class imbalance using SMOTE (Synthetic Minority Over-sampling Technique) to ensure the model was effectively trained on the minority (churn) class.
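The preprocessing steps above can be sketched with a scikit-learn `ColumnTransformer`. This is a minimal illustration rather than the project's actual code: the column names and the tiny DataFrame are hypothetical placeholders. SMOTE is left out of this sketch; when imbalanced-learn is used, a common approach is to place the resampler inside an `imblearn.pipeline.Pipeline` so that it only ever touches training data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the real dataset's schema may differ.
NUMERIC_COLS = ["tenure", "monthly_charges"]
CATEGORICAL_COLS = ["contract", "internet_service"]


def build_pipeline() -> Pipeline:
    """Scale numeric columns, one-hot encode categorical ones, then fit a classifier."""
    preprocessor = ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), NUMERIC_COLS),
            # handle_unknown="ignore" keeps prediction from failing on unseen categories.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
        ]
    )
    return Pipeline(
        steps=[
            ("preprocess", preprocessor),
            ("model", LogisticRegression(max_iter=1000)),
        ]
    )


# Made-up rows, only to show that the pipeline fits and predicts.
toy = pd.DataFrame(
    {
        "tenure": [1, 3, 40, 60, 2, 55],
        "monthly_charges": [70.0, 85.5, 20.0, 25.0, 90.0, 30.0],
        "contract": ["Month-to-month", "Month-to-month", "Two year",
                     "Two year", "Month-to-month", "One year"],
        "internet_service": ["Fiber optic", "Fiber optic", "DSL",
                             "DSL", "Fiber optic", "DSL"],
    }
)
labels = [1, 1, 0, 0, 1, 0]  # 1 = churned (illustrative only)

pipeline = build_pipeline()
pipeline.fit(toy, labels)
churn_probability = pipeline.predict_proba(toy)[:, 1]
```

Keeping the scaler and encoder inside the same pipeline as the model means the fitted preprocessing is saved and reused together with the classifier.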

🤖 Model Development & Selection:

  • To find the optimal solution, I evaluated five different algorithms: Logistic Regression, Random Forest, XGBoost, LightGBM, and a PyTorch-based Neural Network.
  • Employed GridSearchCV with 5-fold cross-validation to systematically tune hyperparameters for each model, optimizing for the AUC-ROC score.
  • Champion Model Selection: The Tuned Logistic Regression model was selected as the champion. While more complex models like XGBoost achieved slightly higher accuracy, the Logistic Regression model delivered the highest recall (79.7%) on the churn class. Prioritizing recall was a deliberate choice, because failing to identify a potential churner (a false negative) is far more costly to the business than mistakenly targeting a loyal customer (a false positive).
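The tuning step described above can be sketched as follows. This is an assumption-laden example, not the project's notebook: the data is synthetic (generated with `make_classification`), and the parameter grid is a hypothetical one chosen for brevity. It shows `GridSearchCV` with 5-fold cross-validation scored on AUC-ROC, followed by a recall check on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in data (about 25% positives); not the Telco data.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.75, 0.25], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid over the inverse regularization strength.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid=param_grid,
    scoring="roc_auc",  # tune for AUC-ROC, as the project describes
    cv=5,               # 5-fold cross-validation
)
search.fit(X_train, y_train)

# Recall on the positive (churn-like) class, measured on held-out data.
test_recall = recall_score(y_test, search.predict(X_test))
print(search.best_params_, round(search.best_score_, 3), round(test_recall, 3))
```

Tuning on AUC-ROC and then comparing candidates on recall keeps the two concerns separate: the search finds a well-ranked model, and the final choice reflects the business cost of missed churners.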

🌐 Flask Web Application & Dashboard:

  • The trained scikit-learn pipeline was serialized using Joblib for deployment.
  • A user-friendly Flask web application was built to provide two core functionalities:
    • Real-Time Prediction: An intuitive form for predicting churn for a single customer.
    • Bulk Processing: A CSV/Excel upload system for batch predictions with downloadable results.
  • An interactive dashboard was created using Plotly.js to visualize model insights, such as feature importance and customer risk segmentation, directly in the web app.
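A single-customer prediction endpoint might look like the sketch below. The route name `/predict`, the JSON payload shape, and the `StubModel` class are all hypothetical; in a real deployment the stub would be replaced by the pipeline loaded with `joblib.load`. The 16 MB cap mirrors the upload limit the app states for bulk files.

```python
from flask import Flask, jsonify, request


class StubModel:
    """Stand-in for the serialized pipeline (a real app would use joblib.load)."""

    def predict_proba(self, rows):
        # Fixed [P(stay), P(churn)] pair per row, for illustration only.
        return [[0.3, 0.7] for _ in rows]


app = Flask(__name__)
# Flask rejects request bodies larger than this many bytes.
app.config["MAX_CONTENT_LENGTH"] = 16 * 1024 * 1024
model = StubModel()


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on missing or invalid JSON.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        return jsonify(error="Expected a JSON object of customer features"), 400
    churn_probability = model.predict_proba([payload])[0][1]
    return jsonify(churn_probability=churn_probability)
```

The endpoint can be exercised without starting a server by using `app.test_client()` and posting a JSON body to `/predict`.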

📊 Final Model Performance

The Tuned Logistic Regression model provides the best balance of predictive power, interpretability, and business alignment.

Metric | Score | Justification
Recall (Churn Detection) | 79.7% | Primary Goal: Correctly identifies nearly 80% of customers who will actually churn.
AUC-ROC Score | 84.1% | Indicates strong overall model discrimination between churners and non-churners.
Overall Accuracy | 74.2% | Moderate overall correctness, accepted in exchange for the focus on the minority class.
Precision (Churn) | 51.0% | Acceptable trade-off: prioritizing recall means roughly half of the flagged customers are non-churners who receive retention efforts, which is a lower-cost error.
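To make the relationship between these metrics concrete, the sketch below computes recall, precision, and accuracy from confusion-matrix counts. The counts are hypothetical and are not the project's confusion matrix. The second helper uses only the reported 51.0% precision to estimate how many loyal customers are flagged for each churner caught.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Recall, precision, and accuracy from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "recall": tp / (tp + fn),        # share of actual churners caught
        "precision": tp / (tp + fp),     # share of flagged customers who churn
        "accuracy": (tp + tn) / total,   # share of all predictions that are right
    }


def false_alarms_per_hit(precision: float) -> float:
    """Non-churners flagged for every true churner flagged."""
    return (1.0 - precision) / precision


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=80, fp=77, fn=20, tn=223)

# Using the reported precision of 51.0%.
flagged_loyal_per_churner = false_alarms_per_hit(0.51)
print(metrics, round(flagged_loyal_per_churner, 2))
```

At 51.0% precision, a retention campaign contacts close to one loyal customer for each genuine churner it reaches, which is the cost side of the recall-first trade-off.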

Actionable Insights & Strategic Recommendations

The interpretability of the Logistic Regression model allowed for the extraction of clear, data-driven business strategies.

Key Finding | Model Coefficient | Strategic Recommendation
Two-Year Contracts | -1.443 | Strongest Retention Factor: This reduces the odds of churn by 76%.* Offer significant incentives for customers to upgrade from month-to-month to long-term contracts.
Fiber Optic Service | +0.622 | Highest Risk Factor: Investigate potential pricing, performance, or reliability issues specific to the fiber optic service to reduce its associated churn rate.
Customer Tenure | -1.073 | Loyalty Driver: Longer tenure lowers the odds of churn, so new customers are at the highest risk. Implement a robust 90-day onboarding program with proactive check-ins to build early loyalty.
Electronic Check Payments | +0.384 | Payment Method Risk: This payment method is positively associated with churn. Incentivize customers to switch to automatic, recurring payment methods with small monthly credits.

*Reduction in odds calculated as 1 - exp(coefficient); the odds ratio itself is exp(coefficient).
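The conversion from a logistic regression coefficient to a change in odds can be checked with a few lines of Python. The coefficients are the ones reported in the table above; the function names are illustrative. One caveat, stated as an assumption: for a feature that passed through StandardScaler (such as tenure), the coefficient describes a one-standard-deviation change, not a one-unit change.

```python
import math

# Coefficients as reported in the table above.
coefficients = {
    "Two-Year Contract": -1.443,
    "Fiber Optic Service": 0.622,
    "Customer Tenure": -1.073,
    "Electronic Check": 0.384,
}


def odds_ratio(coefficient: float) -> float:
    """Multiplicative change in the odds of churn for a one-unit increase."""
    return math.exp(coefficient)


def percent_change_in_odds(coefficient: float) -> float:
    """Signed percentage change in the odds of churn."""
    return (math.exp(coefficient) - 1.0) * 100.0


for name, coef in coefficients.items():
    print(
        f"{name}: odds ratio {odds_ratio(coef):.3f}, "
        f"change in odds {percent_change_in_odds(coef):+.1f}%"
    )
```

For the two-year contract, exp(-1.443) is about 0.24, which corresponds to the roughly 76% reduction in odds quoted above. Note that this is a change in odds, not in churn probability.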

💻 Technology Stack

🔬 ML & Data Science

Python, Scikit-learn, XGBoost, LightGBM, Pandas, NumPy, SMOTE, Joblib

🌐 Web Development

Flask, HTML5, CSS3, JavaScript

📊 Data Visualization

Plotly.js, Matplotlib, Seaborn


📊 Model Performance

The final model is a Tuned Logistic Regression, selected after evaluating five different algorithms. It was chosen for its optimal balance of performance, interpretability, and efficiency.

Recall (Churn Detection): 79.7%
AUC-ROC Score: 84.1%
Overall Accuracy: 74.2%
Precision (Churn): 51.0%

💡 Key Business Insights

Based on the logistic regression coefficients, the model identified critical churn predictors:

🛡️ Strongest Retention Factor
Two-Year Contracts
Coefficient: -1.443 - Most powerful retention factor, reducing the odds of churn by about 76%
⚠️ Highest Risk Factor
Fiber Optic Service
Coefficient: +0.622 - Higher churn rates, possibly due to pricing or service issues
📈 Customer Loyalty Driver
Customer Tenure
Coefficient: -0.456 - Longer relationships significantly reduce churn probability
💳 Payment Method Impact
Electronic Check Risk
Coefficient: +0.445 - Electronic check payments correlate with higher churn

🎯 Strategic Recommendations

  1. Contract Strategy: Offer attractive incentives for annual/two-year contract upgrades to maximize retention
  2. Payment Optimization: Encourage automatic payment methods with discounts to reduce electronic check usage
  3. Service Quality Review: Investigate fiber optic pricing and service delivery to address higher churn rates
  4. New Customer Focus: Implement comprehensive 90-day onboarding programs to build early tenure
  5. Proactive Retention: Use model predictions to identify at-risk customers for targeted retention campaigns

⚙️ Technical Implementation

Machine Learning Pipeline:

  • Data Preprocessing: StandardScaler for numerical features, OneHotEncoder for categorical features
  • Class Imbalance Handling: SMOTE (Synthetic Minority Over-sampling Technique) for balanced training
  • Model Selection: Evaluated Random Forest, XGBoost, LightGBM, Neural Network, and Logistic Regression
  • Hyperparameter Tuning: GridSearchCV with 5-fold cross-validation, optimizing for AUC-ROC
  • Model Interpretation: Feature importance analysis and coefficient interpretation
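The core idea behind SMOTE, mentioned in the pipeline above, is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below illustrates that idea using only the standard library; it is not the imbalanced-learn implementation, and the function name and sample points are hypothetical. It handles numeric features only and is meant to be applied to training data, never to the test set.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Create synthetic points by interpolating toward nearest minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment between base and the chosen neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up minority-class points with two numeric features.
minority_samples = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_samples = smote_oversample(minority_samples, n_synthetic=6, k=2)
print(new_samples)
```

Because every synthetic point lies between two real minority points, the new samples stay inside the region the minority class already occupies rather than duplicating existing rows.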

Web Application Features:

  • Real-time Predictions: Instant churn probability calculation with risk level classification
  • Bulk Processing: CSV/Excel upload with batch prediction capabilities (files up to 16 MB)
  • Interactive Dashboard: Dynamic charts with filtering options and business insights
  • Data Validation: Built-in validation with helpful error messages and sample templates
  • Export Functionality: Download prediction results with recommendations
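The risk-level classification, validation, and export features listed above could be combined roughly as in this sketch. It uses only the standard library and makes several assumptions: the thresholds (0.4 and 0.7), the column names, and the function names are hypothetical, and the churn probabilities are taken as already computed instead of being produced by the model.

```python
import csv
import io

# Hypothetical required columns; the app's real template may differ.
REQUIRED_COLUMNS = {"customer_id", "churn_probability"}


def risk_level(probability: float) -> str:
    """Map a churn probability to a label (thresholds are assumptions)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"Probability out of range: {probability}")
    if probability >= 0.7:
        return "High"
    if probability >= 0.4:
        return "Medium"
    return "Low"


def label_batch(csv_text: str) -> str:
    """Validate an uploaded CSV and return it with a risk_level column added."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=[*reader.fieldnames, "risk_level"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row["risk_level"] = risk_level(float(row["churn_probability"]))
        writer.writerow(row)
    return out.getvalue()


sample = "customer_id,churn_probability\nA1,0.82\nB2,0.15\n"
result = label_batch(sample)
print(result)
```

Checking required columns before scoring lets the app return one clear error message for a malformed upload instead of failing partway through the batch.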

🚀 Project Impact & Value

Business Value:

  • Enables proactive customer retention strategies by identifying 79.7% of actual churners (recall on the churn class)
  • Provides actionable insights for reducing churn through contract optimization and service improvements
  • Delivers ROI through reduced customer acquisition costs and increased lifetime value
  • Supports data-driven decision making with interpretable model explanations

Technical Achievement:

  • Successfully deployed end-to-end ML solution from data analysis to production web application
  • Demonstrated expertise in multiple ML algorithms and model selection methodology
  • Created professional, user-friendly interface with modern UI/UX design principles
  • Implemented scalable solution with batch processing capabilities