Customer Churn Prediction and Retention Analysis
About
This notebook analyzes customer churn using the IBM Telco Customer Churn dataset. The project follows a complete data science workflow including data loading, data cleaning, exploratory data analysis, visualization, feature engineering, model training, model evaluation, and business recommendation generation.
The dataset contains telecom customer information such as tenure, contract type, monthly charges, total charges, payment method, internet service, tech support, online security, and churn status. The goal of the project is to identify which customer groups are more likely to leave and build a simple machine learning model to predict churn risk.
The analysis showed that churn was higher among month-to-month contract customers, newer customers, electronic check users, fiber optic internet users, and customers with higher monthly charges. The final recommendation is to target these high-risk groups with onboarding support, loyalty offers, contract upgrade incentives, and proactive customer service.
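As a rough illustration of how segment-level comparisons like these can be computed, the sketch below groups customers by one column and takes the share with Churn equal to "Yes". The helper name churn_rate_by and the toy rows are assumptions made up for this example; they are not the notebook's actual code or data, and the numbers they produce say nothing about the real dataset.

```python
import pandas as pd

# Toy rows invented for illustration only; NOT taken from the real dataset.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "Two year"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "No"],
})


def churn_rate_by(df, column):
    """Return the share of churned customers for each value of `column`.

    Hypothetical helper: assumes a 'Churn' column holding "Yes"/"No" strings.
    """
    churned = df["Churn"].eq("Yes")              # boolean Series, True = churned
    rates = churned.groupby(df[column]).mean()   # mean of booleans = churn share
    return rates.sort_values(ascending=False)    # highest-risk segment first


rates = churn_rate_by(toy, "Contract")
print(rates)
```

The same helper can be pointed at other categorical columns such as PaymentMethod or InternetService. For numeric columns like tenure or MonthlyCharges, the values would first need to be binned (for example with pd.cut) before grouping.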
The IBM Telco Customer Churn dataset is publicly available and contains customer-level telecom data, including demographic information, account details, subscribed services, billing information, and whether the customer churned.
The dataset originally had 7,043 rows and 21 columns. Important variables included tenure, Contract, MonthlyCharges, TotalCharges, PaymentMethod, InternetService, TechSupport, OnlineSecurity, and Churn. I chose this dataset because customer churn is a practical business problem, and the dataset supports both exploratory analysis and machine learning classification.
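A minimal sketch of a cleaning step for these columns is shown below. It assumes that TotalCharges may be read as text because some rows hold a blank value, and that Churn is stored as "Yes"/"No"; both are assumptions to check against the actual file. The CSV excerpt, the customer IDs, and the ChurnFlag column name are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV excerpt with invented values; only the column names are
# meant to mirror the dataset.
raw = io.StringIO(
    "customerID,tenure,MonthlyCharges,TotalCharges,Churn\n"
    "A-0001,0,52.55, ,No\n"
    "A-0002,12,70.00,840.00,Yes\n"
    "A-0003,24,20.00,480.00,No\n"
)
df = pd.read_csv(raw)

# Convert TotalCharges to numbers; values that cannot be parsed become NaN.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
n_missing = int(df["TotalCharges"].isna().sum())

# Drop the rows with missing TotalCharges (imputing is another option).
df = df.dropna(subset=["TotalCharges"]).reset_index(drop=True)

# Encode the target as 0/1 so a classifier can use it.
df["ChurnFlag"] = (df["Churn"] == "Yes").astype(int)

print(n_missing, df.shape)
```

Dropping rows is the simplest choice when only a handful are affected. If many rows were missing, filling TotalCharges from tenure and MonthlyCharges might be worth considering instead.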



