ML Engineer 2026

Telecom Churn Predictor

Binary churn classifier built with scikit-learn (Random Forest, Logistic Regression) and Keras (deep neural network), evaluated on a real-world telecoms dataset, and deployed as a SageMaker real-time endpoint. Includes a full model-comparison pipeline and exploratory analysis in Jupyter.

Python · scikit-learn · Keras · SageMaker · S3 · Jupyter

Binary churn classifier for telecoms customer data. Three models are evaluated head-to-head (Logistic Regression, Random Forest, and a Keras deep neural network), with the best performer deployed to a SageMaker real-time endpoint.

What it does

  • Ingests a 7,000-record telecoms dataset with 20 features (contract type, tenure, charges, service add-ons)
  • Trains and evaluates three classifiers, selecting the Random Forest (ROC-AUC 0.93) for deployment
  • Deploys to SageMaker as a real-time inference endpoint
  • Ships with 30 unit tests covering preprocessing, model evaluation, and drift detection, written test-first (TDD)
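
The train-and-compare step can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the data is synthetic, the column names (tenure, monthly_charges, contract) are hypothetical stand-ins, and the Keras network is left out so the example runs with scikit-learn alone. It log-transforms a skewed charge column, one-hot encodes a categorical, and ranks two classifiers by ROC-AUC on a held-out split.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical, not the real schema.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "tenure": rng.integers(1, 72, n),
    "monthly_charges": rng.lognormal(4.0, 0.4, n),
    "contract": rng.choice(["month-to-month", "one-year", "two-year"], n),
})
# Toy label: short-tenure, month-to-month customers churn more often.
logit = 1.0 - 0.05 * df["tenure"] + 1.5 * (df["contract"] == "month-to-month")
df["churn"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Log-transform the skewed charge column, scale numerics, one-hot the categorical.
log_then_scale = Pipeline([
    ("log", FunctionTransformer(np.log1p)),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("charges", log_then_scale, ["monthly_charges"]),
    ("tenure", StandardScaler(), ["tenure"]),
    ("contract", OneHotEncoder(handle_unknown="ignore"), ["contract"]),
])

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="churn"), df["churn"],
    test_size=0.25, stratify=df["churn"], random_state=0,
)

# Fit each candidate behind the same preprocessing and score it on the test split.
scores = {}
for name, model in candidates.items():
    pipe = Pipeline([("prep", clone(preprocess)), ("clf", model)])
    pipe.fit(X_train, y_train)
    scores[name] = roc_auc_score(y_test, pipe.predict_proba(X_test)[:, 1])

best_name = max(scores, key=scores.get)
print(scores, "-> best:", best_name)
```

Keeping the preprocessing inside each Pipeline means the transforms are fitted on the training split only, so the test-set ROC-AUC is not inflated by leakage.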

Technical highlights

  • Decimal(str(value)) pattern for DynamoDB numeric compatibility
  • Feature engineering: log-transform on skewed charge columns, one-hot encoding for categoricals
  • Model card reporting accuracy, ROC-AUC, and per-class precision and recall
  • Drift detection module using KS test (numeric) and chi-squared (categorical)
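
The Decimal(str(value)) pattern from the highlights can be illustrated with a small helper. boto3's DynamoDB resource layer expects Decimal rather than Python float for numeric attributes, and converting through str keeps the short decimal form instead of the float's full binary expansion. The helper name to_dynamodb and the sample record are hypothetical, and the sketch only prepares the item; it makes no AWS calls.

```python
from decimal import Decimal


def to_dynamodb(value):
    """Recursively convert ints and floats to Decimal via their string form."""
    if isinstance(value, bool):
        # bool is a subclass of int; leave it as a boolean attribute.
        return value
    if isinstance(value, (int, float)):
        return Decimal(str(value))
    if isinstance(value, dict):
        return {key: to_dynamodb(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_dynamodb(item) for item in value]
    return value


# Hypothetical prediction record, shaped for illustration only.
record = {
    "customer_id": "c-001",
    "churn_probability": 0.87,
    "features": {"tenure": 12, "monthly_charges": 70.35},
}
item = to_dynamodb(record)
print(item)
```

Going through str matters: Decimal(0.1) preserves the binary approximation of 0.1, while Decimal(str(0.1)) yields exactly Decimal("0.1").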
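
A drift check along the lines of the last bullet might look like the sketch below, assuming SciPy is available. It applies a two-sample Kolmogorov-Smirnov test to a numeric feature and a chi-squared test on a contingency table of category counts. The function names, the 0.05 threshold, and the sample data are assumptions for illustration, not the project's actual module.

```python
from collections import Counter

import numpy as np
from scipy.stats import chi2_contingency, ks_2samp

ALPHA = 0.05  # hypothetical significance threshold


def numeric_drift(reference, current, alpha=ALPHA):
    """Flag drift when the two-sample KS test rejects 'same distribution'."""
    result = ks_2samp(reference, current)
    return bool(result.pvalue < alpha)


def categorical_drift(reference, current, alpha=ALPHA):
    """Flag drift when a chi-squared test on category counts is significant."""
    categories = sorted(set(reference) | set(current))
    ref_counts, cur_counts = Counter(reference), Counter(current)
    # Rows: reference vs current; columns: one per observed category.
    table = np.array([
        [ref_counts[c] for c in categories],
        [cur_counts[c] for c in categories],
    ])
    _, p_value, _, _ = chi2_contingency(table)
    return bool(p_value < alpha)


# Synthetic example: a shifted numeric feature and a changed contract mix.
rng = np.random.default_rng(0)
ref_charges = rng.normal(0.0, 1.0, 1000)
new_charges = rng.normal(1.0, 1.0, 1000)
ref_contract = ["monthly"] * 500 + ["annual"] * 500
new_contract = ["monthly"] * 900 + ["annual"] * 100

print("charges drifted:", numeric_drift(ref_charges, new_charges))
print("contract drifted:", categorical_drift(ref_contract, new_contract))
```

With many features checked at once, some false alarms are expected at a fixed threshold, so a multiple-testing correction or a stricter alpha may be worth considering.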