ML-EL2 | Semester 7 | 4 (3-0-2) | Elective

MLOps & Production AI

Syllabus

01

Unit 1: MLOps Fundamentals and ML Lifecycle

ML project lifecycle stages (data, training, validation, deployment, monitoring), MLOps maturity levels (manual, ML automation, continuous ML), Model drift types (concept drift, data drift, upstream drift), Golden ML pipelines, Experiment tracking and reproducibility, Versioning strategies (data, models, code, environment), ML metadata stores and lineage tracking.
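
To make the versioning and lineage topics above concrete, here is a minimal sketch of a run fingerprint that ties together the four things the unit says should be versioned (data, code, model parameters, environment). The function name `run_fingerprint`, its arguments, and the sample values are hypothetical and chosen only for illustration; real projects would typically delegate this to an experiment tracker or metadata store.

```python
import hashlib
import json


def run_fingerprint(data: bytes, code_version: str, params: dict, environment: dict) -> str:
    """Return a deterministic ID for one training run (illustrative only)."""
    record = {
        # Hash the raw bytes so any change to the data changes the ID.
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "code_version": code_version,   # e.g. a Git commit hash
        "params": params,               # hyperparameters
        "environment": environment,     # e.g. pinned package versions
    }
    # sort_keys makes the JSON canonical, so dict ordering does not matter.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    fp = run_fingerprint(
        data=b"age,label\n31,1\n45,0\n",
        code_version="abc1234",                 # hypothetical commit
        params={"lr": 0.01, "epochs": 5},
        environment={"python": "3.11"},
    )
    print(fp[:12])
```

Two runs with identical inputs produce the same fingerprint, and changing any one input produces a different one, which is the property a lineage record needs.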

02

Unit 2: Data Management and Feature Engineering

Data versioning (DVC, Delta Lake), Data quality gates and schema validation, Feature stores (Feast, Tecton, Hopsworks), Online/offline feature serving, Feature drift detection, Data lineage and impact analysis, Automated data validation (Great Expectations, Deequ), PII/PHI data anonymization and compliance.
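
As an illustration of the data quality gate and schema validation topics above, the following is a minimal plain-Python sketch. It does not use the Great Expectations or Deequ APIs; the function name `check_batch`, the `max_null_fraction` threshold, and the example schema are assumptions made for the example.

```python
from typing import Any, List, Mapping, Sequence


def check_batch(
    rows: Sequence[Mapping[str, Any]],
    schema: Mapping[str, type],
    max_null_fraction: float = 0.05,
) -> List[str]:
    """Return a list of problems; an empty list means the batch passes the gate."""
    if not rows:
        return ["batch is empty"]
    problems: List[str] = []
    for column, expected_type in schema.items():
        nulls = 0
        for i, row in enumerate(rows):
            if column not in row:
                problems.append(f"row {i}: missing column '{column}'")
                continue
            value = row[column]
            if value is None:
                nulls += 1
            elif not isinstance(value, expected_type):
                problems.append(
                    f"row {i}: '{column}' expected {expected_type.__name__}, "
                    f"got {type(value).__name__}"
                )
        # Gate on the share of nulls per column, not on individual nulls.
        if nulls / len(rows) > max_null_fraction:
            problems.append(f"column '{column}': null fraction {nulls / len(rows):.2f} too high")
    return problems


if __name__ == "__main__":
    schema = {"user_id": int, "country": str}  # hypothetical schema
    batch = [{"user_id": 1, "country": "IN"}, {"user_id": "2", "country": None}]
    for problem in check_batch(batch, schema):
        print(problem)
```

A pipeline step would typically stop, or quarantine the batch, whenever the returned list is non-empty.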

03

Unit 3: Model Development and Experimentation

MLflow tracking server and experiment management, Weights & Biases (W&B) sweeps and hyperparameter optimization, Ray Tune distributed tuning, Model registries and staging environments, A/B testing frameworks (Optimizely, Flagr), Canary deployments and shadow testing, Model performance baselines and champion/challenger patterns.

04

Unit 4: Model Deployment and Serving

Model packaging (ONNX, TorchServe, TensorFlow Serving), Containerization best practices (Docker multi-stage builds), Kubernetes inference servers (KServe, formerly KFServing; Seldon Core), Serverless ML (Amazon SageMaker Serverless Inference, Google Cloud Run), Edge deployment (TensorFlow Lite, ONNX Runtime), Multi-model serving and GPU sharing, Batch vs. real-time inference.

05

Unit 5: ML Monitoring, Governance, and Continuous Training

Model monitoring (KServe model monitoring, Prometheus integration), Drift detection algorithms (two-sample Kolmogorov-Smirnov (KS) test, Population Stability Index (PSI), Maximum Mean Discrepancy (MMD)), Automated retraining triggers and CI/CD for ML, Human-in-the-loop validation, Model explainability (SHAP, LIME integration), Bias/fairness monitoring (AIF360, Fairlearn), Compliance frameworks (GDPR, HIPAA ML controls), Blue-green ML deployments.
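
As an illustration of the drift detection topic in this unit, the following is a minimal sketch of the Population Stability Index (PSI) in plain Python. The function name `population_stability_index`, the equal-width binning over the baseline range, the `eps` floor for empty bins, and the sample data are all assumptions made for the example, not part of the course material.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a baseline sample (expected) and a live sample (actual)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # constant baseline: everything falls into one bin

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range go into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]   # synthetic training feature
    live = [v + 0.5 for v in baseline]         # same feature, shifted
    print(population_stability_index(baseline, baseline))
    print(population_stability_index(baseline, live))
```

Identical samples give a PSI of zero, and the value grows as the live distribution moves away from the baseline. Cut-offs such as 0.1 and 0.25 are often quoted as rules of thumb for "moderate" and "significant" shift, but they are conventions rather than fixed standards, so a retraining trigger built on PSI should be tuned per feature.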