SE-EL1 | Semester 7 | 4 (3-0-2) | Elective

DevOps & Site Reliability Engineering


Syllabus

Unit 1: DevOps Philosophy and CI/CD Pipelines

DevOps cultural principles (CAMS: Culture, Automation, Measurement, Sharing), the CALMS extension (which adds Lean), Continuous Integration practices (trunk-based development, feature flags), Continuous Delivery/Deployment pipelines, GitOps declarative deployments, Pipeline as Code (Jenkinsfile, GitHub Actions, GitLab CI), Artifact management and immutable infrastructure.
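
Feature flags are what make trunk-based development workable: incomplete code is merged to the main branch but stays dark until its flag is switched on. A minimal sketch of a percentage-based flag check, assuming flags are evaluated in-process by hashing the flag name together with a user ID (the function `flag_enabled` and the flag name `new-checkout` are hypothetical, not the API of any particular feature-flag service):

```python
import hashlib


def flag_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if user_id falls inside the rollout for flag_name.

    Each (flag, user) pair hashes to a stable bucket in 0..99, so a user keeps
    the same answer across requests and raising rollout_percent only adds users.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent


# Usage: unfinished code is merged to trunk but only runs for 10% of users.
def checkout(user_id: str) -> str:
    if flag_enabled("new-checkout", user_id, rollout_percent=10):  # hypothetical flag
        return "new checkout flow"
    return "legacy checkout flow"
```

Hashing rather than random sampling keeps each user's experience consistent between requests, and because a user's bucket never changes, widening the rollout is monotonic: users who already had the feature keep it.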

Unit 2: Infrastructure as Code and Configuration Management

Terraform workflow (init, plan, apply, destroy), HCL syntax and providers, State management (remote backends, locking), Ansible for configuration management (playbooks, roles, inventories), Puppet and Chef configuration models (Puppet's declarative manifests, Chef's Ruby-based recipes), Kubernetes manifests as Infrastructure as Code (Helm charts, Kustomize), PowerShell Desired State Configuration (DSC).
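
Terraform's plan/apply cycle, Puppet, DSC and Kubernetes controllers share one idea: compare the declared desired state with the observed actual state and compute the changes needed to converge. A toy sketch of that diff, assuming resources are identified by name and described by flat attribute dictionaries (the `compute_plan`/`apply_plan` functions and the resource names are hypothetical, and the sketch ignores dependency ordering and forced replacement, which real tools must handle):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

State = Dict[str, dict]  # resource name -> attributes (hypothetical shape)


@dataclass
class Change:
    action: str               # "create", "update" or "delete"
    name: str
    before: Optional[dict]
    after: Optional[dict]


def compute_plan(desired: State, actual: State) -> List[Change]:
    """Diff desired state against actual state, like a dry-run plan."""
    changes = []
    for name in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(name), actual.get(name)
        if have is None:
            changes.append(Change("create", name, None, want))
        elif want is None:
            changes.append(Change("delete", name, have, None))
        elif want != have:
            changes.append(Change("update", name, have, want))
    return changes


def apply_plan(changes: List[Change], actual: State) -> State:
    """Return the new actual state after applying the planned changes."""
    new_state = dict(actual)
    for change in changes:
        if change.action == "delete":
            del new_state[change.name]
        else:
            new_state[change.name] = change.after
    return new_state


desired = {"web": {"size": "medium", "count": 3}, "db": {"size": "large"}}
actual = {"web": {"size": "small", "count": 3}, "cache": {"size": "small"}}
for change in compute_plan(desired, actual):
    print(change.action, change.name)  # prints: delete cache / create db / update web
```

Running `compute_plan` again against the state returned by `apply_plan` yields an empty list; this idempotence is what lets declarative tools be re-run safely.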

Unit 3: Site Reliability Engineering Principles

SRE golden signals (latency, traffic, errors, saturation), Service Level Indicators/Objectives/Agreements (SLI/SLO/SLA), Error budgets and toil reduction, Capacity planning and load testing, Incident management (postmortem culture, blameless retrospectives), On-call engineering and incident response playbooks, Reliability engineering vs. traditional ops.
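
An error budget is the complement of the SLO: a 99.9% availability SLO leaves 0.1% of requests (or of time) that may fail before the objective is breached. A minimal sketch of the arithmetic, assuming a request-based SLI and a fixed 30-day window (the function names are illustrative):

```python
MINUTES_PER_DAY = 24 * 60


def _check_slo(slo: float) -> None:
    if not 0.0 < slo < 1.0:
        raise ValueError("SLO must be a fraction strictly between 0 and 1, e.g. 0.999")


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Time-based budget: minutes of full outage the SLO tolerates per window."""
    _check_slo(slo)
    return (1.0 - slo) * window_days * MINUTES_PER_DAY


def allowed_failures(slo: float, total_requests: int) -> float:
    """Request-based budget: failed requests the SLO tolerates in the window."""
    _check_slo(slo)
    return (1.0 - slo) * total_requests


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget left; a negative value means the SLO is breached."""
    return 1.0 - failed_requests / allowed_failures(slo, total_requests)


print(round(allowed_downtime_minutes(0.999), 1))          # 43.2
print(round(budget_remaining(0.999, 1_000_000, 250), 2))  # 0.75
```

So a 99.9% SLO over 30 days allows about 43.2 minutes of complete downtime, and 250 failed requests out of 1,000,000 consume a quarter of the window's budget. Teams commonly tie a policy to the remaining fraction, for example pausing risky releases once it reaches zero.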

Unit 4: Observability and Monitoring Stack

The three pillars of observability (metrics, logs, traces), Prometheus time-series monitoring (federation, alerting), Grafana dashboards and anomaly detection, Loki for log aggregation, Jaeger/Zipkin distributed tracing, OpenTelemetry instrumentation, Synthetic monitoring and chaos engineering (Chaos Mesh, LitmusChaos), SLO-based alerting.
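
SLO-based alerting usually pages on burn rate: the observed error ratio divided by the error ratio the SLO allows, so a burn rate of 1 spends the budget exactly over the SLO window. A sketch of a two-window burn-rate check, assuming error and request counts have already been aggregated for a long window (e.g. 1 hour) and a short window (e.g. 5 minutes); the threshold of 14.4 is an example value, chosen because sustaining it for one hour spends 2% of a 30-day budget (14.4 × 1 h / 720 h = 0.02):

```python
from typing import Tuple

Window = Tuple[int, int]  # (error_count, request_count) observed over one window


def burn_rate(errors: int, requests: int, slo: float) -> float:
    """Observed error ratio divided by the error ratio the SLO allows."""
    if requests == 0:
        return 0.0  # no traffic, so no budget is being spent
    return (errors / requests) / (1.0 - slo)


def should_page(long_window: Window, short_window: Window, slo: float,
                threshold: float = 14.4) -> bool:
    """Page only if both windows burn the budget at least `threshold` times too fast."""
    return (burn_rate(*long_window, slo) >= threshold
            and burn_rate(*short_window, slo) >= threshold)


# 99.9% SLO: 1.5% errors over the last hour, 2% over the last 5 minutes -> page.
print(should_page((150, 10_000), (20, 1_000), slo=0.999))  # True
# Same hour, but the last 5 minutes have recovered -> no page.
print(should_page((150, 10_000), (1, 1_000), slo=0.999))   # False
```

Requiring both windows to exceed the threshold lets the long window filter out brief spikes while the short window lets the alert clear soon after the problem is fixed. In a Prometheus setup the two ratios would typically come from recording rules over request counters; the metric names depend on how the service is instrumented.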

Unit 5: Production Excellence and Chaos Engineering

Progressive delivery (feature flags, canary releases, blue-green deployments), A/B testing infrastructure, Multi-region failover and disaster recovery, GitOps with Argo CD/Flux, Chaos engineering experiments (fault injection, resilience testing), Production readiness reviews (PRR), FinOps and cost optimization at scale.
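
Canary analysis decides whether a new version, serving a small slice of traffic, is healthy enough to promote. A simplified sketch, assuming the baseline and canary cohorts each report request counts, error counts and p99 latency; the thresholds and the `Cohort`/`canary_verdict` names are hypothetical, and dedicated progressive-delivery controllers typically use more robust statistical comparisons than these fixed cutoffs:

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    requests: int
    errors: int
    p99_latency_ms: float


def canary_verdict(baseline: Cohort, canary: Cohort,
                   max_error_increase: float = 0.01,
                   max_latency_ratio: float = 1.2,
                   min_requests: int = 500) -> str:
    """Return "wait", "rollback" or "promote" (all thresholds are hypothetical)."""
    if canary.requests < min_requests:
        return "wait"  # too little traffic to judge yet
    base_error_ratio = baseline.errors / baseline.requests if baseline.requests else 0.0
    canary_error_ratio = canary.errors / canary.requests
    if canary_error_ratio - base_error_ratio > max_error_increase:
        return "rollback"  # canary fails noticeably more often than baseline
    if canary.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio:
        return "rollback"  # canary tail latency regressed too far
    return "promote"


baseline = Cohort(requests=10_000, errors=50, p99_latency_ms=200.0)
print(canary_verdict(baseline, Cohort(requests=1_000, errors=5, p99_latency_ms=210.0)))  # promote
```

A blue-green deployment can use the same kind of health check, but as a single gate before switching all traffic from the old (blue) environment to the new (green) one, rather than as a repeated check while traffic is shifted gradually.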