Alliance Micro Solutions | Public Course Catalog

DevOps Institute: Site Reliability Engineering (SRE) Practitioner

Code: DOI SRE Practitio

Duration: 3 Days

$2385 USD

OVERVIEW

Today's organizations deal with a higher volume of change in a more complex tech environment, leading to a higher risk of outages and incidents. IT teams must improve service reliability and system resiliency. With automation and observability becoming key factors for more efficient and rapid deployments, the Site Reliability Engineering (SRE) profile has become one of the fastest-growing enterprise roles and sets of operational practices for managing services at scale.

The DevOps Institute SRE Practitioner course provides a practical view of how to successfully implement a flourishing SRE culture in your organization. This 3-day course is a practical progression for DOI SRE Foundation certificate holders.

DELIVERY FORMAT

This course is available in the following formats:

Virtual Classroom

Duration: 3 Days

CLASS SCHEDULE

Delivery Format: Virtual Classroom | Date: Aug 10 2026 - Aug 12 2026 | Time: 09:00 - 17:00 EDT | Location: Online | Course Length: 3 Days | Price: $2385
Delivery Format: Virtual Classroom | Date: Nov 09 2026 - Nov 11 2026 | Time: 09:00 - 17:00 EST | Location: Online | Course Length: 3 Days | Price: $2385
Delivery Format: Virtual Classroom | Date: Jan 11 2027 - Jan 13 2027 | Time: 08:30 - 16:30 EST | Location: Online | Course Length: 3 Days | Price: $2385
Delivery Format: Virtual Classroom | Date: Mar 15 2027 - Mar 17 2027 | Time: 08:30 - 16:30 EDT | Location: Online | Course Length: 3 Days | Price: $2385
Delivery Format: Virtual Classroom | Date: May 17 2027 - May 19 2027 | Time: 08:30 - 16:30 EDT | Location: Online | Course Length: 3 Days | Price: $2385

GOALS

Practical view of how to successfully implement a flourishing SRE culture in your organization
The underlying principles of SRE and an understanding of what it is not in terms of antipatterns
Organizational impact of introducing SRE
SLIs and SLOs in a distributed ecosystem and extending the usage of Error Budgets
Building security and resilience by design in a distributed, zero-trust environment
Implementing full-stack observability, distributed tracing and Observability-driven development culture
Curating data using AI to move from reactive to proactive and predictive incident management
Using DataOps to build clean data lineage
Why Platform Engineering is important in building consistency and predictability
Implementing practical Chaos Engineering
Major incident response responsibilities
SRE Execution model

OUTLINE

Module 1: SRE Anti-Patterns

Break the ice with a recap of DevOps Institute's SRE Blueprint
Discuss how SRE works in a distributed ecosystem
Discuss some of the SRE Barriers
A few SRE Anti-Patterns (discuss the right patterns too)
Discuss the Case Story of how Monzo Bank learned from the causes leading to a SEV1 issue
Case Story: Monzo Bank
Discussion / Exercise: Good versus Bad Postmortem, Describe a Major Incident, Anti-Patterns of SRE

Module 2: SLO is a Proxy for Customer Happiness

What has changed with SLO?
Identifying System boundaries for setting SLIs is critical
How do you use Error Budgets beyond the velocity versus stability debate?
Case Story: Kudos Engineering, Home Depot
Discussion / Exercise: Establishing SLOs in Distributed Ecosystems

Module 3: Building Secure and Reliable Systems

Building Secure and Reliable systems
Non-Abstract Large Scale Design
Designing for the changing Architecture and distributed ecosystem
Fault tolerant Design
Designing for Security
Designing for Resiliency
Case Story: Chrome Security Team
Discussion / Exercise: Non-Abstract Large Scale Design Capacity

Module 4: Full Stack Observability

Modern Apps are Complex & Unpredictable
Slow is the New Down
Pillars of Observability
Using OpenTelemetry
Case Story: Planet Labs
Discussion / Exercise: How do you bake Observability into your Code

Module 5: Platform Engineering and AIOps

Taking a Platform Centric View
How do you use AIOps to improve Resiliency
How can DataOps help you in the journey
A simple recipe to implement AIOps
Indicative measurement of AIOps
Case Story: FedEx, 3M
Discussion / Exercise: Instrumenting AIOps using Prometheus

Module 6: SRE and Incident Response Management

SRE Key Responsibilities towards incident response
DevOps & SRE and ITSM (new vs. old ways)
OODA and SRE Incident Response
SRE and CLR (closed loop remediation)
Swarming: Food for Thought
AI/ML for better Incident Management
Case Story: HCL AIOps Journey
Discussion / Exercise: Teams discuss Swarming and the Tiered (Layered) Incident Response framework

Module 7: Chaos Engineering

Navigating Complexity
Chaos Engineering Defined
Quick Facts
Chaos Monkey Origin Story
Who is adopting Chaos Engineering
Myths of Chaos
Chaos Engineering Experiments
GameDay Exercises
Security Chaos Engineering
Chaos Engineering Resources
Discussion / Exercise: Instrumenting Gremlin, Discuss how to conduct a GameDay exercise

Module 8: SRE is the Purest Form of DevOps

Key Principles of SRE
SREs help increase Reliability across the spectrum
Metrics for Success
SRE Execution models
Culture and Behavioral Skills are key
Transformation after implementing SRE practices
Case Story: Airbnb
Discussion / Exercise: Discuss NALSD learnings from Module, Transformation after implementing SRE practices

LABS

Will Be Updated Soon!

WHO SHOULD ATTEND

IT leaders & managers
Organizational change leaders and agents
SRE engineers
System Integrators
Business Stakeholders
DevOps Practitioners
Scrum Masters/Product Owners
Software Engineers

PREREQUISITES

It is highly recommended that learners attend the SRE Foundation course and earn the SRE Foundation certification prior to attending the SRE Practitioner course and exam. An understanding and knowledge of common SRE terminology, concepts, principles and related work experience are recommended.

Alliance Micro Solutions provides certified and advanced degree computer instructors and consultants