This programme will cover the following topics:
Module 1: Anti-Patterns
- Rebranding Ops as Reliability Engineering
- Users notice an issue before you do
- Measuring until my Edge
- False positives are worse than no alerts
- Configuration management trap for snowflakes
- The Dogpile: Mob incident response
- Point fixing
- Production Readiness Gatekeeper
- Fail-Safe, really?
- Use Case Discussion
Module 2: SLO is a Proxy for Customer Happiness
- Define SLIs that meaningfully measure the reliability of a service from a user’s perspective
- Choose appropriate SLO targets using statistical and probabilistic analysis
- Use error budgets to help your team have better discussions and make better data-driven decisions
- Use Case Discussion
Module 3: Building Secure, Scalable and Reliable Systems
- Reliability Engineering and its role in Building Secure and Reliable Systems
- Design for Changing Architecture
- Fault-Tolerant Design
- Design for Security
- Design for Resiliency
- Design for Reliability
- Use Case Discussion
Module 4: Full-Stack Observability
- Modern applications are complex and unpredictable
- Slow is the new down
- Pillars of Observability
- Using OpenTelemetry
- Use Case Discussion
Module 5: Platform Engineering and AIOps
- Taking a Platform-Centric View
- AIOps: A big data view to go from reactive to proactive to predictive management
- Technology becomes more human through ML, allowing ubiquitous self-service
- Use Case Discussion
Module 6: Incident Response Management
- Key responsibilities in incident response
- DevOps & ITIL
- OODA and Reliability Incident Response
- Closed-Loop Remediation and Its Advantages
- Swarming – Food for Thought
- Use Case Discussion
Module 7: DiRT and Chaos Engineering
- Disaster Recovery Testing
- Fault Injection
- Chaos Engineering
- Tools that can be instrumented for Chaos Engineering
- Use Case Discussion
Module 8: Reliability is the Purest Form of DevOps
- Key Principles of Reliability Engineering
- How to increase Reliability across the spectrum
- Metrics for Success
- Possible Implementation Model
- Cultural and Behavioural Skills are key
- Case Study
- Use Case Discussion