
Site Reliability Engineering Foundation (SF)

Programme Code D199
Cloud Infrastructure
Learning Partner(s)
NTUC LearningHub
2 Days
Format E-learning
Development Support Logging & Metrics DevOps Methodologies Ops Excellence
Job Roles
ICT&SS Professional, DevOps Engineer, Cloud Infrastructure Manager, Cloud Infrastructure Architect, Cloud Infrastructure Engineer


Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The key objective is to create ultra-scalable and highly reliable distributed software systems.

Introducing a site-reliability dimension requires organisational re-alignment, a new focus on engineering and automation, as well as the adoption of a range of new working paradigms.

This programme is an introduction to the principles and practices that enable an organisation to scale critical services reliably and economically.

Key Takeaways

At the end of this programme, you will be able to understand:

  • the history of SRE and its emergence at Google
  • the inter-relationship of SRE with DevOps and other popular frameworks
  • the underlying principles behind SRE
  • Service Level Objectives (SLOs) and their user focus
  • Service Level Indicators (SLIs) and the modern monitoring landscape
  • error budgets and the associated error budget policies
  • toil and its effect on an organisation’s productivity
  • some practical steps that can help to eliminate toil
  • observability as an indicator of the health of a service
  • SRE tools, automation techniques and the importance of security
  • anti-fragility, the approach to failure and failure testing
  • the organisational impact of adopting SRE

Who Should Attend

  • Please refer to the job roles section.

Prerequisites


  • Prior knowledge of DevOps, which can be gained by attending IT14A05 - DevOps Foundation.
  • It is recommended that participants have prior working experience or knowledge in IT software development or IT industry operations.

What To Bring

  • Hardware and Software

This programme will be conducted as a Virtual Live Class (VLC) via the Zoom platform. Participants must have a Zoom account and a laptop or desktop with “Zoom Client for Meetings” installed. This can be downloaded from

Must Have:

Please ensure that your computer or laptop meets the following requirements:

  • operating system: Windows 10 or macOS (64-bit or above)
  • processor/CPU: 1.8 GHz, 2-core Intel Core i3 or higher
  • minimum 20 GB hard disk space
  • minimum 8 GB RAM
  • webcam (the camera must be turned on for the entire duration of the class)
  • microphone
  • internet connection: wired or wireless broadband
  • latest version of the Zoom software, installed on your computer or laptop prior to the class

Good to Have:

  • Wired internet connection

    A wired internet connection will provide you with a stable and reliable connection.

  • Dual monitors

    Using a dual monitor setup will improve your training experience, enabling you to participate in hands-on exercises while staying engaged with your instructor.

This programme will cover the following topics:

Module 1: SRE Principles and Practices

  • What is Site Reliability Engineering?
  • SRE and DevOps: What are the Differences?
  • SRE Principles and Practices

Module 2: Service Level Objectives and Error Budgets

  • Service Level Objectives (SLOs)
  • Error Budgets and Error Budget Policies

Module 3: Reducing Toil

  • What is Toil?
  • Why is Toil Bad?
  • Doing Something About Toil

Module 4: Monitoring & Service Level Indicators

  • Service Level Indicators (SLIs)
  • Monitoring and Observability

Module 5: SRE Tools and Automation

  • Automation Focus
  • Hierarchy of Automation Types
  • Secure Automation
  • Automation Tools

Module 6: Anti-Fragility and Learning from Failure

  • Why Learn from Failure?
  • Benefits of Anti-Fragility
  • Shifting the Organizational Balance

Module 7: Organizational Impact of SRE

  • Why Organisations Embrace SRE?
  • Patterns for SRE Adoption
  • Sustainable Incident Response
  • Blameless Post-Mortems
  • SRE and Scale

Module 8: SRE, Other Frameworks and Trends

  • SRE and Other Frameworks
  • SRE Evolution
  • Additional Sources of Information

Certificate Obtained and Conferred By:

  • Certificate of Completion from NTUC LearningHub 

Upon meeting at least 75% attendance and passing the assessment(s), participants will receive a Certificate of Completion from NTUC LearningHub.

  • Statement of Attainment from SkillsFuture Singapore

Upon meeting at least 75% attendance and passing the assessment(s), participants will receive a Statement of Attainment from SkillsFuture Singapore to certify that the participant has achieved the following Competency Standard(s): Quality Engineering (ICT-DIT-3011-1.1)

External Certification Exam:

After registration, you will receive a DevOps exam voucher from NTUC LearningHub three days before the programme commences. After completing the course with at least 75% attendance, you can register and sit for the official “DevOps Site Reliability Engineering Foundation” exam on the DevOps Institute online portal. You must complete the exam within the validity period of the exam voucher.

DevOps Site Reliability Engineering Foundation Exam Details
Number of Questions: 40
Question Format: Multiple-choice
Exam Duration: 60 minutes
Passing Score: 26 out of 40 (65%)

After completing this programme with at least 75% attendance and upon passing the official “DevOps Site Reliability Engineering Foundation” certification exam, you will receive a Certified Site Reliability Engineering Foundation certification from DevOps Institute. The certification is governed and maintained by DevOps Institute.

Full Fee

Full programme fee


9% GST on nett programme fee


Total nett programme fee payable, including GST S$1526
With effect from 1 Jan 2024

Funding is available for this programme. Please visit the learning partner’s website to find out about the updated programme fee funding breakdown and eligibility.

Prices are subject to other NTUC LearningHub miscellaneous fees.

Upcoming Classes

Class 1
09 May 2024 to 10 May 2024 (Full Time)
Duration: 2 days
When: May - 09, 10
Time: 9.00am - 6.00pm


Step 1 Apply through your organisation's training request system.

Step 2 Your organisation's training request system (or relevant HR staff) confirms your organisation's approval for you to take the programme.

Your organisation will send registration information to the academy.

Organisation HR L&D or equivalent staff can click here for details of the registration submission process.

Step 3 GovTech Digital Academy will inform you whether your enrolment has been successful.