Adopting SRE or DevOps practices affects an organization’s structure and culture in different ways. Understanding the differences between SRE and DevOps is therefore essential to choosing the right path for your teams and avoiding costly mistakes as a software engineer.
From my experience, most of the confusion on the subject comes from not understanding each approach’s origin and core principles and how they impact customers. Too often, all we remember is all the tools and ecosystems built around DevOps and Site Reliability Engineering (SRE).
Let’s start by clearing up common confusions and getting to the heart of these approaches. We will explore the history, objectives, and roles of SRE and DevOps, as well as their essential techniques and tools.
A Short History of DevOps and SRE
From 2001 to 2008, Agile development was on everyone’s mind, and adoption was gaining traction. However, IT operations at the time acted as a bottleneck in the software delivery life cycle. Even though teams worked in small increments, a release could still take weeks or months.
In 2008, at the Agile Conference (Toronto), Patrick Debois presented Agile Infrastructure and Operations. It offered two key ideas to resolve the stalemate between Agile teams and operations: deploy often and create Cross-Functional teams. This was the birth of DevOps, and those two ideas are still core values.
DevOps was still more of a conference topic at that time. But this was about to change: in 2010, David Farley and Jez Humble published Continuous Delivery, setting the tone for how a Cross-Functional DevOps team should work to deploy as often as possible. This marked the start of the golden age of DevOps.
SRE, at its core, introduced a new level of expertise and methodology to the field of system management. It preceded DevOps by about five years (it was created in 2003 at Google) and aimed to build a framework for managing large-scale systems that needed additional practices to ensure reliability and high availability. SRE brought a sharper focus on maintaining service levels and building robust infrastructure.
From 2003 to 2016, this approach was mainly adopted by large companies, such as Google, Facebook, Uber, and Netflix. The idea was to create a team situated between development and operations (in the middle) to ensure that systems were always accessible (availability) and failure-proof (reliability). SRE is both a team and the set of tools and practices that the team uses to achieve its goals.
DevOps and SRE evolved almost in parallel, and some would argue over which was better. That lasted until 2016, when Google engineers published the book Site Reliability Engineering. Its follow-up, The Site Reliability Workbook (2018), sums up the relationship as:
“Class SRE implements interface DevOps.”
Since then, both disciplines have continued to grow and intersect, with practitioners constantly seeking ways to enhance their processes and achieve continuous improvement.
In short, SRE is DevOps, or more precisely a variant of DevOps. Another notable book from 2016 is The DevOps Handbook, which further refines the key concepts of DevOps that we will discuss in the next section.
I recommend reading Team Topologies to know all the possible implementations of DevOps. SRE is only one of the many options and may not be best suited to your organization.
What are the goals and focus of DevOps and SRE?
Let’s recap our history lesson. DevOps is a mindset and culture that values collaboration (multidisciplinary team) to accelerate the pace of deployments to production.
This collaboration resulted in practices such as continuous delivery and the associated toolchains that permit fast (endless?) loops of deployment.
SRE is an operational framework, which means a set of practices to ensure the availability and reliability of systems at scale. In retrospect, we have realized that SRE is ultimately an opinionated implementation of DevOps.
SRE’s ultimate goal is to ensure the best user experience; high availability, scalability, and reliability are proxies to measure customer experience. SRE brings stakeholders, devs, and ops to the table and asks what metrics reflect good user experience (you may have heard of SLOs/SLIs/SLAs). They collaborate with those same three actors (stakeholders, devs, and ops) to achieve the set targets (99.99% availability, maybe).
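To make a target such as 99.99% availability concrete, it helps to translate it into the downtime it actually allows. The sketch below is illustrative only: the function name `downtime_budget` and the 30-day window are assumptions for this example, not something prescribed by SRE or by a particular tool.

```python
from datetime import timedelta


def downtime_budget(slo_percent: float, window: timedelta = timedelta(days=30)) -> timedelta:
    """Return the downtime an availability target tolerates over `window`.

    `slo_percent` is the availability objective, e.g. 99.99 for "four nines".
    The 30-day default window is an assumption made for this example.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in the range (0, 100]")
    # The share of the window that may be spent unavailable.
    allowed_fraction = 1 - slo_percent / 100
    return window * allowed_fraction


# Compare a few common targets over the same 30-day window.
for target in (99.0, 99.9, 99.99):
    minutes = downtime_budget(target).total_seconds() / 60
    print(f"{target}% availability -> about {minutes:.1f} minutes of downtime allowed")
```

Over 30 days, 99.99% leaves roughly 4.3 minutes of downtime, which is why stakeholders, devs, and ops should agree on the target together: each extra nine shrinks the room for incidents and risky releases by a factor of ten.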
Both approaches share the same goal: to bridge the gap between development and operations. However, their focus differs considerably:
- DevOps: Continuous delivery and delivery speed
- SRE: Availability and reliability
Below is an infographic that recaps all you need to know:
The Role(s) of DevOps and SRE
Let’s address the elephant in the room. DevOps is neither a role nor a team, whereas SRE refers to both a role and the team of engineers assuming that role.
In DevOps, one of the key principles is the value placed on Cross-Functional teams. Unlike traditional approaches, DevOps does not rigidly define roles. The specific roles within a DevOps team vary with factors such as company size and context, but you will generally find these roles:
- More Dev: Frontend Engineers, Backend Engineers, and Quality Assurance (QA) roles. Those are the historical Dev roles.
- More Ops: Cloud Engineers, Database Administrators (DBA), and Network Engineers. Those roles are historically more Ops, but they tend to disappear in smaller companies due to SaaS and IaaS offerings.
- In the middle: those are new roles born from the idea of gluing Dev and Ops together. Most of them overlap, as they come from different visions of what makes a good DevOps ecosystem.
  - Automation Engineers: focus on optimizing pipelines and application life cycles, and strive to automate every process as much as possible.
  - Developer Experience (DevXP): work on developer tooling and automation to simplify a software developer’s life.
  - Platform Engineers: work on building abstractions (a platform) between development and operations.
  - Site Reliability Engineers: spend about 50% of their time on Ops work such as incident resolution, on-call duty, and manual interventions. The remaining 50% goes to software development tasks that increase system resilience, scalability, and automation.
Notable Techniques in DevOps and SRE
In this section, you will find articles referring to the best practices of each approach, including the deployment process. Now, you can delve into practices and tools but do not make the mistake of forgetting the origin of DevOps and SRE, nor their culture and focus.
Key Techniques in DevOps
Key Techniques in SRE
Recap of key points
DevOps and SRE evolved in parallel, but the crucial realization is that Site Reliability Engineering (SRE) is an implementation of DevOps. Both are driven by a shared goal: bridging the gap between development and operations teams to create a more collaborative and efficient working environment.
Essentially, the distinction between SRE vs DevOps lies in their focus. DevOps emphasizes fast delivery, enabling organizations to iterate and release software quickly. On the other hand, SRE prioritizes availability, ensuring that systems are always up and running smoothly. An important aspect of SRE is the use of indicators to monitor and assess system performance and reliability. These indicators help SRE teams identify potential issues and take proactive measures to maintain high availability and stability.
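As an illustration of how such an indicator can be evaluated, the sketch below computes an availability indicator from request counts and compares it with a target. It is a minimal example under stated assumptions: `availability_report`, its fields, and the sample numbers are hypothetical, and real SRE teams would usually read these counts from their monitoring system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityReport:
    sli: float               # measured share of successful requests (0..1)
    budget_remaining: float  # share of the allowed failures not yet used; negative if overspent


def availability_report(good: int, total: int, slo: float) -> AvailabilityReport:
    """Compare a measured availability indicator with its objective.

    `good` and `total` are request counts over the measurement window;
    `slo` is the target as a fraction, e.g. 0.999 for 99.9%.
    """
    if total <= 0 or not 0 <= good <= total:
        raise ValueError("expected 0 <= good <= total and total > 0")
    if not 0 < slo < 1:
        raise ValueError("slo must be strictly between 0 and 1")
    sli = good / total
    allowed_failures = (1 - slo) * total  # failures the target tolerates
    actual_failures = total - good
    return AvailabilityReport(sli, 1 - actual_failures / allowed_failures)


# Hypothetical window: one million requests, 50 of them failed, 99.9% target.
report = availability_report(good=999_950, total=1_000_000, slo=0.999)
print(f"SLI: {report.sli:.5f}, error budget remaining: {report.budget_remaining:.0%}")
```

A shrinking `budget_remaining` is the kind of signal that lets a team act before the target is missed, for example by slowing down releases or prioritizing reliability work.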
It is important to note that DevOps is not a standalone role, but rather a way of working built on cross-functional teams. Sometimes, individuals mistakenly refer to themselves as “DevOps” when they are, in fact, DevOps evangelists or tech specialists who have transitioned away from traditional operations roles after the breakdown of silos.
SRE, conversely, is a concrete role clearly defined in the framework of the same name. In the next article, we will dive deep into SRE, with the best practices and responsibilities associated with that role.
Feel free to contact us to learn more about implementing DevOps and SRE practices. Join Gologic today to optimize your time and production!
By Gologic with the collaboration of Alexandre Couëdelo.
Does SRE involve programming?
Yes, the role of SRE (Site Reliability Engineer) often involves programming. SREs use programming skills to automate repetitive tasks, create tools and scripts to monitor and manage systems and resolve reliability issues. Programming is therefore a key skill for SREs.
What are the best practices for integrating SRE or DevOps principles into an existing project?
To integrate SRE or DevOps principles into an existing project, it is recommended to follow these steps: assess the existing infrastructure and processes, foster collaboration between development and operations teams, and automate repetitive tasks. Using infrastructure as code (IaC) and implementing continuous integration (CI) and continuous deployment (CD) are also essential practices for improving project efficiency and reliability.
What tools and technologies are commonly used by SRE and DevOps teams?
SRE and DevOps teams commonly use tools such as Kubernetes and Docker for container and infrastructure management. They also use monitoring and logging tools like Prometheus, Grafana, and ELK, as well as CI/CD pipeline automation tools like Jenkins, GitLab CI/CD, or Travis CI. Finally, configuration management tools like Puppet, Chef, or Ansible are also used.