What is an SRE? How To Land an SRE Role Today


What is SRE?

Site Reliability Engineering (SRE) is a relatively new discipline in the software industry; the practice originated at Google in the early 2000s. It applies software engineering to system management and operational problem-solving. Think of it as a new form of system administration, one that treats operations as a software problem.

In SRE, software engineers take charge of tasks traditionally performed by an operations team. The engineers themselves are responsible for the availability, latency, performance, capacity, scalability, and deployment of the software systems they run.

In this approach, software development meets operations. Companies that adopt SRE hire people with software development experience to solve infrastructure and operational problems.

What Is Expected From a Site Reliability Engineer?

A site reliability engineer focuses on the production side of software. They are expected to ensure that software is delivered and deployed reliably. SREs are also responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
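To make "responsible for availability" concrete: availability is often stated as a percentage target, such as 99.9%. The sketch below is a minimal illustration that assumes time-based availability (uptime over a fixed window; request-based measures are also common) and converts a target into the downtime it permits. The function name `allowed_downtime_minutes` and the 30-day window are illustrative choices, not a standard.

```python
def allowed_downtime_minutes(target: float, window_days: int = 30) -> float:
    """Minutes of downtime permitted by an availability target over a window.

    Assumes time-based availability, with `target` given as a fraction
    (for example 0.999 for 99.9%).
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be a fraction between 0 and 1")
    window_minutes = window_days * 24 * 60  # 30 days = 43,200 minutes
    return window_minutes * (1.0 - target)


if __name__ == "__main__":
    # Print the downtime allowance for a few common targets.
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%} over 30 days -> {allowed_downtime_minutes(target):.1f} min")
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30-day window, and each additional "nine" cuts that allowance tenfold.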

The SRE model hinges on effective standardization and automation. Engineers are tasked with designing and implementing ways to improve and automate operational tasks, which streamlines development and deployment.

SREs need software development experience, but, like system administrators, their core strengths are network engineering, troubleshooting, deployment, and configuration management. They must also be effective multitaskers, since they are responsible for making sure many system components work together and deliver consistent results.

For greater clarity, let’s look at the average day of a site reliability engineer:

Joining calls to fix or build deployment infrastructure.
Ensuring that binaries and configurations are reproducible and work across integration and deployment environments, to maximize system availability.
Managing the configuration of cloud resources for automated deployments.
Monitoring software infrastructure, tracking tickets, and checking logs to mitigate risks and resolve existing problems.
Planning releases and future deployments.
Participating in sprint planning, code review, code development, and architecture design to foresee risks and offer input on best practices.
Planning software deployments on immutable infrastructure using CI/CD pipelines.
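Routine log checks like those in the list above are prime candidates for the automation that the SRE model emphasizes. As a minimal sketch, assuming a hypothetical access-log format in which the HTTP status code is the last whitespace-separated field on each line, the script below computes the server-error (5xx) rate and flags it when it exceeds a threshold. The log format, the 1% threshold, and the function names `server_error_rate` and `should_alert` are illustrative assumptions, not part of any particular tool.

```python
from typing import Iterable


def server_error_rate(lines: Iterable[str]) -> float:
    """Fraction of requests that returned a 5xx status.

    Assumes (hypothetically) that each log line ends with the HTTP status code.
    Blank lines and lines whose last field is not an integer are skipped.
    """
    total = 0
    errors = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            status = int(fields[-1])
        except ValueError:
            continue  # malformed line: ignore it rather than crash the check
        total += 1
        if 500 <= status <= 599:
            errors += 1
    return errors / total if total else 0.0


def should_alert(lines: Iterable[str], threshold: float = 0.01) -> bool:
    """Return True when the 5xx rate is above the (illustrative) threshold."""
    return server_error_rate(lines) > threshold


if __name__ == "__main__":
    sample = [
        "GET /health 200",
        "GET /api/orders 200",
        "POST /api/orders 503",
        "GET /api/users 200",
    ]
    rate = server_error_rate(sample)
    print(f"5xx rate: {rate:.1%}, alert: {should_alert(sample)}")
```

In practice, logic like this would more likely run on a schedule or inside a monitoring system than as a standalone script; the point is that a repeatable check replaces someone reading logs by hand.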

Bear in mind that, because of its relatively recent origin, the SRE role varies widely from company to company when it comes to specific responsibilities. At some companies, SREs play a key role in software development and programming, while at others they are expected to focus specifically on the operations side.