DEV Community

Site Reliability Engineering

Posts

Configure an Intuitive Service Dashboard & Reduce Response Time

Comments
3 min read
Suppressing Alert Noise during Scheduled Maintenance

Comments
3 min read
Hiteshwar shares his thoughts on being an SRE

Comments
4 min read
Simple Log Monitors Using monitro.dev

Comments 3
1 min read
Journey of Streamlining Oncall and Incident Management

Comments
10 min read
The Importance of Using Granted for Managing Multiple AWS Accounts

Comments
2 min read
OTEL Demo with EKS and New Relic

6
Comments
4 min read
The Basics of Istio Mirroring

2
Comments 1
5 min read
Top 5 BetterStack Alternatives For Status Page In 2024

Comments
4 min read
Terraform Dynamic Blocks: Advanced Use Cases and Examples

5
Comments
9 min read
Demystifying Service Level acronyms and Error Budgets

Comments
9 min read
How to easily start Backstage

1
Comments
3 min read
From your source code to zero-downtime, high availability, and secure production deployment in no time

1
Comments
1 min read
How to Avoid "Zabbix poller processes more than 75% busy" Problems

2
Comments
2 min read
Virtualization - The Basics

3
Comments 3
3 min read
AWS: Your Ally in Amplifying Reliability with GenAI

4
Comments
5 min read
AWS Cost Optimization: Periodic Deletion of ECR Container Images

7
Comments
5 min read
How to transfer a forked repository whose original is private on GitHub

Comments
2 min read
On-Call Cookbook

1
Comments 1
3 min read
One Year of DevOps at Idus: Reflections and Learnings

Comments
4 min read
AWS Cert Manager integration with Prometheus with Domain Name

2
Comments
3 min read
How to Release Service

Comments
2 min read
“Automating VPC Peering in AWS with Terraform”

Comments
3 min read
How to delete all AWS resources using aws-nuke

1
Comments
2 min read
What are SLI, SLO and SLA, and Why are they important in SRE?

Comments
3 min read
Kubernetes (on-prem) master node and worker node associations

Comments
1 min read
SQL Server service status monitoring on Windows with Prometheus

Comments
1 min read
Amazon Forecast: Best Practices and Anti-Patterns for Implementing AIOps

6
Comments
4 min read
Defining SLOs - "Let Go!"

2
Comments
2 min read
Executing bash script commands in a sub-shell to manage status code and output

1
Comments
2 min read
Kubectl Port-forward Flow Explained

Comments
3 min read
Networking 101: Back to School

4
Comments 1
6 min read
SRE vs DevOps vs SysAdmin

1
Comments 1
3 min read
Roles and Responsibilities Matrix

Comments
5 min read
Roles and Responsibilities Matrix

1
Comments
6 min read
On The Importance of End-to-End Monitoring for IoT

2
Comments
2 min read
LLMs in Amazon Bedrock: Observability Maturity Model

7
Comments
7 min read
DevOps and SRE: A Collaborative Journey Towards Reliable Software Delivery

Comments
4 min read
Docker Log Observability: Analyzing Container Logs in HashiCorp Nomad with Vector, Loki, and Grafana

6
Comments
8 min read
How to send Alerts and Notifications with Telegram

Comments
3 min read
2024 Site Reliability Engineering: Key Trends and Focus Areas for SREs

Comments
7 min read
Inside the Kubernetes Control Plane

15
Comments 2
5 min read
Expand your root EBS Volume attached to your Windows EC2

Comments
2 min read
Reciprocity, Companion Planting & DevSecOps

1
Comments
3 min read
ARM vs x86 in Docker

2
Comments
6 min read
Effortless Database Scaling: Migrate from RDS to Aurora Serverless V2

Comments
2 min read
Why Should DevOps/SRE Engineers Learn Golang?

Comments
4 min read
Why Do Teams Need SLOs, SLIs, and Error Budgets?

Comments
4 min read
Kubernetes Debugging: Handling Multiple kubectl port-forward from Tray

2
Comments
6 min read
Observability Maturity Model for AWS

5
Comments
3 min read
Reliability in Legacy Software

1
Comments
3 min read
Why Do Teams Need SLOs, SLIs, and Error Budgets?

6
Comments
4 min read
Netdata vs Prometheus: Performance Analysis

Comments
12 min read
Smart Chaos: LLMs, No More Human Modeling

5
Comments
6 min read
Installing Kubernetes from Scratch

Comments
11 min read
Exploring the World of LLMs for SRE Powered by PartyRock (Claude, Jurassic-2, Titan, Command, Llama 2 & Stable Diffusion XL)

6
Comments
7 min read
#DevOps for Noobs - Reverse Proxy

199
Comments 12
3 min read
How to quickly implement proactive inspection of network connectivity with no blind spots in large-scale clusters

Comments
6 min read
In large-scale clusters, how to quickly implement proactive network connectivity inspection with no blind spots

Comments
2 min read
Discovering the Magic of Service Mesh: Navigating the Microservices Maze 🌐🕸️🕵️‍♂️

8
Comments
3 min read