Site Reliability Engineering
Retry Pattern: Handling Transient Failures in Distributed Systems (original title: Retry Pattern: Manejando Fallos Transitorios en Sistemas Distribuidos)
diek
Nov 13
#spanish #devops #sre #cloud
3 min read
Retry Pattern: Handling Transient Failures in Distributed Systems
diek
Nov 13
#devops #sre #cloud
3 min read
[pt-BR] Postmortem: The Importance of Structured Incident Analysis in SRE (original title: Postmortem: A Importância de uma Análise Estruturada de Incidentes em SRE)
Marcos Vilela
Nov 11
#sre #incident #serverfault #infra
4 min read
Rely.io October 2024 Product Update Roundup
Tiago Barbosa for Rely.io
Nov 7
#devops #sre #platformengineering #idp
4 min read
Internal Developer Portals: Autonomy, Governance and the Golden Path
Tiago Barbosa for Rely.io
Oct 31
#sre #idp #platformengineering #developers
1 reaction
15 min read
SRE Culture Embedding Reliability into Engineering Teams
kubeha
Oct 23
#sre #reliability #monitoring #automation
3 min read
Procedures as a solid foundation for developer experience before automation (original title: Procedimentos como base sólida da experiência do desenvolvedor antes da automação)
Roberson Miguel
Nov 11
#devops #sre #devrel #beginners
6 reactions
2 min read
SRE Deployment Engineer Managing Reliable & Automated Deployments
kubeha
Nov 11
#sre #automation #sredeployment #engineer
3 reactions
4 min read
7 Kubernetes Security Best Practices in 2024
Shubham
Oct 29
#devops #devsecops #sre #kubernetes
5 reactions
3 min read
SRE vs DevOps: What’s the Difference and Why Does It Matter? 🤓
ClickIT - DevOps and Software Development
Oct 15
#sre #devops
1 min read
Rely.io September 2024 Product Update Roundup
Tiago Barbosa for Rely.io
Oct 15
#devops #sre #platformengineering #idp
1 reaction
4 min read
Best Practices for Choosing a Status Page Provider
Hrish B for IncidentHub
Oct 15
#statuspage #sre #devops #monitoring
5 min read
Why would I use this instead of Traefik for zero-downtime deployment?
Andrew Kang-G
Nov 16
#sre #docker #laravel #springboot
3 reactions
6 min read
Designing a fault-tolerant etcd cluster
Michael Mekuleyi
Nov 4
#devops #kubernetes #sre #sitereliabilityengineering
7 reactions
1 comment
5 min read
🚀 Day 8: Mastering Shell Scripting in DevOps | Bash Challenge
Kanavsingh
Nov 14
#challenge #bash #sre #devops
10 reactions
1 comment
2 min read
[pt-BR] How I expanded the storage of my /home folder with Block Storage (original title: Como expandi o armazenamento da minha pasta /home com Block Storage)
Marcos Vilela
Nov 6
#beginners #linux #blockstorage #sre
4 min read
How to Set up Disk and Bandwidth Limits in Docker
Shubham
Oct 25
#devops #docker #sre #kubernetes
3 reactions
2 min read
K8s Plugins For Solid Security
Shubham
Oct 25
#kubernetes #devops #sre #security
2 min read
What are Kata Containers?
Shubham
Oct 25
#containers #devops #kubernetes #sre
2 min read
Zero-Downtime Blue-Green Deployment with a Simple 'git pull & bash run.sh' Command
Andrew Kang-G
Nov 4
#cicd #sre #webdev #docker
1 reaction
1 min read
DynamoDB: Query vs. Scan! Stop burning money by using Scan in production (original title: DynamoDB: Query x Scan! Para de torrar dinheiro usando Scan em produção)
Camila Figueira
Oct 28
#aws #sre #braziliandevs #database
38 reactions
6 comments
4 min read
How to Fix Kubernetes Node Disk Pressure
Shubham
Oct 28
#help #devops #sre #kubernetes
2 min read
Some of the less-known ping types you should know
Shubham
Oct 25
#linux #devops #sre
6 reactions
1 comment
1 min read
How a Pod is Deleted - Behind the Scenes Breakdown
Shubham
Oct 25
#k8s #kubernetes #devops #sre
8 reactions
2 comments
2 min read
How To Fix OOMKilled
Shubham
Oct 25
#help #kubernetes #devops #sre
1 reaction
2 min read
Creating an Efficient IT Incident Management Plan: A Guide to Templates and Best Practices
Squadcast.com for Squadcast
Sep 19
#incidentmanagement #sre
7 min read
The “R” in MTTR: Repair or Recover? What’s the difference?
Karina Babcock for Causely
Sep 18
#devops #cloudnative #sre
5 min read
SLOs and Customer Experience: Uniting Engineering Excellence with Customer Satisfaction
Squadcast.com for Squadcast
Sep 19
#devops #sre
5 min read
SRE and the Enterprise: Building a Culture of Reliability at Scale
Squadcast.com
Sep 17
#sre
4 min read
DevOps vs. SRE Understanding the Differences and Benefits
kubeha
Sep 10
#devops #sre #priniciples #difference
2 min read
How to Define Engineering Standards (with Backstage)
Sam Nixon
Sep 28
#sre #backstage
10 min read
The Pillars of Site Reliability Engineering Building Resilient Systems
kubeha
Sep 5
#automation #sre #monitoring #budget
2 min read
Introducing Botkube Fuse: The Platform Engineer’s Copilot
Kubeshop for Kubeshop
Sep 3
#devops #productivity #git #sre
6 reactions
4 min read
DevOps
Shivam Vishwakarma
Sep 12
#devops #cloud #docker #sre
1 reaction
1 min read
Accelerating Business Growth with a Platform Engineering Team
Pablo Santos
Aug 29
#devops #sre #softwaredevelopment
5 min read
When Alerts Don’t Mean Downtime - Preventing SRE Fatigue
Hrish B for IncidentHub
Sep 12
#devops #sre #monitoring #incidentresponse
2 min read
System Reliability Metrics: A Comparative Guide to MTTR, MTBF, MTTD, and MTTF
Squadcast.com for Squadcast
Sep 2
#incidentmanagement #sre
10 min read
The Pulse Of Technology: Why IT Monitoring Is Non-Negotiable In 2024
Squadcast.com for Squadcast
Sep 2
#monitoring #sre #bestpractices
13 min read
How to improve DORA metrics as a release engineer
Ibrahim Salami for Aviator
Oct 1
#devops #sre #productivity
5 reactions
10 min read
𝗧𝗵𝗲 𝗖𝗿𝗶𝘁𝗶𝗰𝗮𝗹 𝗥𝗼𝗹𝗲 𝗼𝗳 𝗔𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻 𝗮𝗻𝗱 𝗜𝗻𝗳𝗿𝗮𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 𝗠𝗼𝗻𝗶𝘁𝗼𝗿𝗶𝗻𝗴
Gabriel Akinmoyero
Sep 20
#devops #monitoring #sre #cloud
1 reaction
1 min read
SRE and the Enterprise: Building a Culture of Reliability at Scale
Squadcast.com for Squadcast
Sep 17
#sre
4 min read
Understanding the 0.6-Second Detection Time for Full Outages
Mohammed Ammer
Sep 14
#sre #alerting #monitoring #metrics
8 reactions
3 min read
How To Reduce The Alert Noise For Optimal On-Call Performance
Squadcast.com for Squadcast
Aug 19
#oncall #sre #incidentresponse #incidentmanagement
10 min read
The Cornerstones of SRE: SLI, SLO and SLA
Sourav Dhiman
Aug 15
#devops #devopsdigest #kubernetes #sre
4 min read
Datadog : how to filter metrics on tag "team"
Lucien Boix
Sep 17
#sre #devops #datadog #kubernetes
1 reaction
3 min read
Do You Need All That Support Levels After All?
femolacaster
Aug 18
#devops #automation #sre #productivity
3 reactions
7 min read
AWS Observability Maturity Model - V2
Indika_Wimalasuriya for AWS Community Builders
Sep 14
#awsobservability #aws #observability #sre
9 reactions
5 min read
Context is all you need.
Szymon Stawski
Sep 13
#devops #sre
1 reaction
1 min read
Enhance Your System Reliability with These Top Log Monitoring Tools
Alerty
Aug 22
#monitoring #sre #logging #javascript
1 comment
2 min read
CrowdStrike Incident: 5 Key Lessons for DevOps & IT Teams
Eduardo Messuti for StatusPal
Aug 21
#devops #development #sre #webdev
1 reaction
5 min read
Implementing SLOs in Microservices: A Comprehensive Guide to Reliability and Performance
Squadcast.com for Squadcast
Sep 11
#sre
1 reaction
9 min read
Cold Storage: A Deep Dive into the Frozen Vaults of Data
femolacaster
Aug 30
#data #devops #sre #security
2 reactions
11 min read
Configuring Terraform to work correctly with LocalStack (original title: Configurando o Terraform para funcionar corretamente com o LocalStack)
Stefano Martins
Aug 20
#terraform #sre #devops #aws
3 min read
Implementing SLO Error Budget Monitoring with AWS Services Only
Takashi Iwamoto for AWS Community Builders
Sep 8
#aws #cloudwatch #monitoring #sre
3 reactions
2 comments
5 min read
Synchronize Files between your servers
Amjad Abujamous
Sep 8
#synchronization #production #sre #automation
3 min read
Static Site Generation
Suhas Palani
Aug 4
#staticwebapps #sre #gatsby
4 min read
Advanced Incident Management Strategies for Engineers
Squadcast.com for Squadcast
Aug 26
#incidentmanagement #sre
11 min read
Role of Human Oversight in AI-Driven Incident Management and SRE
Squadcast.com for Squadcast
Sep 2
#incidentmanagement #sre
10 min read
14 Monitoring Tools for Full-Stack Developers
Hrish B for IncidentHub
Aug 31
#devops #sre #fullstack #webdev
1 reaction
7 min read
The Benefits of a Single Incident Management System
Hrish B for IncidentHub
Aug 29
#sre #devops #monitoring #observability
2 min read