
Alex

Incident report (Postmortem)

8 December 2021

Abstract

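For roughly two hours, Bitwarden’s status page reported the service as operational (OK) while the actual service was failing (KO). Login issues were first reported at 10:35 CET, and the status page was not updated until 12:24 CET.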

Timeline

Date                 Status
8 Dec - 10:35 CET    The Customer Success team reports that users are having issues logging in
8 Dec - 12:24 CET    Bitwarden’s status page is updated to “Investigating”
8 Dec - 13:24 CET    The Engineering team makes changes to the content delivery network. Users are able to log in normally
8 Dec - 13:29 CET    Bitwarden’s status page is updated to “Resolved”

Problem

The agent in charge of updating the status page had the status system’s credentials stored in Bitwarden itself. Because of the issue the platform was experiencing, they could not retrieve those credentials.
Recent changes in the composition of the Customer Success team meant that the backup agent did not have the appropriate credentials to access the status system either.

Failsafe

Even though we acknowledge this was a problematic situation, it is worth mentioning:

  • Bitwarden’s status page remained available throughout because it is hosted on completely separate infrastructure.
  • There is a health check protocol in place to automatically report outages. Unfortunately, due to the nature of the issue, it did not trigger.
  • We have redundant agents who are able to update the status page, although their response was slower than desired.

Future solutions

Immediate actions taken

The immediate backup agent now has their credentials set up.
Both the agent in charge of the status system and the written procedures now reflect that credentials for communication services such as this one should not be stored in Bitwarden itself, so that they remain accessible during a similar outage.

Action items

We will review the status system’s health check so that it triggers in a similar scenario.
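
As an illustration only, here is a minimal Python sketch of a health check that treats a failed login as an outage even when basic reachability looks fine. The actual health check is not described in this report, so everything here is an assumption: the probe names (homepage_reachable, login_flow), the helper functions, and the idea that a check limited to reachability could miss a login failure are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProbeResult:
    name: str
    ok: bool
    detail: str = ""


def run_probes(probes: Dict[str, Callable[[], bool]]) -> List[ProbeResult]:
    """Run every probe; a probe that raises counts as a failure."""
    results = []
    for name, probe in probes.items():
        try:
            ok = bool(probe())
            detail = "" if ok else "probe returned a failure"
        except Exception as exc:  # e.g. timeouts or connection errors
            ok, detail = False, f"{type(exc).__name__}: {exc}"
        results.append(ProbeResult(name, ok, detail))
    return results


def overall_status(results: List[ProbeResult]) -> str:
    """Report an outage if any user-facing probe fails."""
    failed = [r.name for r in results if not r.ok]
    return "operational" if not failed else "outage: " + ", ".join(failed)


def failing_login() -> bool:
    # Hypothetical stand-in for a real end-to-end login attempt.
    raise TimeoutError("login request timed out")


# Hypothetical probes: reachability succeeds while the login flow fails.
example_probes = {
    "homepage_reachable": lambda: True,
    "login_flow": failing_login,
}

print(overall_status(run_probes(example_probes)))  # prints "outage: login_flow"
```

In a real setup, each probe would perform an actual request against the service, and the resulting status would be pushed to the status system using credentials stored outside the service being monitored.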
