Sifflet - Downtime on eu-west-1-a cell – Incident details

Downtime on eu-west-1-a cell

Resolved
Major outage
Started 7 days ago
Lasted 4 days

Affected

Web application

Major outage from 12:10 PM to 12:25 PM, Operational from 12:25 PM to 8:27 AM

eu-west-1-a - Web application

Major outage from 12:10 PM to 12:25 PM, Operational from 12:25 PM to 8:27 AM

Integrations

Major outage from 12:10 PM to 12:25 PM, Operational from 12:25 PM to 8:27 AM

eu-west-1-a - Integrations

Major outage from 12:10 PM to 12:25 PM, Operational from 12:25 PM to 8:27 AM

Datasource monitoring

Major outage from 12:10 PM to 12:25 PM, Operational from 12:25 PM to 8:27 AM

eu-west-1-a - Datasource monitoring

Major outage from 12:10 PM to 12:25 PM, Operational from 12:25 PM to 8:27 AM

Updates
  • Resolved

    This incident has been resolved.

    Downtime started at 2:10 pm CEST on Friday, the 8th of August, and was over by 2:35 pm CEST the same day. It affected only our eu-west-1-a tenants.

    The downtime was caused by the database becoming unavailable after a sudden surge in database usage whose speed and magnitude were unexpected. Our auto-scaling and monitoring systems failed to react in time.

    We're improving our monitoring and auto-scaling to prevent such downtime from happening again, and we're investigating the component that might be responsible for this unnecessary database load.

  • Monitoring
    Remediation has been successfully applied. We are closely monitoring the status of the cell and investigating the root cause of the incident to prevent it from happening again.
  • Identified

    We are continuing to work on a fix for this incident. Remediation is being applied.

  • Investigating

    We're investigating reports of degraded performance across the Sifflet web applications.