Connectivity Issues
Incident Report for Squarespace
Postmortem

On Tuesday, July 21, 2020 between 21:36 and 22:40 EDT, and again on Wednesday, July 22, 2020 between 16:41 and 17:10 EDT, for a total of 93 minutes, many Squarespace websites were unavailable. During these incidents, we were able to serve approximately 45% of requests from our cache system. Site visitors saw slow page loads or “Service Unavailable” errors and were unable to use our Commerce platform. Site editing was also unavailable during this time.
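As a quick check of the aggregate figure, the two outage windows can be summed directly. This is a minimal sketch: the timestamps are copied from the paragraph above, and the names `WINDOWS` and `outage_minutes` are illustrative only.

```python
from datetime import datetime

# Outage windows as stated in the report (all times EDT).
WINDOWS = [
    ("2020-07-21 21:36", "2020-07-21 22:40"),
    ("2020-07-22 16:41", "2020-07-22 17:10"),
]


def outage_minutes(windows):
    """Return the length of each (start, end) window in whole minutes."""
    fmt = "%Y-%m-%d %H:%M"
    minutes = []
    for start, end in windows:
        delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
        minutes.append(int(delta.total_seconds() // 60))
    return minutes


durations = outage_minutes(WINDOWS)
total = sum(durations)
print(durations, total)  # [64, 29] 93
```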

The root cause was a sustained spike in anomalous traffic, which did not trigger our mitigation system and bypassed our cache. This traffic generated a large number of resource-intensive queries against our primary database. In turn, query latency increased significantly, which eventually overwhelmed our application servers.
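To illustrate the failure mode described above, the following is a minimal sketch of one way to flag a client whose requests repeatedly miss the cache within a short window. It is not Squarespace's actual mitigation system; the class `CacheMissRateDetector`, its parameters, and the client key are all hypothetical, and the threshold values are arbitrary.

```python
from collections import defaultdict, deque


class CacheMissRateDetector:
    """Flags a client whose cache-missing requests exceed a threshold
    within a sliding time window (hypothetical illustration)."""

    def __init__(self, max_misses, window_seconds):
        self.max_misses = max_misses
        self.window_seconds = window_seconds
        # client key -> timestamps (in seconds) of recent cache misses
        self._misses = defaultdict(deque)

    def record(self, client, timestamp, cache_hit):
        """Record one request; return True if the client should be flagged."""
        if cache_hit:
            # Cached responses never reach the database, so they are ignored.
            return False
        recent = self._misses[client]
        recent.append(timestamp)
        # Drop misses that have aged out of the window.
        while recent and recent[0] <= timestamp - self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_misses


detector = CacheMissRateDetector(max_misses=3, window_seconds=10)
flags = [detector.record("client-a", t, cache_hit=False) for t in range(5)]
print(flags)  # [False, False, False, True, True]
```

Counting only cache misses, rather than all requests, targets the traffic that actually reaches the database, which is the load path the paragraph above describes.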

During the first incident, actions taken at 21:54 EDT resolved database latency, which allowed most of the application servers to partially recover and properly serve some requests. We then disabled a background maintenance job running on the database at 22:32 EDT, and traffic fully recovered at 22:40 EDT.

On Wednesday morning, we held an internal retrospective to fully understand what had happened. We ruled out the background database job as the cause of the instability, identified the abnormal traffic, and discussed the preventative actions we would need to take. Unfortunately, a similar event occurred later that day. We quickly blocked the anomalous traffic and restored service. At the same time, we identified and corrected the underlying defect in our system.

Our team has prioritized improvements to the detection and mitigation of similar issues. Additionally, we have scheduled improvements to our caching.

We deeply apologize for these incidents. It is of the utmost importance to us that Squarespace sites be up and available. Thank you for your patience.

Posted Jul 24, 2020 - 16:09 EDT

Resolved
This incident has been resolved.
Posted Jul 22, 2020 - 17:25 EDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jul 22, 2020 - 17:10 EDT
Update
We are continuing to investigate this issue.
Posted Jul 22, 2020 - 17:05 EDT
Investigating
We are currently investigating this issue.
Posted Jul 22, 2020 - 16:41 EDT
This incident affected: Site Loading and Site Editing.