On Tuesday, July 21, 2020 between 21:36 and 22:40 EDT, and again on Wednesday, July 22, 2020 between 16:41 and 17:10 EDT, for an aggregate of 93 minutes, many Squarespace websites were unavailable. During these incidents, we were able to serve approximately 45% of requests from our cache system. Site visitors saw slow loads or “Service/Unavailable” errors and were unable to use our Commerce platform. Site editing was also unavailable during this time.
The root cause was a persistent spike in anomalous traffic, which did not trigger our mitigation system and bypassed our cache. This traffic caused a large number of intensive database queries to our primary database. In turn, this significantly increased the latency of queries, which eventually overwhelmed our application servers.
During the first incident, actions taken at 21:54 EDT resolved database latency, which allowed most of the application servers to partially recover and properly serve some requests. We then disabled a background maintenance job running on the database at 22:32 EDT, and traffic fully recovered at 22:40 EDT.
On Wednesday morning, we held an internal retrospective to understand fully what happened. We eliminated the background database job as the cause of the instability. We identified the abnormal traffic and discussed what preventative actions we would need to take. Unfortunately, a similar event occurred that day. We quickly blocked the anomalous traffic and restored service. Simultaneously, we identified and corrected this defect in our system.
Our team has prioritized improvements to the detection and mitigation of similar issues. Additionally, we have scheduled improvements to our caching.
We deeply apologize for these incidents. It is of the utmost importance to us that Squarespace sites be up and available. Thank you for your patience.