On Sunday, March 31, 2019, from 8:00 PM to 9:48 PM Eastern Time (ET), users accessing many Squarespace sites saw error messages like ‘Your connection is not private’ and ‘Your connection to this site is not secure’.
All sites were available and secure during this time, but errors were displayed due to a problem with our SSL certificate management system. This issue impacted all customers who had enabled HTTPS for their sites.
We sincerely apologize to our customers and their visitors for the disruption caused.
(All times are ET)
We use a security practice called mutual TLS to authenticate communication between the different components that power Squarespace. One of our internal certificates was due to expire at 8 PM on March 31st. Although our security team had generated new certificates and deployed them to the servers, the commands they used to reload the configuration did not cause the services to reread the certificates from disk, so the services kept using the old certificates already loaded in memory.
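One way to catch a reload that silently did nothing is to compare the certificate a service is presenting with the certificate on disk. The following is only a minimal sketch of that idea, not a description of our tooling; the function names `fingerprint` and `reload_took_effect` are hypothetical.

```python
import hashlib
import ssl


def fingerprint(pem_cert: str) -> str:
    """Return the SHA-256 fingerprint (hex) of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha256(der).hexdigest()


def reload_took_effect(served_pem: str, on_disk_pem: str) -> bool:
    """True when the certificate being served matches the one on disk.

    Both arguments are PEM strings. How they are obtained is up to the
    caller and is an assumption of this sketch.
    """
    return fingerprint(served_pem) == fingerprint(on_disk_pem)
```

Where an endpoint allows it, the served certificate could be fetched with Python's `ssl.get_server_certificate((host, port))` and the on-disk certificate read from its file; a mismatch after a reload would indicate that the old certificate is still in memory.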
At 8 PM, the old internal certificates expired, and some services inside Squarespace were no longer able to communicate with each other. This included the service that stores SSL certificates for Squarespace customer sites. As a result, our servers were unable to look up the correct certificates for websites and instead served the certificate for squarespace.com, following the default behavior that we use for sites that have not yet generated a certificate. We consider this a safe fallback, since it preserves secure communication, keeps the site available and issues a warning to the visitor.
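The fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than our actual server code: `select_certificate`, `CertStoreUnavailable`, and the `lookup` callable are hypothetical names, and certificates are represented as plain strings.

```python
from typing import Callable, Optional, Tuple

# Stands in for the default certificate for squarespace.com.
DEFAULT_CERT = "squarespace.com"


class CertStoreUnavailable(Exception):
    """Raised by a lookup when the certificate store cannot be reached."""


def select_certificate(
    hostname: str,
    lookup: Callable[[str], Optional[str]],
    default: str = DEFAULT_CERT,
) -> Tuple[str, bool]:
    """Return (certificate, used_default) for a hostname.

    Falls back to the default certificate when the site has no certificate
    yet, or when the certificate store cannot be reached.
    """
    try:
        cert = lookup(hostname)
    except CertStoreUnavailable:
        cert = None
    if cert is None:
        return default, True
    return cert, False
```

Returning the `used_default` flag alongside the certificate makes it straightforward to count how often the fallback is taken.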
At 8:20 PM, a Squarespace engineer on our Edge computing team noticed irregular traffic for the certificate service, and raised an alarm. They worked with the Security team and by 8:45 PM were able to confirm that expired certificates were still being served from memory. They initiated a full restart of the affected services at 8:50 PM, forcing the servers to load the new certificates from disk, and allowing our services to communicate with each other again. Service was incrementally restored as nodes restarted. By 9:48 PM, service was completely restored and clients were no longer shown the error message.
We took longer to discover and remediate this issue than we would have liked. Our monitoring system should have alerted us immediately after 8 PM.
We usually fall back to our default Squarespace certificate when users have enabled HTTPS but have not yet generated a certificate. However, we did not have an automated alert for an unusually high number of default certificates being served.
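An alert of this kind could be as simple as comparing the share of responses served with the default certificate against a normal baseline. The sketch below is illustrative only; the function name, the `tolerance` multiplier, and the `min_handshakes` floor are assumptions, not values we use.

```python
def default_cert_alert(
    default_served: int,
    total_served: int,
    baseline_ratio: float,
    tolerance: float = 3.0,
    min_handshakes: int = 100,
) -> bool:
    """Return True when default-certificate usage is unusually high.

    default_served / total_served is compared against the normal
    baseline_ratio multiplied by tolerance. Windows with fewer than
    min_handshakes connections are ignored to avoid noisy alerts.
    """
    if total_served < min_handshakes:
        return False
    ratio = default_served / total_served
    return ratio > baseline_ratio * tolerance
```

For example, with a baseline of 2% default certificates, a window in which 90% of connections received the default certificate would trigger the alert, while a window at the baseline would not.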
We are taking steps to improve our response time in the future, including detecting problems faster and restarting services safely and more quickly. We’re also working to improve our process for rotating certificates within our infrastructure.
We sincerely apologize for this incident. We know how disruptive outages can be for our customers and their visitors, and we take that very seriously. As with any outage, we will take every opportunity to learn and to improve our infrastructure so that we can prevent these incidents from occurring in the future.
Sincerely,
John Colton
Squarespace
SVP Engineering