Partial Outage on Write Pro

Incident Report for DeepL

Postmortem

A change to enable advanced debugging on a component supporting DeepL Write for Pro customers caused the service to deploy incorrectly and reject incoming requests. We are looking into the root cause and will update this page with our findings after conducting a thorough internal review.

We have identified the root cause and are sharing our findings here. The rollout of a feature to enhance error debugging resulted in a Write Pro instance being scheduled on an overloaded host, creating a "noisy neighbor" issue. This led to slow responses and a pileup of connections in our reverse proxy. To prevent similar incidents, we plan to implement several measures to better isolate resource-intensive workloads and to improve how our services handle temporary slowdowns.

Posted Feb 18, 2025 - 13:50 UTC

Resolved

This incident has been resolved.
Posted Feb 18, 2025 - 13:33 UTC

Monitoring

We are moving the incident to monitoring. Service has been restored, and request and error rates are back to pre-incident levels.
Posted Feb 18, 2025 - 13:28 UTC

Identified

We've rolled back a recent deployment that affected the service. Service should be restored within a few minutes.
Posted Feb 18, 2025 - 13:24 UTC

Update

Additional languages appear to be affected.
Posted Feb 18, 2025 - 13:21 UTC

Update

We are continuing to investigate this issue.
Posted Feb 18, 2025 - 13:19 UTC

Investigating

We are currently seeing errors for a subset of languages on Write Pro and are investigating.
Posted Feb 18, 2025 - 13:19 UTC
This incident affected: DeepL Pro.