On October 20th, our services experienced significant degradation due to a major AWS regional outage that affected the availability of compute resources. During the event, AWS stopped provisioning new instances, which prevented our infrastructure from scaling to maintain normal operational capacity.
In short, our database remained operational, but the servers handling user requests began to saturate and could not increase capacity.
Although the root cause of the incident was entirely external to Sytex, the global impact of the AWS outage highlighted opportunities to strengthen our operational resilience. During the event, we deployed a contingency compute cluster in the region, which allowed us to restore service continuity. We are currently optimizing this process so that, in similar scenarios, failover occurs more quickly and with minimal downtime.
We are also implementing structural improvements to ensure we are prepared should a similar situation arise again. Actions already underway include:
- Optimization of our process, achieving faster failover times.
- Evaluation of a strategy to ensure high availability in the event of regional failures.
We apologize for any inconvenience caused and reaffirm our commitment to the platform.
Each Availability Zone includes redundant networking, power, and storage resources. Sytex’s infrastructure is deployed across multiple Availability Zones. The outage on October 20th exceeded this layer of protection.
Cross-region redundancy adds latency and operational costs that, until now, we considered unjustified given the level of security offered by deployments across multiple Availability Zones. Despite the rarity of such events, we are now evaluating a cross-region deployment of compute resources.
Sytex operates with a transactional persistence model that makes operations highly complex. However, this remains our final line of defense.
In addition to storing , we also replicate persistent data in another cloud provider to recover operational capacity in the event of a catastrophic incident.