Services Unstable

Incident Report for Paag Tech

Resolved

The AWS instability incident that affected our platforms today has been fully resolved.

Following the resolution of the issue with our underlying infrastructure provider, we have verified that all of our services are now stable and fully operational.

We appreciate your patience during this period of instability.
Posted Oct 20, 2025 - 20:52 GMT-03:00

Update

AWS has reported that the service instability has been resolved:

"[RESOLVED] Increased Error Rates and Latencies

Oct 20 3:53 PM PDT Between 11:49 PM PDT on October 19 and 2:24 AM PDT on October 20, we experienced increased error rates and latencies for AWS Services in the US-EAST-1 Region. Additionally, services or features that rely on US-EAST-1 endpoints such as IAM and DynamoDB Global Tables also experienced issues during this time. At 12:26 AM on October 20, we identified the trigger of the event as DNS resolution issues for the regional DynamoDB service endpoints. After resolving the DynamoDB DNS issue at 2:24 AM, services began recovering but we had a subsequent impairment in the internal subsystem of EC2 that is responsible for launching EC2 instances due to its dependency on DynamoDB. As we continued to work through EC2 instance launch impairments, Network Load Balancer health checks also became impaired, resulting in network connectivity issues in multiple services such as Lambda, DynamoDB, and CloudWatch. We recovered the Network Load Balancer health checks at 9:38 AM. As part of the recovery effort, we temporarily throttled some operations such as EC2 instance launches, processing of SQS queues via Lambda Event Source Mappings, and asynchronous Lambda invocations. Over time we reduced throttling of operations and worked in parallel to resolve network connectivity issues until the services fully recovered. By 3:01 PM, all AWS services returned to normal operations. Some services such as AWS Config, Redshift, and Connect continue to have a backlog of messages that they will finish processing over the next few hours. We will share a detailed AWS post-event summary."

At 3:01 PM PDT, AWS confirmed that all services had returned to normal operation. Our platforms are now stable and fully operational.

We appreciate your patience during this period of instability.
Posted Oct 20, 2025 - 20:49 GMT-03:00

Update

AWS has reported continued progress toward full recovery in the US-EAST-1 region:

"We have restored EC2 instance launch throttles to pre-event levels and EC2 launch failures have recovered across all Availability Zones in the US-EAST-1 Regions. AWS services which rely on EC2 instance launches such as Redshift are working through their backlog of EC2 instance launches successfully and we anticipate full recovery of the backlog over the next two hours. We can confirm that Connect is handling new voice and chat sessions normally. There is a backlog of analytics and reporting data that we must process and anticipate that we will have worked through the backlog over the next two hours."
We will provide an update by 3:30 PM PDT.

Our systems remain in a state of partial degradation, particularly for operations that depend on AWS services in the affected region. We continue to closely monitor the situation and take all necessary measures to minimize user impact.
Posted Oct 20, 2025 - 19:52 GMT-03:00

Update

AWS has reported continued progress toward full recovery in the US-EAST-1 region:

“We have continued to reduce throttles for EC2 instance launches in the US-EAST-1 Region and we continue to make progress toward pre-event levels in all Availability Zones (AZs). AWS services such as ECS and Glue, which rely on EC2 instance launches, will recover as the successful instance launch rate improves. We see full recovery for Lambda invocations and are working through the backlog of queued events, which we expect to be fully processed within the next two hours.
Another update will be provided by 2:30 PM PDT.”
Our systems remain in a state of partial degradation, particularly for operations that depend on AWS services in the affected region. We continue to closely monitor the situation and take all necessary measures to minimize user impact.
Posted Oct 20, 2025 - 18:15 GMT-03:00

Update

AWS Infrastructure Incident
AWS has provided a new update indicating continued recovery in the US-EAST-1 region:
“Service recovery across all AWS services continues to improve. We continue to reduce throttles for new EC2 Instance launches in the US-EAST-1 Region that were put in place to help mitigate impact. Lambda invocation errors have fully recovered and function errors continue to improve. We have scaled up the rate of polling SQS queues via Lambda Event Source Mappings to pre-event levels.
Another update will be provided by 1:45 PM PDT.”
Some of our services may still experience partial degradation, and we continue to actively monitor and mitigate any potential impact to our users.
Posted Oct 20, 2025 - 17:08 GMT-03:00

Update

AWS Infrastructure Incident

We are currently experiencing degraded performance in some of our services, caused by the ongoing incident affecting AWS infrastructure in the US-EAST-1 region.
According to the latest AWS update:
“We continue to observe recovery across all AWS services, and instance launches are succeeding across multiple Availability Zones in the US-EAST-1 Region. For Lambda, customers may face intermittent function errors when making network requests to other services or systems, as we work to address residual network connectivity issues.
To mitigate these issues, we initially reduced the rate of SQS polling via Lambda Event Source Mappings and are now increasing it as we observe more successful invocations and reduced errors.
Another update will be provided by 1:00 PM PDT.”

We are actively monitoring the situation and working to minimize any impact to our users. Further updates will be shared as they become available.
Posted Oct 20, 2025 - 16:31 GMT-03:00

Monitoring

AWS Update (based on AWS Status Page):
According to the latest information published on the AWS Status Page, the following mitigation efforts are currently in progress:
“We have applied multiple mitigations across multiple Availability Zones (AZs) in US-EAST-1 and are still experiencing elevated errors for new EC2 instance launches. We are rate limiting new instance launches to aid recovery. We will provide an update at 7:30 AM PDT or sooner if we have additional information.”
We are closely monitoring the official AWS updates and continue to track any potential impact on our services.
Posted Oct 20, 2025 - 10:52 GMT-03:00

Update

We are continuing to investigate this issue.
Posted Oct 20, 2025 - 10:51 GMT-03:00

Update

We are continuing to investigate this issue.
Posted Oct 20, 2025 - 09:30 GMT-03:00

Investigating

We are currently experiencing a widespread issue caused by instability in AWS services.
AWS hosts part of our infrastructure, and this outage is affecting some of our systems as well as many other services worldwide.
We are closely monitoring the situation through the official AWS Status Page and continuously observing our environments.
As soon as there are further updates or services are fully restored, we will publish a new update on this page.
Posted Oct 20, 2025 - 09:30 GMT-03:00
This incident affected: APIs (Transactions, PIX, Webhooks System), Shield (KYC, PLD), Payments (Cashin, Cashout), Finance (Boleto payments, Batch PIX), and Backoffice.