Previous incidents

Dec 2024 to Feb 2025

February 2025

Feb 25, 2025

1 incident

Issue with Volume Storage in CA-MTL-1

Resolved Feb 25, 2025 at 2:53pm UTC

We have discovered an issue affecting pods running in CA-MTL-1 when using volume disk or network storage. When executing commands, the process may hang, although the file is still created successfully.

So far, this issue affects most H100 GPUs and a few A40 GPUs. Our team is actively investigating and will provide updates here as we learn more.

We have identified the root cause of the issue, and the team is pushing updates to the affected machines.

All machines have been updated, and...

Feb 15, 2025

1 incident

EU-CZ-1 Data Center Upgrade

Resolved

Resolved Feb 15, 2025 at 5:00pm UTC

We are currently upgrading the EU-CZ-1 data center, and all machines are offline during this process. Services hosted in this region are temporarily unavailable.

We’ve successfully brought most of the machines online. However, due to some technical issues, we need a bit more time to restore the remaining ones. Thanks for your patience; we’ll keep you posted!

All machines in the EU-CZ-1 data center are now fully online. The data center upgrade is complete, t...

Feb 13, 2025

1 incident

Serverless Request Issue

Resolved

Resolved Feb 13, 2025 at 11:23pm UTC

We experienced an issue affecting serverless requests from 10:00 PM to 10:23 PM UTC. This was due to an update made to improve system capacity in the NYC region, which led to temporary request issues.

The issue has been identified and resolved, and we’ve taken steps to minimize future risks.

We are still seeing issues, and our team is actively investigating. We’ll provide further updates as soon as we have more information.

We have identified the issue and will be rolling out a...

Feb 11, 2025

1 incident

🚨 CA-MTL-1 Network Volume Performance Issue 🚨

Resolved

Resolved Feb 11, 2025 at 4:00pm UTC

We’re currently experiencing performance issues with network volumes in the CA-MTL-1 data center. Our team is investigating the issue, and we’ll provide updates as soon as possible.

We detected a performance issue with one of the chunk servers and have isolated the affected server.

The issue has been resolved.

Feb 05, 2025

1 incident

Main UI Console Page Down

Resolved

Resolved Feb 6, 2025 at 1:52am UTC

We are currently experiencing issues accessing the Main UI Console Page. Our team is actively investigating the cause, and we will provide updates as soon as we have more information.

Our authentication provider, Clerk, is experiencing issues and is currently down. We are closely monitoring the situation and will provide updates as soon as we have more information.

Workaround:

Our GraphQL API and serverless endpoints are unaffected.
Users can still call the GraphQL ...

January 2025

Jan 30, 2025

1 incident

CA-MTL-3 Network Disruption

Resolved

Resolved Jan 30, 2025 at 11:14am UTC

CA-MTL-3 is suffering a network disruption due to an upstream provider issue. We are in contact with the provider and are working to restore network availability now.

The network is restored.

Jan 23, 2025

1 incident

US-TX-4 Network Disruption

Resolved

Resolved Jan 23, 2025 at 10:52pm UTC

US-TX-4 is suffering a network disruption due to an upstream provider issue. We are in contact with the provider and are working to restore network availability now.

The US-TX-4 region will experience a short network disruption at approximately 5:30 PM CST on Jan 23, 2025, lasting about 10 minutes, due to an emergency firewall update.

We apologize for any inconvenience and appreciate your understanding as we perform this critical update.

The issue affecting US-TX-4 has been resolve...

Jan 12, 2025

1 incident

EU-SE-1 Network Disruption

Resolved

Resolved Jan 16, 2025 at 12:00am UTC

The network issue at the data center has been resolved. Thank you for your patience.


December 2024

Dec 18, 2024

1 incident

US-TX-3 Network Disruption

Resolved

Resolved Dec 19, 2024 at 4:34am UTC

US-TX-3 is suffering a network disruption due to an upstream provider issue. We are in contact with the provider and are working to restore network availability now.

Dec 17, 2024

1 incident

US-TX-3 Network Disruption

Resolved

Resolved Dec 17, 2024 at 8:42pm UTC

This issue was caused by an upstream provider and has been resolved. We have requested a root cause analysis (RCA) and will provide updates as applicable.
