Previous incidents

May 2024
May 09, 2024
1 incident

Decreased reliability for GPU workers that need to spawn large numbers of pro...

Downtime

Resolved May 10 at 01:00pm PDT

Summary:

GPU pods were assigned a Process ID (PID) limit that was too low, which could cause unexpected failures when launching more than 1024 processes.
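To check whether a running pod is affected, one option is to read the PID limit the kernel reports for the container's cgroup. The following is a minimal sketch, not an official diagnostic: it assumes the pod exposes the standard Linux cgroup pids controller files at their usual mount points, which may not hold in every environment. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Usual locations of the pids controller limit (cgroup v2 first, then v1).
# Assumption: the container mounts cgroups at /sys/fs/cgroup.
CANDIDATE_PATHS = (
    Path("/sys/fs/cgroup/pids.max"),
    Path("/sys/fs/cgroup/pids/pids.max"),
)


def parse_pids_max(raw: str) -> Optional[int]:
    """Return the limit as an int, or None when the file reads 'max' (no limit)."""
    value = raw.strip()
    return None if value == "max" else int(value)


def read_pid_limit() -> Optional[int]:
    """Return the first readable cgroup PID limit, or None if unlimited or not found."""
    for path in CANDIDATE_PATHS:
        try:
            return parse_pids_max(path.read_text())
        except (OSError, ValueError):
            continue
    return None


if __name__ == "__main__":
    limit = read_pid_limit()
    print("PID limit:", "unlimited or not found" if limit is None else limit)
```

A reported value near 1024 on a GPU pod would be consistent with the behavior described above; a larger value or "unlimited" suggests the pod is not affected.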

Source of Bug:

  • Logic error introduced while adding AMD GPU vendor support.

Timeline:

  • START: ~12:00 PDT 2024-05-09
  • END: ~13:00 PDT 2024-05-10

Suggested Actions by Category:

Serverless

This should resolve itself automatically if you allow your workers to scale to zero. Alternatively, force-scal...

April 2024
Apr 13, 2024
1 incident

Network Upgrades

Maintenance

Resolved Apr 13 at 10:00am PDT

We'll be running network upgrades for us-or-1. If you have any issues with the new public IP, please note that it will only work after this upgrade has completed. If you have hard-coded any IP addresses, please remember to update them and restart your services after this migration.
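To locate hard-coded addresses before the migration, a quick scan of configuration text can help. This is a rough sketch under stated assumptions: it only looks for IPv4 literals, the function name is hypothetical, and it may flag unrelated dotted values such as four-part version numbers, so results need a manual review.

```python
import ipaddress
import re
from typing import List

# Loose pattern for dotted-quad candidates; validity is checked separately.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_ipv4_literals(text: str) -> List[str]:
    """Return every syntactically valid IPv4 literal found in text, in order."""
    found = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. an octet above 255
        found.append(candidate)
    return found


if __name__ == "__main__":
    sample = "HOST=203.0.113.7\nPORT=8080\n"
    print(find_ipv4_literals(sample))
```

Running this over environment files or service configs gives a list of values to review and update once the upgrade has completed.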

March 2024
No incidents reported