Some systems are experiencing issues

About This Site

This page is intended to provide a quick overview of the operational status of the Sepia lab. It doesn't try to provide many testing-related metrics.

For more detailed testing information, see the Grafana dashboard.

Past Incidents

Saturday 14th May 2022

No incidents reported

Friday 13th May 2022

No incidents reported

Thursday 12th May 2022

No incidents reported

Wednesday 11th May 2022

No incidents reported

Tuesday 10th May 2022

Long Running Cluster Health: Long Running Cluster Outage

While adding some new hosts to the Sepia Long Running Cluster, the cluster got into a state where all the MONs started locking up due to lack of system resources. Josh, Neha, Dan, and David have been working to restore the cluster service by service.

The following workloads are down:

  • teuthology runs
  • Ceph CI builds (Jenkins/shaman)
  • quay.ceph.io
  • telemetry.ceph.com / telemetry-public.ceph.com
  • chacra.ceph.com

All services relying on the LRC have been restored. I will be upgrading all the daemons to a version of Ceph that has fixes for the problems we ran into.

The LRC is back up and OSD recovery is still in progress. We're letting things settle before bringing up any clients.

Monday 9th May 2022

No incidents reported

Sunday 8th May 2022

No incidents reported

Saturday 7th May 2022

No incidents reported

Friday 6th May 2022

No incidents reported

Thursday 5th May 2022

No incidents reported