Troubleshooting Sourcegraph frontend 503/504
Symptoms
Frontend pods show readiness probe failures (HTTP 503), and clients see 504 Gateway Timeouts.
Frontend logs include "slow http request" and unexpected 500 responses with long durations (~120s).
Probable causes
Primary cause: database (Postgres) unreachable or degraded. The frontend depends on the DB for many requests and will time out if the DB is unavailable.
Disk full on a DB or cluster node, causing services to fail or I/O to slow.
Troubleshooting steps
Describe the pod and check Events and Containers > Last State > Message:
kubectl describe pod -n <namespace> -l app=sourcegraph-frontend
Check frontend logs:
kubectl logs -n <namespace> -l app=sourcegraph-frontend
Curl inside the pod to confirm the frontend process responds:
kubectl exec -n <namespace> -it deploy/sourcegraph-frontend -- curl -v http://localhost:3080
Check DB health and disk usage. If the disk is full, increase disk space or free space on the DB host. Verify Postgres is accepting connections.
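The disk and Postgres checks can be sketched with two small shell helpers. This is a minimal sketch, not a definitive procedure: the pod name pgsql-0, the database user sg, and the 90% threshold are assumptions for illustration and may differ in your deployment. The helper names full_filesystems and describe_pg_isready are hypothetical. pg_isready is the standard Postgres utility, and the exit codes below are the ones it documents.

```shell
#!/usr/bin/env bash
# Sketch only: "pgsql-0", user "sg", and the 90% threshold are assumptions.

# Print "mountpoint usage" for filesystems at or above a threshold (default 90).
# Reads `df -P` output on stdin, so it works on local or remote df output.
full_filesystems() {
  local threshold="${1:-90}"
  awk -v t="$threshold" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)          # strip the trailing percent sign
      if (use + 0 >= t) print $6 " " $5
    }'
}

# Translate a pg_isready exit code into a short description.
describe_pg_isready() {
  case "$1" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections" ;;
    2) echo "no response" ;;
    3) echo "no attempt made" ;;
    *) echo "unknown exit code $1" ;;
  esac
}

# Example usage against a cluster (not run automatically):
#   kubectl exec -n <namespace> pgsql-0 -- df -P | full_filesystems 90
#   kubectl exec -n <namespace> pgsql-0 -- pg_isready -U sg
#   describe_pg_isready $?
```

Any mount point printed by full_filesystems is a candidate for the disk-full cause. A pg_isready result other than "accepting connections" points at Postgres itself rather than the frontend.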
In Kubernetes, verify the DB service/endpoint is reachable from the frontend pod and that secrets (pgsqlsecret) are correctly mounted.
After the DB is healthy, restart affected frontend pods and perform a rollout restart of the deployment:
kubectl -n <namespace> delete pod <pod-name>
kubectl -n <namespace> rollout restart deployment/sourcegraph-frontend
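After the restart, it may help to confirm that the frontend actually recovers instead of assuming it did. The following is a minimal sketch: retry is a hypothetical helper, and the attempt count, delay, and 5m timeout are arbitrary assumptions. kubectl rollout status and the curl flags -f, -s, -S, and -o are standard options.

```shell
#!/usr/bin/env bash
# retry MAX DELAY CMD...: run CMD until it succeeds, up to MAX attempts,
# sleeping DELAY seconds between attempts. Returns 0 on success, 1 otherwise.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Example usage (not run automatically):
#   kubectl -n <namespace> rollout status deployment/sourcegraph-frontend --timeout=5m
#   retry 10 5 kubectl exec -n <namespace> deploy/sourcegraph-frontend -- \
#     curl -fsS -o /dev/null http://localhost:3080
```

If the retry loop still fails once the rollout reports success, the DB is probably still unhealthy and the earlier checks are worth repeating.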