Troubleshooting Sourcegraph frontend 503/504

Symptoms

Frontend pods show readiness probe failures (HTTP 503), and clients see 504 Gateway Timeouts.
Frontend logs include "slow http request" and unexpected 500 responses with long durations (~120s).

Probable causes

Primary cause: the database (Postgres) is unreachable or degraded. The frontend depends on the database for many requests and times out when the database is unavailable.
Contributing cause: a full disk on a DB host or cluster node, which causes services to fail or I/O to slow.

Troubleshooting steps

Describe the pod and check Events and Containers > Last State > Message:
```
kubectl describe pod -n <namespace> -l app=sourcegraph-frontend
```

Check frontend logs:
```
kubectl logs -n <namespace> -l app=sourcegraph-frontend
```
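
When a label selector (`-l`) is used, `kubectl logs` returns only the last 10 lines per pod by default, so the "slow http request" entries mentioned under Symptoms can easily be missed. The sketch below is one way to count them; the helper name `count_slow_requests` is hypothetical and not part of Sourcegraph or kubectl.

```shell
# Hypothetical helper: count log lines on stdin that contain "slow http request".
# grep -c exits non-zero when nothing matches, so "|| true" keeps the exit status clean
# while still printing 0.
count_slow_requests() {
  grep -c "slow http request" || true
}

# Usage (needs cluster access; --tail=-1 lifts the 10-line default, --since bounds the window):
#   kubectl logs -n <namespace> -l app=sourcegraph-frontend --tail=-1 --since=1h | count_slow_requests
```

A count that keeps growing while the frontend process itself still responds usually points at a slow dependency rather than at the frontend.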

Curl the frontend from inside the pod to confirm that the process responds. If it returns the redirect to the sign-in page, the frontend process is running correctly and the fault lies outside sourcegraph-frontend:
```
kubectl exec -n <namespace> -it deploy/sourcegraph-frontend -- curl http://localhost:3080
```

Expected response:
```
Defaulted container "frontend" out of: frontend, migrator (init)
<a href="/sign-in?returnTo=%2F">Found</a>.
```
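
For scripted checks, the expected response above can be matched mechanically. This is a minimal sketch; `is_signin_redirect` is a hypothetical helper name, and it only looks for the sign-in link shown above.

```shell
# Hypothetical helper: succeed if the text on stdin contains the redirect link
# to the sign-in page, fail otherwise.
is_signin_redirect() {
  grep -q 'href="/sign-in'
}

# Usage (needs cluster access; -it is dropped because the output is piped):
#   kubectl exec -n <namespace> deploy/sourcegraph-frontend -- curl -s http://localhost:3080 \
#     | is_signin_redirect && echo "frontend process responds"
```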

Check DB health and disk usage. If the disk is full, increase disk space or free space on the DB host. Verify Postgres is accepting connections.
In Kubernetes, verify the DB service/endpoint is reachable from the frontend pod and that secrets (pgsql secret) are correctly mounted.
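
The database checks above could be sketched as follows. Several names here are assumptions and not taken from this article: that Postgres runs in-cluster as a workload and Service called `pgsql`, that `pg_isready` and `df` are available in its container, and the helper names themselves. Adjust them to match your deployment.

```shell
# Assumption: in-cluster Postgres workload named "pgsql"; override for a StatefulSet
# or a differently named deployment, e.g. PGSQL_TARGET=statefulset/pgsql.
PGSQL_TARGET="${PGSQL_TARGET:-deploy/pgsql}"

# Ask Postgres whether it is accepting connections. $1 = namespace.
check_pg_ready() {
  kubectl exec -n "$1" "$PGSQL_TARGET" -- pg_isready
}

# List the endpoints behind the DB Service (assumed to be named "pgsql"). $1 = namespace.
# An empty endpoint list means no ready DB pod is backing the Service.
check_pg_endpoints() {
  kubectl get endpoints -n "$1" pgsql
}

# Read `df -P` output on stdin and print "<mount point> <use%>" for every
# filesystem at or above the given usage threshold. $1 = threshold in percent.
disk_over_threshold() {
  awk -v limit="$1" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= limit + 0) print $6, $5 }'
}

# Usage (needs cluster access):
#   check_pg_ready <namespace>
#   check_pg_endpoints <namespace>
#   kubectl exec -n <namespace> "$PGSQL_TARGET" -- df -P | disk_over_threshold 90
```

If Postgres runs outside the cluster, the same `df -P | disk_over_threshold 90` pipeline can be run directly on the DB host.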

After the DB is healthy, delete the affected frontend pods so that they are recreated, and perform a rollout restart of the deployment:
```
kubectl -n <namespace> delete pod <frontend-pod-name>
kubectl -n <namespace> rollout restart deployment/sourcegraph-frontend
```
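
To confirm that the restart actually brought the frontend back, one option is to wait on the rollout. This is a sketch: `wait_for_frontend` is a hypothetical wrapper and the 5-minute default timeout is an arbitrary choice.

```shell
# Hypothetical wrapper: block until the frontend deployment reports its rollout
# as complete, or fail after the timeout. $1 = namespace, $2 = timeout (default 5m).
wait_for_frontend() {
  kubectl -n "$1" rollout status deployment/sourcegraph-frontend --timeout="${2:-5m}"
}

# Usage (needs cluster access):
#   wait_for_frontend <namespace> 10m
```

If the rollout times out, readiness probes are probably still failing, and the DB checks are worth repeating before restarting again.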