Troubleshooting

Diagnose and fix common issues with a self-hosted Scalekit deployment on Kubernetes.

This guide helps you diagnose and resolve common issues when running Scalekit on your own Kubernetes cluster. Start with the quick diagnostics, then use the symptom/cause/solution sections below to fix specific problems.

Quick diagnostics

Before investigating specific errors, run these basic checks to identify which pods are unhealthy and gather initial diagnostics.

Check pod status

Set the namespace (the setup script defaults to scalekit):

# Set once for your deployment
NAMESPACE=${NAMESPACE:-scalekit}

kubectl get pods -n ${NAMESPACE}
kubectl describe pod <pod-name> -n ${NAMESPACE}
kubectl logs <pod-name> -n ${NAMESPACE} --tail=100
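
On a busy namespace it can help to filter the pod list down to the unhealthy entries. The following is a minimal sketch, not part of the Scalekit tooling: the function name unhealthy_pods is hypothetical, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...).

```shell
# Print the name and status of every pod that is neither Running nor Completed.
# Assumes STATUS is the third column of `kubectl get pods --no-headers`.
unhealthy_pods() {
  kubectl get pods -n "${NAMESPACE:-scalekit}" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage:
#   unhealthy_pods
```

A pod can report Running while some of its containers are not ready, so also check the READY column of the full listing.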

For the main Scalekit pod (multiple containers), specify the container:

# Main auth service
kubectl logs <pod-name> -n ${NAMESPACE} -c scalekit --tail=100

# Dashboard
kubectl logs <pod-name> -n ${NAMESPACE} -c dashboard --tail=100

# Svix (webhooks)
kubectl logs <pod-name> -n ${NAMESPACE} -c svix --tail=100
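
To capture all three container logs in one pass, the commands above can be wrapped in a loop. This is an illustrative sketch: the function name save_container_logs and the default output directory are assumptions, while the container names are the ones listed above.

```shell
# Save the last 100 log lines of each container in the main Scalekit pod
# to <out_dir>/<container>.log.
# Usage: save_container_logs <pod-name> [out_dir]
save_container_logs() {
  pod="$1"
  out_dir="${2:-./scalekit-logs}"
  mkdir -p "$out_dir"
  for container in scalekit dashboard svix; do
    kubectl logs "$pod" -n "${NAMESPACE:-scalekit}" -c "$container" --tail=100 \
      > "$out_dir/$container.log" 2>&1
  done
}
```

Each file also captures stderr, so a missing container shows up as an error message in its log file rather than as an empty file.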

Helm deployment issues

`ImagePullBackOff` or `ErrImagePull`

Symptom: Pods fail to start with ImagePullBackOff or ErrImagePull.

Cause: The cluster cannot pull images from the Scalekit container registry. This is almost always caused by a missing or expired registry secret.

Solution:

Confirm the secret exists:

kubectl get secret artifact-registry-secret -n ${NAMESPACE}

If it is missing, recreate it:

kubectl create secret docker-registry artifact-registry-secret \
  --docker-server=<registry-server-url> \
  --docker-username=oauth2accesstoken \
  --docker-password=<your-registry-token> \
  -n ${NAMESPACE}

Verify your registry token has not expired. Tokens from the distribution portal are time-limited.
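
A secret with the right name but the wrong type also fails image pulls. The sketch below checks that the secret exists and has the type that kubectl create secret docker-registry produces (kubernetes.io/dockerconfigjson). The function name check_pull_secret is hypothetical, and the check cannot tell whether the token inside the secret has expired.

```shell
# Report whether artifact-registry-secret exists with the docker-registry type.
check_pull_secret() {
  secret_type="$(kubectl get secret artifact-registry-secret \
    -n "${NAMESPACE:-scalekit}" -o jsonpath='{.type}' 2>/dev/null)"
  if [ "$secret_type" = "kubernetes.io/dockerconfigjson" ]; then
    echo "ok: artifact-registry-secret is a docker-registry secret"
  else
    echo "problem: artifact-registry-secret is missing or has type '${secret_type}'"
    return 1
  fi
}
```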

Migration hook fails or times out

Symptom: The db-migrations job fails or the Helm install hangs on the pre-install hook.

Cause: The migration job cannot connect to PostgreSQL (wrong connection string, database does not exist, or network issue).

Solution:

Check the job logs:

kubectl get jobs -n ${NAMESPACE}
kubectl logs job/scalekit-db-migrations -n ${NAMESPACE}

Verify the DATABASE_URL secret is correct and points to a reachable PostgreSQL instance.
Ensure the target databases (scalekit, webhooks, openfga) exist and the database user has full privileges.
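
One way to confirm that the three databases exist is to compare the server's database list against the expected names. The helper below is a sketch with a hypothetical name (missing_databases). It reads database names from standard input, so it works with any client that can list them. The psql invocation in the usage comment assumes DATABASE_URL is exported in your shell and that psql is installed.

```shell
# Read database names (one per line) on stdin and print any of the three
# databases named above that are absent from the list.
missing_databases() {
  existing="$(cat)"
  for db in scalekit webhooks openfga; do
    printf '%s\n' "$existing" | grep -qx "$db" || echo "$db"
  done
}

# Usage (assumes psql is installed and DATABASE_URL is set):
#   psql "$DATABASE_URL" -Atc 'SELECT datname FROM pg_database' | missing_databases
```

No output means all three databases were found.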

Pod stuck in `CrashLoopBackOff`

Symptom: The main Scalekit pod keeps crashing and restarting.

Cause: A required secret is missing or a configuration value (hostnames, domains, credentials) is incorrect.

Solution:

Check the previous container logs for the exact error:

kubectl logs <pod-name> -n ${NAMESPACE} -c scalekit --previous

Common causes:

- Missing keys in authentication-secret → Re-run bash setup-secrets.sh
- database.host or redis.host unreachable → Verify connectivity and values in values.yaml
- domain does not match your gateway hostname → Correct values.yaml and re-apply

Once the cause is fixed, delete the pod so that Kubernetes recreates it with the updated configuration.
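
To check the first cause without decoding any secret values, you can list the keys a secret contains and compare them with the keys you expect. This is a sketch: the function name missing_secret_keys is hypothetical, and you supply the expected key names yourself, since this guide does not list them.

```shell
# Print each expected key that is absent from a Kubernetes secret.
# Usage: missing_secret_keys <secret-name> <key> [<key> ...]
missing_secret_keys() {
  secret="$1"
  shift
  # List only the key names under .data; values are never printed.
  present="$(kubectl get secret "$secret" -n "${NAMESPACE:-scalekit}" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}')"
  for key in "$@"; do
    printf '%s\n' "$present" | grep -qxF -- "$key" || echo "$key"
  done
}

# Usage (replace the placeholders with the keys your deployment requires):
#   missing_secret_keys authentication-secret <key-1> <key-2>
```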

Gateway and ingress issues

Gateway has no external IP

Symptom: kubectl get gateway shows no address or the Gateway stays in a pending state.

Cause: The GatewayClass is missing, or the Gateway controller (e.g. GKE Gateway) is not running in the cluster.

Solution:

Inspect the Gateway:

kubectl get gateway -n ${NAMESPACE}
kubectl describe gateway scalekit -n ${NAMESPACE}

Verify a matching GatewayClass exists:
```
kubectl get gatewayclass
```
Ensure gateway.className in your values.yaml exactly matches the installed GatewayClass name.
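
The class-name comparison can also be done against the live cluster. The sketch below reads the class name from the deployed Gateway and looks for an installed GatewayClass with exactly that name. The function name check_gateway_class is hypothetical, and the Gateway name defaults to scalekit, as in the describe command above.

```shell
# Compare the Gateway's spec.gatewayClassName with the installed GatewayClasses.
# Usage: check_gateway_class [gateway-name]
check_gateway_class() {
  gateway="${1:-scalekit}"
  class="$(kubectl get gateway "$gateway" -n "${NAMESPACE:-scalekit}" \
    -o jsonpath='{.spec.gatewayClassName}')"
  # `-o name` prints <resource>/<name>; match the name part exactly.
  if kubectl get gatewayclass -o name |
      awk -F/ -v c="$class" '$2 == c { found = 1 } END { exit !found }'; then
    echo "ok: GatewayClass '${class}' is installed"
  else
    echo "problem: GatewayClass '${class}' is not installed"
    return 1
  fi
}
```

A match only confirms that the names agree. The controller behind that class must still be running for the Gateway to receive an address.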

Getting help

If the issue is not covered here, gather the following information before contacting support:

Namespace: The Kubernetes namespace where Scalekit is deployed
Error messages: Full output from kubectl logs, kubectl describe, and Helm output
Pod status: Output of kubectl get pods -n ${NAMESPACE}
Values used: Relevant sections from your values.yaml (redact secrets)
Steps to reproduce: What you were doing when the issue occurred
Environment: Local (Minikube), GKE, or other cluster
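
Most of the cluster-side items above can be captured in one pass. The following is a sketch rather than an official support tool: the function name collect_diagnostics, the output directory, and the file names are all assumptions. It does not copy values.yaml, because secrets must be redacted by hand first.

```shell
# Write basic cluster diagnostics for the Scalekit namespace into a directory.
# Usage: collect_diagnostics [out_dir]
collect_diagnostics() {
  out_dir="${1:-./scalekit-diagnostics}"
  ns="${NAMESPACE:-scalekit}"
  mkdir -p "$out_dir"
  echo "$ns" > "$out_dir/namespace.txt"
  kubectl get pods -n "$ns" -o wide > "$out_dir/pods.txt" 2>&1
  kubectl get events -n "$ns" --sort-by=.lastTimestamp > "$out_dir/events.txt" 2>&1
  kubectl get jobs -n "$ns" > "$out_dir/jobs.txt" 2>&1
  kubectl get gateway -n "$ns" > "$out_dir/gateway.txt" 2>&1
  echo "Diagnostics written to $out_dir"
}
```

Review the files before sharing them, since event messages and pod details can include hostnames or other environment-specific values.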

Support channels

Documentation: Review the quickstart, installation, and configuration guides
Community: Join the #ask-anything channel on Slack (the Scalekit Community workspace)
Support: Submit a ticket through your Scalekit dashboard or contact support@scalekit.com for production issues
