Troubleshooting

Diagnose and fix common issues with a self-hosted Scalekit deployment on Kubernetes.

This guide helps you diagnose and resolve common issues when running Scalekit on your own Kubernetes cluster. Start with the quick diagnostics, then use the symptom/cause/solution sections below to fix specific problems.

Before investigating specific errors, run these basic checks to identify which pods are unhealthy and gather initial diagnostics.

Set the namespace (the setup script defaults to scalekit):

# Set once for your deployment
NAMESPACE=${NAMESPACE:-scalekit}

List the pods, then describe any unhealthy pod and fetch its recent logs:

kubectl get pods -n ${NAMESPACE}
kubectl describe pod <pod-name> -n ${NAMESPACE}
kubectl logs <pod-name> -n ${NAMESPACE} --tail=100

The main Scalekit pod runs multiple containers, so specify the container with -c:

# Main auth service
kubectl logs <pod-name> -n ${NAMESPACE} -c scalekit --tail=100
# Dashboard
kubectl logs <pod-name> -n ${NAMESPACE} -c dashboard --tail=100
# Svix (webhooks)
kubectl logs <pod-name> -n ${NAMESPACE} -c svix --tail=100
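
When the pod list is long, it can help to filter it down to the pods that need attention. The following is a minimal sketch, not part of the Scalekit tooling: the helper name unhealthy_pods is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, ...) that kubectl prints with --no-headers.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints the name and status of every pod that is not fully ready.
# Completed pods (for example, finished jobs) are skipped.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")          # READY column, such as 1/3
    if ($3 == "Completed") next    # finished jobs are not a problem
    if (ready[1] != ready[2] || $3 != "Running") print $1, $3
  }'
}

# Usage (needs cluster access):
#   kubectl get pods -n "${NAMESPACE:-scalekit}" --no-headers | unhealthy_pods
```

Each line of output is a pod name followed by its status, which can then be passed to the kubectl describe and kubectl logs commands above.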

Symptom: Pods fail to start with ImagePullBackOff or ErrImagePull.

Cause: The cluster cannot pull images from the Scalekit container registry. The most common reason is a missing or expired registry secret.

Solution:

  1. Confirm the secret exists:

    kubectl get secret artifact-registry-secret -n ${NAMESPACE}
  2. If it is missing, recreate it:

    kubectl create secret docker-registry artifact-registry-secret \
    --docker-server=<registry-server-url> \
    --docker-username=oauth2accesstoken \
    --docker-password=<your-registry-token> \
    -n ${NAMESPACE}
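  3. Verify your registry token has not expired. Tokens from the distribution portal are time-limited. If the token has expired, obtain a new one, delete the existing secret, and recreate it as in step 2.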
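
If the secret exists but pulls still fail, inspecting its contents can show whether it points at the expected registry server and username. A minimal sketch, assuming the secret is a standard docker-registry secret as created in step 2; the helper name show_registry_secret is hypothetical.

```shell
# Hypothetical helper: prints the decoded Docker config stored in the pull secret.
# The output includes the registry token in plain text, so do not paste it into tickets.
show_registry_secret() {
  kubectl get secret artifact-registry-secret \
    -n "${NAMESPACE:-scalekit}" \
    -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode
}

# Usage (needs cluster access):
#   show_registry_secret
```

The decoded JSON lists the registry server under "auths"; compare it with the server URL you were given.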

Symptom: The db-migrations job fails or the Helm install hangs on the pre-install hook.

Cause: The migration job cannot connect to PostgreSQL (wrong connection string, database does not exist, or network issue).

Solution:

  1. Check the job logs:

    kubectl get jobs -n ${NAMESPACE}
    kubectl logs job/scalekit-db-migrations -n ${NAMESPACE}
  2. Verify the DATABASE_URL secret is correct and points to a reachable PostgreSQL instance.

  3. Ensure the target databases (scalekit, webhooks, openfga) exist and the database user has full privileges.
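
The database check in step 3 can be scripted. This is a sketch under assumptions: the helper name missing_databases is hypothetical, and the usage line assumes psql is installed and that DATABASE_URL holds a PostgreSQL connection URI reachable from where you run it.

```shell
# Hypothetical helper: reads one database name per line on stdin and prints
# any of the three databases named in this guide that are not in the list.
missing_databases() {
  existing=$(cat)
  for db in scalekit webhooks openfga; do
    if ! printf '%s\n' "$existing" | grep -qx "$db"; then
      printf '%s\n' "$db"
    fi
  done
}

# Usage (assumes psql and a reachable DATABASE_URL):
#   psql "$DATABASE_URL" -Atc 'SELECT datname FROM pg_database' | missing_databases
```

No output means all three databases exist. The check does not cover user privileges, which still need to be verified separately.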

Symptom: The main Scalekit pod keeps crashing and restarting.

Cause: A required secret is missing or a configuration value (hostnames, domains, credentials) is incorrect.

Solution:

  1. Check the previous container logs for the exact error:

    kubectl logs <pod-name> -n ${NAMESPACE} -c scalekit --previous
  2. Common causes:

    • Missing keys in authentication-secret → Re-run bash setup-secrets.sh
    • database.host or redis.host unreachable → Verify connectivity and values in values.yaml
    • domain does not match your gateway hostname → Correct values.yaml and re-apply
  3. Once the cause is fixed, delete the pod so Kubernetes recreates it with the updated configuration. Do not delete the release itself; that uninstalls the deployment instead of restarting it.
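
To check for missing keys before re-running setup-secrets.sh, you can list the key names a secret currently holds without printing their values. A minimal sketch; the helper name secret_keys is hypothetical, and which keys are required depends on your Scalekit version, so compare the output against your setup script.

```shell
# Hypothetical helper: prints the key names (not the values) stored in a secret.
# Defaults to authentication-secret; pass another secret name as the first argument.
secret_keys() {
  kubectl get secret "${1:-authentication-secret}" \
    -n "${NAMESPACE:-scalekit}" \
    -o go-template='{{range $key, $value := .data}}{{$key}}{{"\n"}}{{end}}'
}

# Usage (needs cluster access):
#   secret_keys
#   secret_keys artifact-registry-secret
```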

Symptom: kubectl get gateway shows no address or the Gateway stays in a pending state.

Cause: The GatewayClass is missing, or the Gateway controller (e.g. GKE Gateway) is not running in the cluster.

Solution:

  1. Inspect the Gateway:

    kubectl get gateway -n ${NAMESPACE}
    kubectl describe gateway scalekit -n ${NAMESPACE}
  2. Verify a matching GatewayClass exists:

    kubectl get gatewayclass
  3. Ensure gateway.className in your values.yaml exactly matches the installed GatewayClass name.
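
The class name comparison in steps 2 and 3 can be done against the live cluster rather than values.yaml. A minimal sketch, assuming the Gateway is named scalekit as in step 1; the helper name check_gateway_class is hypothetical. It reads the standard Gateway API field spec.gatewayClassName.

```shell
# Hypothetical helper: compares the class referenced by the deployed Gateway
# with the GatewayClass names installed in the cluster.
check_gateway_class() {
  wanted=$(kubectl get gateway scalekit -n "${NAMESPACE:-scalekit}" \
    -o jsonpath='{.spec.gatewayClassName}')
  installed=$(kubectl get gatewayclass -o jsonpath='{.items[*].metadata.name}')
  for name in $installed; do
    if [ "$name" = "$wanted" ]; then
      echo "ok: GatewayClass '$wanted' is installed"
      return 0
    fi
  done
  echo "missing: Gateway references '$wanted'; installed classes: ${installed:-none}"
  return 1
}

# Usage (needs cluster access):
#   check_gateway_class
```

A "missing" result points to either a typo in gateway.className or a Gateway controller that has not been installed.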

If the issue is not covered here, gather the following information before contacting support:

  • Namespace: The Kubernetes namespace where Scalekit is deployed
  • Error messages: Full output from kubectl logs, kubectl describe, and Helm output
  • Pod status: Output of kubectl get pods -n ${NAMESPACE}
  • Values used: Relevant sections from your values.yaml (redact secrets)
  • Steps to reproduce: What you were doing when the issue occurred
  • Environment: Local (Minikube), GKE, or other cluster
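
Most of the items above can be gathered in one pass. The following is a sketch only, not an official Scalekit tool: the helper name collect_support_bundle and the output file names are hypothetical, and it assumes kubectl and helm are on your PATH.

```shell
# Hypothetical helper: writes pod status, pod descriptions, events, and the
# Helm release list for the namespace into a directory and prints its path.
collect_support_bundle() {
  ns="${NAMESPACE:-scalekit}"
  out="${1:-scalekit-support-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$out"
  kubectl get pods -n "$ns" -o wide > "$out/pods.txt" 2>&1
  kubectl describe pods -n "$ns" > "$out/describe-pods.txt" 2>&1
  kubectl get events -n "$ns" --sort-by=.lastTimestamp > "$out/events.txt" 2>&1
  helm list -n "$ns" > "$out/helm-releases.txt" 2>&1
  echo "$out"
}

# Usage (needs cluster access):
#   collect_support_bundle            # timestamped directory in the current folder
#   collect_support_bundle ./bundle   # explicit output directory
```

Review the files before sharing them, since pod descriptions can include environment values, and add the relevant redacted sections of values.yaml by hand.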