Resolve common upgrade issues
On this page
- Scenario 1: CrashLoopBackOff state
- Scenario 2: Pod not ready
- Scenario 3: Pod running and ready, but application exhibits unexpected behaviour
- Scenario 4: Pod pending due to init container issues
Even with careful preparation, upgrades may sometimes encounter issues. This section describes the most common issues you might face during or after the CluedIn upgrade process and provides guidance on how to resolve them quickly.
Scenario 1: CrashLoopBackOff state
Problem
A pod enters a CrashLoopBackOff state: the container keeps starting, failing, and restarting in a loop.
In this case, when you run kubectl get pods -n cluedin, you will see output similar to the following, showing a high number of restarts and the CrashLoopBackOff status.
NAME READY STATUS RESTARTS AGE
---- ----- ------ -------- ---
cluedin-ui-7d9f8d7c9d-abc12 0/1 CrashLoopBackOff 9 5m
Troubleshooting
- Review the logs of the previous container instance (not the current one that is restarting). It is important to review the previous logs because they usually contain the exact error message that caused the crash.
To check the logs from before the crash, add the -p (stands for previous) flag at the end of the kubectl logs command:
kubectl logs <pod-name> -n cluedin -p
Example (for a pod named cluedin-neo4j-0):
kubectl logs cluedin-neo4j-0 -n cluedin -p
Sample output:
2025-09-25 09:35:09.212+0000 INFO Starting Neo4j.
2025-09-25 09:35:09.522+0000 INFO Setting max memory usage to 1.5GiB
2025-09-25 09:35:11.785+0000 INFO Performing database recovery...
2025-09-25 09:35:14.120+0000 INFO Recovery complete.
2025-09-25 09:35:14.785+0000 INFO Starting Bolt connector on 0.0.0.0:7687
2025-09-25 09:35:15.333+0000 ERROR OutOfMemoryError: Java heap space
2025-09-25 09:35:15.335+0000 INFO Neo4j shutting down due to fatal error
- Review the output for errors. This information often appears near the last few lines of the output. In the example above, the error is OutOfMemoryError: Java heap space.
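If the logs alone are not conclusive, you can also check whether Kubernetes itself terminated the container for exceeding its memory limit (rather than the JVM failing on its own). Describe the pod and review the Last State section of the container:
kubectl describe pod cluedin-neo4j-0 -n cluedin
A Last State of Terminated with Reason: OOMKilled means the container hit its Kubernetes memory limit, while a JVM-level OutOfMemoryError in the logs (as in the example above) points at Neo4j’s own heap configuration.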
Resolution
In the example above, the error indicates that the Neo4j container was terminated due to insufficient memory (OutOfMemoryError: Java heap space). This usually happens in the following cases:
- The container’s memory limit is set too low for the workload.
- Neo4j’s internal memory settings (heap, page cache, and so on) are too aggressive for the available resources.
To fix this, increase the memory limit on the Neo4j StatefulSet (and, if necessary, lower Neo4j’s own heap and page cache settings so they fit within that limit).
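As a minimal sketch, assuming the StatefulSet is named cluedin-neo4j and its container is named neo4j (names may differ in your installation), you can raise the memory limit with a strategic merge patch, using 4Gi as an example value:
kubectl patch statefulset cluedin-neo4j -n cluedin -p '{"spec":{"template":{"spec":{"containers":[{"name":"neo4j","resources":{"limits":{"memory":"4Gi"}}}]}}}}'
If your installation is managed through Helm, make the equivalent change in your values file instead, so the next helm upgrade does not revert the patch.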
Scenario 2: Pod not ready
Problem
A pod can be in the Running state but still marked as Not Ready if it is failing its readiness probes. This situation occurs when Kubernetes has successfully started the pod, but the application inside is not yet prepared to handle traffic. In other words, the container is alive, but it cannot serve requests.
In this case, when you run kubectl get pods -n cluedin, you will see output similar to the following:
NAME READY STATUS RESTARTS AGE
---- ----- ------ -------- ---
cluedin-ui-7d9f8d7c9d-abc12 0/1 Running 0 5m
Troubleshooting
To investigate whether a pod is failing due to a readiness probe, do the following:
- Describe the pod and review the events section at the bottom of the output (the full command is shown after this list).
You may see warnings similar to the following:
Warning Unhealthy 2m (x4 over 4m) kubelet Readiness probe failed:
If you find repeated Readiness probe failed events, this confirms that the pod is starting but failing to pass the readiness check. For example, a pod might be running but remain Not Ready until it successfully connects to its database. In this case, the readiness probe will continue to fail until the dependency becomes available.
- Examine the container logs, which may provide additional details on why the application is not ready to serve traffic.
Example:
kubectl logs <pod-name> -n cluedin
Sample output:
2025-09-19T10:25:12Z INFO Starting CluedIn ...
2025-09-19T10:25:15Z WARN Waiting for database connection...
2025-09-19T10:25:30Z ERROR Timeout connecting to SQL at db-service:4133
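For reference, the describe command used in the first step is:
kubectl describe pod <pod-name> -n cluedin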
Resolution
In the example above, the issue is resolved by restoring connectivity between the pod and the database. Common causes include:
- A misconfigured connection string (for example, wrong host, port, username, or password).
- Resource pressure on the database (for example, CPU or memory exhaustion), which can prevent it from accepting new connections.
Addressing these problems will allow the pod to pass its readiness probe and become ready to serve traffic.
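A quick way to narrow this down is to confirm that the database Service exists and has ready endpoints. The sketch below assumes the database is reached through the db-service name shown in the sample log; substitute the Service name used in your environment:
kubectl get svc db-service -n cluedin
kubectl get endpoints db-service -n cluedin
If the endpoints list is empty, the database pods themselves are not ready, which points at resource pressure or a crashed database rather than a wrong connection string.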
Scenario 3: Pod running and ready, but application exhibits unexpected behaviour
Problem
In some cases, a pod may be in the Running state and marked as Ready, but the application inside still shows unexpected or faulty behaviour. This indicates that the pod has passed its liveness and readiness probes, but the underlying issue lies within the application itself.
Troubleshooting
- To begin diagnosing the issue, run the following command:
kubectl get pods -n cluedin
Sample output:
NAME READY STATUS RESTARTS AGE
---- ----- ------ -------- ---
cluedin-ui-7d9f8d7c9d-abc12 1/1 Running 0 5m
Such output usually means that the problem is not with Kubernetes itself, but with the application inside the pod, or with network access between the user and the pod.
- Even if a pod appears healthy, the application inside might be failing silently. To check for hidden errors, review the pod logs by running the following command:
kubectl logs <pod-name> -n cluedin
Example (for a pod named cluedin-gql-97cb77cd6-d5rcz):
kubectl logs cluedin-gql-97cb77cd6-d5rcz -n cluedin
Sample output:
14:48:32.145Z ERROR CluedIn.UI.GQL/CluedIn.UI.GQL: 500: Internal Server Error
err: {
  "message": "500: Internal Server Error",
  "locations": [ { "line": 109, "column": 7 } ],
  "path": [ "inbound", "dataSource", "connectorConfiguration" ],
  "extensions": {
    "code": "INTERNAL_SERVER_ERROR",
    "response": {
      "url": "http://cluedin-server:9000/api/v1/configuration/providers?id=FA871776-60CA-49A6-8433-42BEE288400E",
      "status": 500,
      "statusText": "Internal Server Error",
      "body": "{\"type\":\"https://tools.ietf.org/html/rfc7231#section-6.6.1\",\"title\":\"An error occurred while processing your request.\",\"status\":500,\"detail\":\"Our job server is down and not accepting new providers for now\",\"traceId\":\"00-cc0351878f70a5edc267cdca4409b4b9-129d08969588f435-00\"}"
    }
  }
}
- If you want to read the log in a more convenient way, you can save it to a file and open it in any text editor:
kubectl logs <pod-name> -n cluedin > <pod-name>.log
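If you only need the error lines, you can also filter the output directly (assuming grep is available on the machine where you run kubectl):
kubectl logs <pod-name> -n cluedin | grep -i error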
Resolution
In this example, the issue may be related to a failed connection to the job server (Redis). This is commonly caused by the cluedin-server starting before Redis, preventing it from establishing a proper connection during boot.
To resolve this issue, restart the deployment of cluedin-server so it can establish a proper connection to Redis during startup:
kubectl rollout restart deployment cluedin-server -n cluedin
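You can then watch the restart complete before re-testing the application:
kubectl rollout status deployment cluedin-server -n cluedin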
Scenario 4: Pod pending due to init container issues
Problem
A pod can contain one or more application containers, and may also include one or more init containers:
- Init containers run sequentially before the main application containers start. Each must complete successfully before any main container in the pod can begin running.
- If an init container fails or cannot complete, the pod remains stuck in the Pending state and the main container responsible for serving traffic never starts. As a result, the pod never progresses to running the main workload.
Troubleshooting
- To verify whether a pod is unable to start because of a failing init container, describe the pod with the following command:
kubectl describe pod <pod-name> -n cluedin
Sample output:
Name:           cluedin-ui-879c4db6b-8jzks
Namespace:      cluedin
Status:         Pending
Controlled By:  ReplicaSet/cluedin-ui-879c4db6b
Init Containers:
  wait-cluedin-gql:
    Image:          cluedinprod.azurecr.io/groundnuty/k8s-wait-for:v1.3
    State:          Terminated
      Reason:       Error
      Exit Code:    1
    Restart Count:  3
    Args:
      service
      cluedin-gql
      -n
      cluedin
Containers:
  ui:
    Image:   cluedinprod.azurecr.io/cluedin/ui:2024.12.02
    State:   Waiting
      Reason:  PodInitializing
Events:
  Type     Reason  Age  From     Message
  ----     ------  ---  ----     -------
  Warning  Failed  5m   kubelet  Init container "wait-cluedin-gql" failed
In the example above:
- The main container named ui is in the Waiting state. This usually means it is waiting for the init containers to complete successfully.
- The events show that the init container wait-cluedin-gql has failed. In such cases, the pod cannot progress to running the main container until the init container issue is resolved.
- Sometimes, an init container may run indefinitely without explicitly failing. In both scenarios, it is useful to inspect the init container logs for more details. You can view the logs of a specific init container by adding the -c <init-container-name> flag to the kubectl logs command:
kubectl logs <pod-name> -n cluedin -c <init-container-name>
This will help you understand why the init container is failing or stuck, and therefore why the main container cannot proceed.
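In this example, the wait-cluedin-gql init container waits for the cluedin-gql service, so a useful follow-up check (assuming the service name matches the init container arguments shown above) is to confirm that the service exists and has ready endpoints:
kubectl get svc cluedin-gql -n cluedin
kubectl get endpoints cluedin-gql -n cluedin
If the endpoints list is empty, troubleshoot the cluedin-gql pods first; the init container will keep waiting until that service becomes available.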