Kubernetes Troubleshooting
Troubleshoot a Kubernetes-based deployment
The Zus network, which runs on Kubernetes, is set up for System Tests. If a test fails for any reason, the deployment needs to be debugged.
All test alerts are set up in the devops-0chain Slack channel.
When a test fails, check the GitHub Actions annotations first.
System Tests mainly fail in two ways:
"One or more 0Chain components (listed below) crashed during the test run, therefore the build is NOT STABLE"
This error means the deployed 0chain providers crashed/failed/restarted while the System Tests were running.
"System tests failed. Ensure tests are running against the correct images/branches and rule out any possible code issues before attempting a re-run"
The System Tests themselves are failing, so report the failure on the QA channel.


TROUBLESHOOTING
0chain Crashed during test run -
Check which service failed: sharder, miner, blobber, 0box, etc.
After confirming the service name, check the GitHub Actions artifacts.
Download the crash logs or the particular service's logs from the artifacts.
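Once a service's log file is downloaded from the artifacts, a quick scan for panic and error lines usually locates the failure. A minimal sketch; the log content below is made up for illustration and stands in for a downloaded artifact file:

```shell
# Write a made-up sample log to scan (stands in for a downloaded artifact file)
cat > service.log <<'EOF'
2024-01-01T10:00:00Z INFO  starting miner
2024-01-01T10:00:05Z ERROR dial tcp 10.0.0.5:5432: connection refused
2024-01-01T10:00:06Z panic: runtime error: invalid memory address or nil pointer dereference
EOF

# Show each panic or error line with its line number for quick triage
grep -n -E 'panic|ERROR' service.log
```

The line numbers returned by `grep -n` make it easy to jump to the failure point when opening the full log.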

For more detail, log in to Rancher ( https://rancher.dev-[1-9].devnet-0chain.net/ ) and check each service's logs. Read the Rancher doc.
For detailed monitoring and logging, check the Grafana & Loki doc.
PROVIDERS TROUBLESHOOTING
Sharder - There are four reasons why the sharder may be stuck or crashed -
While deploying the network, the sharder panicked
After a successful deployment, the sharder restarted
It is not able to connect to its databases
It was OOMKilled
In all the above cases, these Kubernetes commands can be useful -
Fetch the sharder pod name -
kubectl get po -A | grep sharder
Check the sharder logs -
kubectl logs << pod_name >> -n << namespace_name >> -c << container_name >>
example -
kubectl logs helm-sharder-01-68ccbd65dd-g6zbw -n dev-1 -c helm-sharder-01
For OOMKilled, run -
kubectl describe po << pod_name >> -n << namespace_name >>
example -
kubectl describe po helm-sharder-01-68ccbd65dd-g6zbw -n dev-1
For detailed logging, use this Loki query -
{tag="<< container_name >>"} |= ``
example -
{tag="helm-sharder-01"} |= ``
For a quick check, use the Rancher logs.
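Beyond `kubectl describe`, the last termination reason can be read straight out of the pod's JSON status. This sketch runs against a hypothetical status document; on a live cluster the same JSON comes from `kubectl get po << pod_name >> -n << namespace_name >> -o json`:

```shell
# Hypothetical pod status JSON, shaped like `kubectl get po ... -o json` output
status='{"status":{"containerStatuses":[{"name":"helm-sharder-01","restartCount":3,"lastState":{"terminated":{"reason":"OOMKilled","exitCode":137}}}]}}'

# Pull the last termination reason; "OOMKilled" plus exit code 137 confirms a memory kill
echo "$status" | grep -o '"reason":"[^"]*"'
```

With a live cluster, the same field is available directly via `kubectl get po << pod_name >> -n << namespace_name >> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'`.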

Miner - There are four reasons why the miner may be stuck or crashed -
While deploying the network, the miner panicked
After a successful deployment, the miner restarted
It is not able to connect to its databases
It was OOMKilled
In all the above cases, these Kubernetes commands can be useful -
Fetch the miner pod name -
kubectl get po -A | grep miner
Check the miner logs -
kubectl logs << pod_name >> -n << namespace_name >> -c << container_name >>
example -
kubectl logs helm-miner-01-68ccbd65dd-g6zbw -n dev-1 -c helm-miner-01
For OOMKilled, run -
kubectl describe po << pod_name >> -n << namespace_name >>
example -
kubectl describe po helm-miner-01-68ccbd65dd-g6zbw -n dev-1
For detailed logging, use this Loki query -
{tag="<< container_name >>"} |= ``
example -
{tag="helm-miner-01"} |= ``

For a quick check, use the Rancher logs.
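To spot restarted pods without describing each one individually, the RESTARTS column of `kubectl get po -A` can be filtered in one pass. A sketch over captured sample output; the pod names below are hypothetical:

```shell
# Sample `kubectl get po -A` output: NAMESPACE NAME READY STATUS RESTARTS AGE
cat > pods.txt <<'EOF'
dev-1  helm-miner-01-68ccbd65dd-g6zbw  1/1  Running  0  2h
dev-1  helm-miner-02-7f9c4d5b6a-x2k8p  1/1  Running  4  2h
dev-1  helm-miner-03-5d8b7c9e2f-m1n3q  0/1  CrashLoopBackOff  7  2h
EOF

# Print pods that restarted at least once: namespace, pod, status, restart count
awk '$5 > 0 {print $1, $2, $4, $5}' pods.txt
```

On a live cluster, pipe the real listing through the same `awk` filter: `kubectl get po -A | awk '$5 > 0'`.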

Blobber - There are three reasons why the blobber may be stuck or crashed -
While deploying the network, the blobber panicked
After a successful deployment, the blobber restarted
It is not able to connect to its databases
In all the above cases, these Kubernetes commands can be useful -
Fetch the blobber pod name -
kubectl get po -A | grep blobber
Check the blobber logs -
kubectl logs << pod_name >> -n << namespace_name >> -c << container_name >>
example -
kubectl logs helm-blobber-01-68ccbd65dd-g6zbw -n dev-1 -c helm-blobber-01
For detailed logging, use this Loki query -
{tag="<< container_name >>"} |= ``
example -
{tag="helm-blobber-01"} |= ``

For a quick check, use the Rancher logs.
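For the database-connection case, it can help to probe the database pod directly. Assuming the blobber's Postgres runs in its own pod (the pod name below is hypothetical; substitute the real one), `pg_isready` inside that pod reports whether Postgres accepts connections. Since this sketch cannot reach a cluster, it only assembles the command:

```shell
# Hypothetical database pod and namespace -- substitute the real ones
db_pod="helm-blobber-01-postgres-0"
ns="dev-1"

# Build the probe command; pg_isready exits 0 when Postgres accepts connections
cmd="kubectl exec $db_pod -n $ns -- pg_isready"
echo "$cmd"
```

A non-zero exit from `pg_isready` points at the database rather than the blobber itself.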

0Box - There are three reasons why 0box may crash -
While deploying the network, 0box panicked
After a successful deployment, 0box restarted
It is not able to connect to its databases
In all the above cases, these Kubernetes commands can be useful -
Fetch the 0box pod name -
kubectl get po -A | grep zbox
Check the 0box logs -
kubectl logs << pod_name >> -n << namespace_name >>
example -
kubectl logs helm-zbox-01-68ccbd65dd-g6zbw -n dev-1
For detailed logging, use this Loki query -
{tag="<< container_name >>"} |= ``
example -
{container="helm-zbox"} |= ``

For a quick check, use the Rancher logs.

Authorizer - There are a few reasons why the authorizer may crash -
While deploying the network, the authorizer panicked
After a successful deployment, the authorizer restarted
In all the above cases, these Kubernetes commands can be useful -
Fetch the authorizer pod name -
kubectl get po -A | grep authorizer
Check the authorizer logs -
kubectl logs << pod_name >> -n << namespace_name >>
example -
kubectl logs helm-authorizer-01-68ccbd65dd-g6zbw -n dev-1
For detailed logging, use this Loki query -
{tag="<< container_name >>"} |= ``
example -
{container="helm-authorizer-01"} |= ``

For a quick check, use the Rancher logs.
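Since every provider section above follows the same pattern, the per-pod log commands can be generated in one pass. A sketch over a sample listing with hypothetical pod names; on a live cluster the input would be the real `kubectl get po -A` output:

```shell
# Sample `kubectl get po -A` output covering several provider types
cat > allpods.txt <<'EOF'
dev-1  helm-sharder-01-68ccbd65dd-g6zbw  1/1  Running  0  2h
dev-1  helm-miner-01-7f9c4d5b6a-x2k8p    1/1  Running  1  2h
dev-1  helm-blobber-01-5d8b7c9e2f-m1n3q  1/1  Running  0  2h
EOF

# Emit a ready-to-run log command for each provider pod found
for pattern in sharder miner blobber zbox authorizer; do
  grep "$pattern" allpods.txt | while read -r ns pod rest; do
    echo "kubectl logs $pod -n $ns"
  done
done
```

This prints one `kubectl logs` command per matching pod, which can then be run (or piped to `sh`) to pull all provider logs at once.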
