Decco-consul-server pod in Error Status

Problem

The decco-consul-server pod is stuck in Error status, which takes down the LTS3 [SMCP] management plane.

```shell
# kubectl get po -A | grep decco-consul
default      decco-consul-consul-server-0         0/1     Error         1 (71s ago)       2m16s
```

Logs:

Decco-consul-server pod logs

```text
2024-03-01T16:14:01.402Z [ERROR] agent: startup error: error="refusing to rejoin cluster because server has been offline for more than the configured server_rejoin_age_max (168h0m0s) - consider wiping your data dir"
```

Environment

Platform9 Self Managed Cloud Platform (SMCP) - v-5.9.1-3097398.

Cause

If the LTS3 [SMCP] setup has been down for days, the Consul server can exceed Consul's server_rejoin_age_max limit. This can happen, for example, because of a disk or CPU issue on the master node of the management cluster. The parameter sets the maximum time a Consul server can be offline and still rejoin the cluster, and its default is 7 days (168h). A server that has been offline for longer refuses to start and logs the startup error shown above.
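Whether an outage has crossed this limit comes down to a duration comparison. The minimal Python sketch below extracts the configured limit from the startup error and tests a given downtime against it. It assumes the following:

- The limit is printed as a Go-style duration that uses only the h, m and s units, as in 168h0m0s.
- parse_duration, limit_from_log and refuses_rejoin are hypothetical helpers, not part of Consul or SMCP.

```python
import re
from datetime import timedelta
from typing import Optional

# Assumption: only the h, m and s units are handled (Go durations also allow ms, us, ns).
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_PART = re.compile(r"(\d+(?:\.\d+)?)([hms])")
# Captures the value in "... server_rejoin_age_max (168h0m0s) ..." from the startup error.
_LIMIT_IN_LOG = re.compile(r"server_rejoin_age_max \(([^)]+)\)")


def parse_duration(text: str) -> timedelta:
    """Parse a duration such as '168h0m0s' or '2592000s' into a timedelta."""
    parts = _PART.findall(text)
    # Reject input the pattern did not fully consume, e.g. '1ms' or '7d'.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=sum(float(num) * _UNIT_SECONDS[unit] for num, unit in parts))


def limit_from_log(line: str) -> Optional[str]:
    """Return the configured limit quoted in the Consul startup error, or None."""
    match = _LIMIT_IN_LOG.search(line)
    return match.group(1) if match else None


def refuses_rejoin(offline_for: timedelta, rejoin_age_max: str = "168h0m0s") -> bool:
    """True if a server offline for this long exceeds the limit (default: 7 days)."""
    return offline_for > parse_duration(rejoin_age_max)


if __name__ == "__main__":
    log = ('2024-03-01T16:14:01.402Z [ERROR] agent: startup error: error="refusing to rejoin '
           'cluster because server has been offline for more than the configured '
           'server_rejoin_age_max (168h0m0s) - consider wiping your data dir"')
    limit = limit_from_log(log)
    print(limit, parse_duration(limit))
    print(refuses_rejoin(timedelta(days=10), limit))       # 10 days offline vs 7-day limit
    print(refuses_rejoin(timedelta(days=10), "2592000s"))  # 10 days offline vs 30-day limit
```

The downtime itself has to come from your own records of when the management plane went down; the sketch does not read it from the cluster.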

Resolution

Steps:

1. Edit the decco-consul-consul-server-config ConfigMap. Under the extra-from-values.json key, add the parameter "server_rejoin_age_max": "2592000s" (30 days, up from the default of 7 days), as shown below:

ConfigMap

```
[root@test-pf9-du-host-airgap]# kubectl edit cm decco-consul-consul-server-config
kind: ConfigMap
metadata:
  ...
  name: decco-consul-consul-server-config
  namespace: default
...
apiVersion: v1
data:
  central-config.json: |-
    {
      "enable_central_service_config": true
    }
  # Newly added lines:
  extra-from-values.json: |-
    {
      "server_rejoin_age_max": "2592000s"
    }
...
```

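Set server_rejoin_age_max to a value longer than the time the management plane has been down; 2592000s (30 days) is only an example. Put only the JSON object under extra-from-values.json: |-. Everything indented below that line, including any # annotations, becomes part of the JSON value.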
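The value is a number of seconds with an s suffix, so 30 days is 30 × 86400 = 2592000s. The minimal Python sketch below derives a value from a measured downtime and renders the extra-from-values.json body. Rounding up to whole days and the seven-day safety margin are illustrative assumptions, not Platform9 guidance. The function names rejoin_age_for and extra_from_values are hypothetical.

```python
import json
import math
from datetime import timedelta

SECONDS_PER_DAY = 86_400


def rejoin_age_for(downtime: timedelta, margin_days: int = 7) -> str:
    """Round the downtime up to whole days, add a margin (assumption), format as '<seconds>s'."""
    days = math.ceil(downtime.total_seconds() / SECONDS_PER_DAY) + margin_days
    return f"{days * SECONDS_PER_DAY}s"


def extra_from_values(rejoin_age_max: str) -> str:
    """Render the JSON object that goes under the extra-from-values.json key."""
    return json.dumps({"server_rejoin_age_max": rejoin_age_max}, indent=2)


if __name__ == "__main__":
    value = rejoin_age_for(timedelta(days=23))  # 23 days down + 7-day margin = 30 days
    print(value)
    print(extra_from_values(value))
```

When pasting the generated JSON into the ConfigMap, indent it under the extra-from-values.json: |- line as in the example above.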

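2. Restart the following pods in the order listed. The connect-injector and webhook-cert-manager pod names end in generated suffixes, so use the names from your own kubectl get pods output: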

Restart pods

```shell
# kubectl delete pod/decco-consul-consul-connect-injector-f9c54d6cc-xmg4n
# kubectl delete pod/decco-consul-consul-webhook-cert-manager-6866774b8b-l2mn8
# kubectl delete pod/decco-consul-consul-server-0
```
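The minimal Python sketch below (3.9+) picks the decco-consul pod names out of kubectl get pods output, with or without -A. It then prints the delete commands in the restart order required above. It assumes the pods run in the default namespace, as in this article; RESTART_ORDER, consul_pod_names and delete_commands are hypothetical names.

```python
# Pod name prefixes, in the order the pods must be restarted.
RESTART_ORDER = (
    "decco-consul-consul-connect-injector-",
    "decco-consul-consul-webhook-cert-manager-",
    "decco-consul-consul-server-",
)


def consul_pod_names(get_pods_output: str) -> list[str]:
    """Collect decco-consul pod names from `kubectl get pods` output."""
    names = []
    for line in get_pods_output.splitlines():
        for token in line.split():
            name = token.removeprefix("pod/")
            if name.startswith("decco-consul-"):
                names.append(name)
                break  # one pod name per row
    return names


def delete_commands(get_pods_output: str, namespace: str = "default") -> list[str]:
    """Build kubectl delete commands for the Consul pods in restart order."""
    names = consul_pod_names(get_pods_output)
    commands = []
    for prefix in RESTART_ORDER:
        for name in sorted(n for n in names if n.startswith(prefix)):
            commands.append(f"kubectl delete pod/{name} -n {namespace}")
    return commands


if __name__ == "__main__":
    sample = """default  decco-consul-consul-server-0  0/1  Error  1 (71s ago)  2m16s
default  decco-consul-consul-connect-injector-f9c54d6cc-xmg4n  1/1  Running  0  2d
default  decco-consul-consul-webhook-cert-manager-6866774b8b-l2mn8  1/1  Running  0  2d"""
    for command in delete_commands(sample):
        print(command)
```

The commands are printed rather than executed, so they can be reviewed before being run one by one.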

3. Once the pods have been recreated, verify that all Consul-related resources are active:

Get resource info

```shell
[root@test-pf9-du-host-airgap]# kubectl get all -A | grep consul
default                       pod/decco-consul-consul-connect-injector-f9c54d6cc-fqfqt        1/1     Running     0                71m
default                       pod/decco-consul-consul-server-0                                1/1     Running     0                71m
default                       pod/decco-consul-consul-webhook-cert-manager-6866774b8b-8j67w   1/1     Running     0                71m

default                       service/decco-consul-consul-connect-injector            ClusterIP      10.21.3.14    <none>          443/TCP                                                                            2d19h
default                       service/decco-consul-consul-dns                         ClusterIP      10.21.0.128   <none>          53/TCP,53/UDP                                                                      2d19h
default                       service/decco-consul-consul-server                      ClusterIP      None          <none>          8500/TCP,8502/TCP,8301/TCP,8301/UDP,8302/TCP,8302/UDP,8300/TCP,8600/TCP,8600/UDP   25d
default                       service/decco-consul-consul-ui                          ClusterIP      10.21.2.154   <none>          80/TCP                                                                             25d

default                       deployment.apps/decco-consul-consul-connect-injector       1/1     1            1           2d19h
default                       deployment.apps/decco-consul-consul-webhook-cert-manager   1/1     1            1           2d19h

default                       replicaset.apps/decco-consul-consul-connect-injector-f9c54d6cc        1         1         1       2d19h
default                       replicaset.apps/decco-consul-consul-webhook-cert-manager-6866774b8b   1         1         1       2d19h

default                       statefulset.apps/decco-consul-consul-server   1/1     25d
```
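The minimal Python sketch below (3.9+) makes the pod part of this check mechanical. It flags any decco-consul pod whose READY count is incomplete or whose STATUS is not Running. It works on either kubectl get pods output or the pod/ rows of kubectl get all output. It assumes the default column layout, in which READY and STATUS follow the name. consul_pod_health and all_consul_pods_healthy are hypothetical names.

```python
def consul_pod_health(get_output: str) -> dict[str, bool]:
    """Map each decco-consul pod name to True if it is Running with all containers ready."""
    health = {}
    for line in get_output.splitlines():
        tokens = line.split()
        for i, token in enumerate(tokens):
            # Pod rows appear as 'name' or 'pod/name'; service/, deployment.apps/ etc. never match.
            name = token.removeprefix("pod/")
            if name.startswith("decco-consul-") and i + 2 < len(tokens):
                ready, status = tokens[i + 1], tokens[i + 2]
                up, _, total = ready.partition("/")
                health[name] = (
                    status == "Running"
                    and total.isdigit()
                    and int(total) > 0
                    and up == total
                )
                break
    return health


def all_consul_pods_healthy(get_output: str) -> bool:
    """True only if at least one Consul pod was found and every one is healthy."""
    health = consul_pod_health(get_output)
    return bool(health) and all(health.values())
```

For example, the Error row from the Problem section maps decco-consul-consul-server-0 to False, while the three pod/ rows in the output above all map to True.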
