Node Disconnected From Management Plane Due To Hostagent Certificate Expiry

Problem

The node is disconnected from the management plane. The following errors appear in comms.log and hostagent.log, respectively.

[2024-03-23 04:13:11.086] [DEBUG] sni-prometheus.v0.example.platform9.net-::1-9118-24 - New client socket details - local port:  9118 , remote port:  44162
[2024-03-23 04:13:11.101] [ERROR] sni-broker.v0.example.platform9.net-::1-5672-4 - TLS socket for client 4610 error: Error: 139771737151360:error:14094415:SSL routines:ssl3_read_bytes:sslv3 alert certificate expired:../deps/openssl/openssl/ssl/record/rec_layer_s3.c:1544:SSL alert number 45

[2024-03-23 04:13:11.101] [DEBUG] sni-broker.v0.example.platform9.net-::1-5672-4 - Server socket for client 4610 closed normally ... remaining: 0
[2024-03-23 04:13:11.102] [DEBUG] sni-broker.v0.example.platform9.net-::1-5672-4 - Client 4610 socket closed normally ... remaining: 0
[2024-03-23 04:13:11.241] [DEBUG] sni-prometheus.v0.example.platform9.net-::1-9118-24 - CONNECT via proxy 10.42.25.62:8083 to example.platform9.net:443 succeeded.
[2024-03-23 04:13:11.241] [INFO] sni-prometheus.v0.example.platform9.net-::1-9118-24 - Server socket for client 4611 established, numServers: 332
[2024-03-23 04:13:11.245] [DEBUG] sni-prometheus.v0.example.platform9.net-::1-9118-24 - CONNECT via proxy 10.42.25.62:8083 to example.platform9.net:443 succeeded.
[2024-03-23 04:13:11.245] [INFO] sni-prometheus.v0.example.platform9.net-::1-9118-24 - Server socket for client 4612 established, numServers: 333
[2024-03-23 04:13:11.743] [ERROR] sni-prometheus.v0.example.platform9.net-::1-9118-24 - TLS socket for client 4253 error: Error: Client network socket disconnected before secure TLS connection

hostagent.log:

2024-03-23 04:13:32,770 - session.py INFO - Already converged. Idling...
2024-03-23 04:13:32,770 - session.py WARNING - Not sending status message because channel is closed
2024-03-23 04:13:32,770 - session.py INFO - Using the default virtual host '/' on the AMQP broker localhost
2024-03-23 04:13:33,410 - slave.py ERROR - Connection error. Retrying in 10 seconds.
Traceback (most recent call last):
  File "/opt/pf9/hostagent/lib/python3.9/site-packages/bbslave/slave.py", line 127, in reconnect_loop
    start(config, log, app_db, agent_app_db, app_cache,
  File "/opt/pf9/hostagent/lib/python3.9/site-packages/bbslave/session.py", line 770, in start
    dual_channel_io_loop(log,
  File "/opt/pf9/hostagent/lib/python3.9/site-packages/bbcommon/amqp.py", line 245, in dual_channel_io_loop
    conn.ioloop.start()
  File "/opt/pf9/hostagent/lib/python3.9/site-packages/pika/adapters/select_connection.py", line 461, in start
    self._poller.start()
  File "/opt/pf9/hostagent/lib/python3.9/site-packages/pika/adapters/select_connection.py", line 721, in start
    self.poll()
  File "/opt/pf9/hostagent/lib/python3.9/site-packages/pika/adapters/select_connection.py", line 1114, in poll
    self._dispatch_fd_events(fd_event_map)
  File "/opt/pf9/hostagent/lib/python3.9/site-packages/pika/adapters/select_connection.py", line 831, in _dispatch_fd_events
    handler(fileno, events)
  File "/opt/pf9/hostagent/lib/python3.9/site-packages/pika/adapters/base_connection.py", line 410, in _handle_events
    self._handle_read()
  File "/opt/pf9/hostagent/lib/python3.9/site-packages/pika/adapters/base_connection.py", line 460, in _handle_read
    return self._on_terminate(
  File "/opt/pf9/hostagent/lib/python3.9/site-packages/pika/connection.py", line 2119, in _on_terminate
    self.callbacks.process(0,
  File "/opt/pf9/hostagent/lib/python3.9/site-packages/pika/callback.py", line 60, in wrapper
    return function(*tuple(args), **kwargs)
  File "/opt/pf9/hostagent/lib/python3.9/site-packages/pika/callback.py", line 92, in wrapper
    return function(*args, **kwargs)
  File "/opt/pf9/hostagent/lib/python3.9/site-packages/pika/callback.py", line 236, in process
    callback(*args, **keywords)
  File "/opt/pf9/hostagent/lib/python3.9/site-packages/pika/connection.py", line 1856, in _on_connection_error
    raise exceptions.AMQPConnectionError(error_message or
pika.exceptions.AMQPConnectionError: (-1, 'EOF')
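
The two excerpts above share a small set of recognizable signatures. As a rough sketch (not a Platform9 tool), the following Python helper scans log lines for them; the function name `find_signatures` and the pattern labels are made up for illustration, and the patterns are taken only from the messages shown in this article.

```python
import re

# Patterns copied from the log messages in this article.
# TLS alert number 45 is the "certificate expired" alert.
SIGNATURES = {
    "tls_cert_expired": re.compile(r"alert certificate expired|SSL alert number 45"),
    "amqp_eof": re.compile(r"AMQPConnectionError: \(-1, 'EOF'\)"),
    "channel_closed": re.compile(r"channel is closed"),
}


def find_signatures(lines):
    """Return {signature name: [1-based line numbers where it matched]}."""
    hits = {name: [] for name in SIGNATURES}
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(lineno)
    return hits
```

To use it, read comms.log and hostagent.log from the node and pass their lines to `find_signatures`. A non-empty `tls_cert_expired` list together with `amqp_eof` hits is consistent with the problem described here, though it does not by itself prove which certificate expired.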

Environment

  • Platform9 Managed Kubernetes - v5.6.8 or higher

Cause

The hostagent certificate on the node has expired.
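
One way to confirm the cause is to read the certificate's expiry date on the node. The sketch below is illustrative only: it assumes the `openssl` command-line tool is installed and that the certificate file is named `cert.pem` inside the hostagent certificate directory (the article names only the directory, so adjust the path as needed).

```python
import ssl
import subprocess
from datetime import datetime, timezone

# Assumption: the file name cert.pem is not given in this article.
CERT_PATH = "/etc/pf9/certs/hostagent/cert.pem"


def parse_not_after(openssl_output):
    """Parse a line such as 'notAfter=Mar 23 04:13:11 2024 GMT' into a UTC datetime."""
    _, _, value = openssl_output.strip().partition("=")
    seconds = ssl.cert_time_to_seconds(value)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def read_not_after(cert_path=CERT_PATH):
    """Ask openssl for the certificate's end date and parse it."""
    result = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", cert_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_not_after(result.stdout)


def is_expired(not_after, now=None):
    """True if the certificate's end date is at or before 'now' (UTC)."""
    now = now or datetime.now(timezone.utc)
    return now >= not_after
```

Calling `is_expired(read_not_after())` on an affected node should return `True` if the certificate at the assumed path is the one that expired.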

Resolution

  1. Replace the existing certificate files in the /etc/pf9/certs/hostagent/ directory.

  2. Restart the pf9-hostagent service.
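
The two steps above could be scripted roughly as follows. This is a minimal sketch under several assumptions: the replacement certificate files have already been obtained and placed in a separate directory (how to obtain them is outside this article), the systemd unit is named `pf9-hostagent`, and the script runs with root privileges. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

CERT_DIR = Path("/etc/pf9/certs/hostagent")
SERVICE = "pf9-hostagent"  # assumed systemd unit name


def replace_certs(new_dir, cert_dir=CERT_DIR):
    """Back up cert_dir, then copy every regular file from new_dir into it.

    Returns the path of the backup directory (cert_dir name plus '.bak').
    """
    new_dir, cert_dir = Path(new_dir), Path(cert_dir)
    backup_dir = cert_dir.with_name(cert_dir.name + ".bak")
    if backup_dir.exists():
        raise FileExistsError(f"backup already exists: {backup_dir}")
    shutil.copytree(cert_dir, backup_dir)  # keep the old files for rollback
    for src in sorted(new_dir.iterdir()):
        if src.is_file():
            shutil.copy2(src, cert_dir / src.name)
    return backup_dir


def restart_hostagent(service=SERVICE):
    """Restart the hostagent service so it loads the new certificate."""
    subprocess.run(["systemctl", "restart", service], check=True)
```

Keeping a backup copy is a design choice made for this sketch, not something the article prescribes; it allows the old files to be restored if the replacement set turns out to be incomplete.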

Additional Information

This is a known issue, tracked internally as Jira CORE-1304. Please open a support ticket to check on its progress.
