Host Interfaces Using 'enic' Drivers Become Disabled and Lose Connectivity

Problem

  • A Platform9 managed host appears offline in the Clarity UI.

  • Connectivity errors are observed as shown below.

pf9_app.py INFO - Setting the desired service state
pf9_app.py INFO - Setting service state pf9-comms.3.9.0-663.baf294f. Command: sudo systemctl start pf9-comms
session.py INFO - Converge succeeded
amqp.py WARNING - Connection closed due to Not specified, retrying in 10 seconds
session.py ERROR - Connection closed unexpectedly.
slave.py ERROR - Connection error. Retrying in 10 seconds.
Traceback (most recent call last):
  File "/opt/pf9/hostagent/lib/python2.7/site-packages/bbslave/slave.py", line 83, in reconnect_loop
    channel_retry_period=retry_period)
  File "/opt/pf9/hostagent/lib/python2.7/site-packages/bbslave/session.py", line 716, in start
    raise AMQPConnectionError
AMQPConnectionError upstream: failed to connect to RabbitMQ: Exception (501) Reason: "read tcp 127.0.0.1:41514->127.0.0.1:5672: i/o timeout"

  • "devcmd ... timed out" and "wq is full" messages from the enic driver are reported in /var/log/messages on the host.

kernel: [21479642.626977] enic 0000:06:00.0 enp6s0: devcmd 4 timed out
kernel: [21479642.727633] enic 0000:07:00.0 enp7s0: devcmd 4 timed out
...
kernel: [21479650.951488] enic 0000:07:00.0 enp7s0: devcmd2 4: wq is full. fetch index: 17, posted index:16
kernel: [21479650.952179] enic 0000:0c:00.0 enp12s0: devcmd2 4: wq is full. fetch index: 8, posted index: 7
...
kernel: [21489854.868177] enic 0000:0c:00.0 enp12s0: devcmd2 4: wq is full. fetch index: 8, posted index: 7
kernel: [21489854.869338] enic 0000:07:00.0 enp7s0: devcmd2 4: wq is full. fetch index: 17, posted index:16
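
To see which interfaces are affected and how often, the kernel messages above can be counted per interface. The following is a minimal sketch, not a Platform9 tool: the function names and the regular expression are assumptions based only on the two message shapes quoted in this article, and other enic error formats would not be matched.

```python
import re
from collections import Counter

# Assumed pattern, derived from the two message shapes quoted above:
#   enic 0000:06:00.0 enp6s0: devcmd 4 timed out
#   enic 0000:07:00.0 enp7s0: devcmd2 4: wq is full. fetch index: ...
ENIC_ERROR = re.compile(
    r"enic (?P<pci>[0-9a-f:.]+) (?P<iface>\S+): "
    r"devcmd2? \d+(?: timed out|: wq is full)"
)


def count_enic_errors(lines):
    """Return a Counter mapping interface name to the number of matching lines."""
    counts = Counter()
    for line in lines:
        match = ENIC_ERROR.search(line)
        if match:
            counts[match.group("iface")] += 1
    return counts


def count_enic_errors_in_file(path="/var/log/messages"):
    """Scan a log file (hypothetical helper; reading this path usually needs root)."""
    with open(path, errors="replace") as log:
        return count_enic_errors(log)
```

Calling `count_enic_errors_in_file()` on an affected host returns a per-interface count that can be attached to a vendor support case.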

Environment

  • Platform9 Managed OpenStack - All Versions

  • Platform9 Managed Kubernetes - All Versions

  • CentOS Linux 7.4

  • enic driver - v2.3.0.31

Cause

The enic kernel module sends a device command (devcmd) to the NIC firmware and does not receive a response in time. This indicates that the NIC firmware is hung or in a bad state.

Resolution

Work with the hardware vendor to investigate the issue further, and upgrade the NIC firmware to the latest version the vendor recommends.
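
Before contacting the vendor, it may help to record the driver and firmware versions currently in use. The sketch below assumes that `ethtool` is installed on the host and that Python 3.7 or later is available. The helper names `parse_ethtool_info` and `driver_info` are illustrative and not part of any Platform9 tooling.

```python
import subprocess


def parse_ethtool_info(text):
    """Parse the 'key: value' lines printed by `ethtool -i` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI addresses stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(iface):
    """Run `ethtool -i <iface>` and return its fields (assumes ethtool is installed)."""
    result = subprocess.run(
        ["ethtool", "-i", iface],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ethtool_info(result.stdout)
```

For example, `driver_info("enp6s0")` returns a dict whose `driver`, `version`, and `firmware-version` entries can be compared against the versions the vendor recommends.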

Additional Information

Red Hat: Why did interfaces using 'enic' drivers get disabled with errors and lose connectivity?