Host Interfaces Using 'enic' Drivers Become Disabled and Lose Connectivity
Problem
- A Platform9 Managed host appears offline in the Clarity UI.
- The Platform9 host agent reports connectivity errors such as the following.
pf9_app.py INFO - Setting the desired service state
pf9_app.py INFO - Setting service state pf9-comms.3.9.0-663.baf294f. Command: sudo systemctl start pf9-comms
session.py INFO - Converge succeeded
amqp.py WARNING - Connection closed due to Not specified, retrying in 10 seconds
session.py ERROR - Connection closed unexpectedly.
slave.py ERROR - Connection error. Retrying in 10 seconds.
Traceback (most recent call last):
File "/opt/pf9/hostagent/lib/python2.7/site-packages/bbslave/slave.py", line 83, in reconnect_loop
channel_retry_period=retry_period)
File "/opt/pf9/hostagent/lib/python2.7/site-packages/bbslave/session.py", line 716, in start
raise AMQPConnectionError
AMQPConnectionError upstream: failed to connect to RabbitMQ: Exception (501) Reason: "read tcp 127.0.0.1:41514->127.0.0.1:5672: i/o timeout"
- The enic driver reports "devcmd timed out" and "wq is full" messages in /var/log/messages on the host.
kernel: [21479642.626977] enic 0000:06:00.0 enp6s0: devcmd 4 timed out
kernel: [21479642.727633] enic 0000:07:00.0 enp7s0: devcmd 4 timed out
...
kernel: [21479650.951488] enic 0000:07:00.0 enp7s0: devcmd2 4: wq is full. fetch index: 17, posted index:16
kernel: [21479650.952179] enic 0000:0c:00.0 enp12s0: devcmd2 4: wq is full. fetch index: 8, posted index: 7
...
kernel: [21489854.868177] enic 0000:0c:00.0 enp12s0: devcmd2 4: wq is full. fetch index: 8, posted index: 7
kernel: [21489854.869338] enic 0000:07:00.0 enp7s0: devcmd2 4: wq is full. fetch index: 17, posted index:16
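To gauge how many interfaces are affected, the kernel messages above can be tallied per interface. The following is a minimal sketch in Python 3, not a Platform9 tool: the function name `summarize_enic_errors` and the regular expression are assumptions written to match only the two message shapes quoted in this article, and the sample lines are copied from the excerpt above.

```python
import re
from collections import Counter

# Matches the two enic message shapes quoted in this article, for example:
#   enic 0000:06:00.0 enp6s0: devcmd 4 timed out
#   enic 0000:07:00.0 enp7s0: devcmd2 4: wq is full. fetch index: 17, posted index:16
# Other enic messages are not covered by this (assumed) pattern.
ENIC_PATTERN = re.compile(
    r"enic (?P<pci>[0-9a-fA-F:.]+) (?P<iface>\S+): "
    r"devcmd2? \d+:? (?P<event>timed out|wq is full)"
)


def summarize_enic_errors(lines):
    """Count enic devcmd errors per (interface, event) pair."""
    counts = Counter()
    for line in lines:
        match = ENIC_PATTERN.search(line)
        if match:
            counts[(match.group("iface"), match.group("event"))] += 1
    return counts


# Sample lines taken from the log excerpt in this article.
sample_lines = [
    "kernel: [21479642.626977] enic 0000:06:00.0 enp6s0: devcmd 4 timed out",
    "kernel: [21479642.727633] enic 0000:07:00.0 enp7s0: devcmd 4 timed out",
    "kernel: [21479650.951488] enic 0000:07:00.0 enp7s0: devcmd2 4: wq is full. fetch index: 17, posted index:16",
    "kernel: [21479650.952179] enic 0000:0c:00.0 enp12s0: devcmd2 4: wq is full. fetch index: 8, posted index: 7",
]

for (iface, event), count in sorted(summarize_enic_errors(sample_lines).items()):
    print(f"{iface}: {event} x{count}")
```

To run it against a live host, pass the lines of the log file instead of the sample, for example `summarize_enic_errors(open("/var/log/messages", errors="replace"))`. Reading that file typically requires root privileges.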
Environment
- Platform9 Managed OpenStack - All Versions
- Platform9 Managed Kubernetes - All Versions
- CentOS Linux 7.4
- enic driver - v2.3.0.31
Cause
The enic kernel module sends a command (devcmd) to the NIC firmware and does not receive a response in time. This indicates that the NIC firmware has hung or is in a bad state.
Resolution
Work with the hardware vendor to investigate the issue further, and upgrade the NIC firmware to the latest version that the vendor recommends.
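When opening a case with the vendor, it usually helps to include the driver and firmware versions of each affected interface. The sketch below, assuming Python 3.7 or later and that the `ethtool` utility is installed on the host, collects the `key: value` fields printed by `ethtool -i <interface>` (which include `driver`, `version`, `firmware-version`, and `bus-info`). The helper names `parse_ethtool_info` and `driver_info` are hypothetical and are not part of any Platform9 or Cisco tooling.

```python
import subprocess


def parse_ethtool_info(text):
    """Parse the 'key: value' lines printed by `ethtool -i` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as a PCI address
        # (for example 0000:06:00.0) keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(iface):
    """Run `ethtool -i <iface>` and return its fields as a dict.

    Raises subprocess.CalledProcessError if ethtool exits with an error,
    and FileNotFoundError if ethtool is not installed.
    """
    result = subprocess.run(
        ["ethtool", "-i", iface],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ethtool_info(result.stdout)
```

For example, `driver_info("enp6s0")["firmware-version"]` would return the firmware version string reported for that interface, which can then be compared against the version the vendor recommends.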
Additional Information
Red Hat: Why did interfaces using 'enic' drivers get disabled with errors and lose connectivity?