pf9-consul Service Logs Utilising More Diskspace Impacting the Workloads

Problem

Disk space on the hypervisors is being exhausted by pf9-consul logs, which are flooded with errors such as the following:

pf9-consul.log

2024/07/18 02:13:37 [ERR] consul.rpc: failed to accept RPC conn: accept tcp 16.228.38.14:8300: accept4: too many open files
2024/07/18 02:13:37 [ERR] memberlist: Error accepting TCP connection: accept tcp 16.228.38.14:8301: accept4: too many open files
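
To confirm that the log growth comes from this specific error, the matching lines can be counted per Consul subsystem. The following is a minimal sketch, not a Platform9 tool: the function name `count_fd_errors` and the regular expression are assumptions based only on the two sample lines above.

```python
import re
from collections import Counter

# Matches error lines of the shape shown in the sample log; the subsystem
# name (e.g. "consul.rpc", "memberlist") is captured for grouping.
FD_ERROR = re.compile(r"\[ERR\] ([\w.]+): .*too many open files")

def count_fd_errors(lines):
    """Count 'too many open files' errors per Consul subsystem."""
    counts = Counter()
    for line in lines:
        match = FD_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

The function accepts any iterable of lines, so an open file object for the pf9-consul log on the affected hypervisor can be passed in directly.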

Environment

Platform9 Managed OpenStack - v5.8.2 and higher.

Answer

These errors occur because the file descriptor limit for the consul user is too low for its operations. The workaround is to increase the file descriptor limit.

The current soft limit is 1024 and the hard limit is 4096, which is the maximum allowed. Consul has already used up all available file descriptors, which causes it to log errors continuously.
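
On Linux, the limits in effect for a running process and its current descriptor usage can be read from `/proc`. The sketch below assumes the standard `/proc/<pid>/limits` and `/proc/<pid>/fd` layout; the helper names are illustrative, and the PID of the Consul process has to be looked up separately. Reading another user's `/proc/<pid>/fd` typically requires root.

```python
import os

def _to_int(value):
    # /proc reports "unlimited" for limits that have no cap
    return None if value == "unlimited" else int(value)

def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: name, soft limit, hard limit, units
            soft, hard = line[len("Max open files"):].split()[:2]
            return _to_int(soft), _to_int(hard)
    return None

def open_fd_count(pid):
    """Number of file descriptors the process currently holds open."""
    return len(os.listdir(f"/proc/{pid}/fd"))

def read_limits(pid):
    """Read and parse the open-files limits for a running process."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())
```

If `open_fd_count(pid)` is at or near the soft limit returned by `read_limits(pid)`, the process has run out of descriptors, which matches the errors above.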

The Consul file descriptor limit needs to be at least twice the expected number of clients in the cluster.
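
As a worked example of this sizing rule, the sketch below computes the minimum limit for a given client count and checks it against a configured value. The function names and the client count of 600 in the usage note are hypothetical and only illustrate the arithmetic.

```python
def required_nofile(expected_clients, multiplier=2):
    """Minimum file descriptor limit under the 2x-clients rule."""
    return expected_clients * multiplier

def limit_is_sufficient(soft_limit, expected_clients):
    """True if the soft limit satisfies the 2x-clients rule."""
    return soft_limit >= required_nofile(expected_clients)
```

For a hypothetical cluster of 600 clients, the rule gives 1200 descriptors, which already exceeds a soft limit of 1024. A limit of 65536 satisfies the rule for up to 32768 clients.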

To fix this, increase the default per-user file descriptor limit using the following steps.

Modify /etc/security/limits.conf.
Add the following lines to /etc/security/limits.conf to set the file descriptor limits for all users.

/limits.conf

* soft nofile 65536
* hard nofile 65536

If /etc/security/limits.conf is managed by Chef and local changes are reverted, the customer needs to make the same change in the Chef configuration instead.
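
After the change, it can help to confirm that the entries are present and to see which limit a process actually has. The following sketch parses limits.conf text for `nofile` rows and reads the calling process's own limit through Python's standard `resource` module; the function names are illustrative. Limits from limits.conf are applied when a session starts, so a process that is already running keeps its old limit until it is restarted.

```python
import resource

def nofile_entries(conf_text, domain="*"):
    """Collect soft/hard nofile values for a domain from limits.conf text."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        parts = line.split()
        # limits.conf rows are: <domain> <type> <item> <value>
        if len(parts) != 4 or parts[0] != domain or parts[2] != "nofile":
            continue
        if not parts[3].isdigit():
            continue  # skip non-numeric values in this sketch
        value = int(parts[3])
        if parts[1] == "-":
            # "-" sets both the soft and the hard limit
            found["soft"] = found["hard"] = value
        else:
            found[parts[1]] = value
    return found

def current_process_limits():
    """Return (soft, hard) open-file limits of the calling process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Running `current_process_limits()` from a fresh login session shows whether the new values have been picked up for that session.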

Additional Information

The Platform9 team has opened Jira IAAS-10787 to track this issue. The changes described above will be included in the PMO-5.10.X release, with an ETA of the end of September 2024.
For more details, refer to the official documentation: https://developer.hashicorp.com/consul/docs/architecture/scale
