Frequent DB Connection Errors in Designate Logs
Problem
Frequent db connection errors in the designate-mdns logs on the host with dns/designate role.
ERROR oslo_db.sqlalchemy.engines [req-927ce395-6d19-4caf-8d70-829d69003c28 - - - - -] Database connection was found disconnected; reconnecting: oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")
ERROR oslo_db.sqlalchemy.engines [req-b2bd48e0-2a89-464d-84b3-9d2eefbfdae5 - - - - -] Database connection was found disconnected; reconnecting: oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
Environment
- Platform9 Managed OpenStack - 5.8.0.
- Host having Designate role applied.
Cause
When the Designate query communication with MySQL is using UDP instead of TCP, it is expected to see the Lost connection to MySQL server during query
errors when the packets are missed as per the UDP protocol.
Using UDP protocol for the Designate to MySQL communication will not affect the overall transmission.
Answer
To avoid the connection errors seen due to the packet losses as part of the UDP protocol, It is recommended to use TCP protocol, as it guarantee the successful communication- hence no "Lost connection" errors in the desinate logs.
If the connection pooling is used, make sure the connection lifetime (or timeout, etc) is less than the configured wait_timeout
.
To enable TCP protocol for communication add all_tcp = true
in the designate config file. The timeout value can be mentioned in the designate configuration file as mentioned below:
[service:mdns]
tcp_recv_timeout = 10 # Timeout value
all_tcp = true # To enable TCP protocol
listen = 0.0.0.0:5354
Additional Information
The error "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")
are observed when the connections have been idle for longer than the database server's wait_timeout
setting.
A solution to this is to set sqlalchemy's pool_recycle
time to (significantly) less than the database's wait_timeout
and use optimistic disconnect handling.
For more details, please refer the official documentation- Connection Pooling — SQLAlchemy 2.0 Documentation