Fix Modbus connection recovery for both master and slave plugins#42
Conversation
- Fix ensure_connection() to check BOTH is_connected flag AND client.connected - Add mark_disconnected() method to properly signal connection errors - Close dead sockets before reconnecting to avoid broken pipe errors - Update all error handlers to use mark_disconnected() instead of directly setting is_connected = False This ensures the connection is properly torn down and re-established when any communication error occurs (timeout, broken pipe, ModbusIOException, etc.) Co-Authored-By: Thiago Alves <thiagoralves@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
There was a problem hiding this comment.
Pull request overview
This PR fixes a critical bug in the Modbus master plugin where network interruptions caused continuous "broken pipe" errors instead of proper reconnection. The root cause was that ensure_connection() only checked the pymodbus client state but ignored the plugin's own is_connected flag that was being set to False on errors.
Key changes:
- Enhanced
ensure_connection()to check bothis_connectedflag ANDclient.connectedbefore considering the connection healthy - Added
mark_disconnected()method to centralize disconnect signaling with logging - Updated all error handlers to use the new
mark_disconnected()method
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| modbus_master_plugin.py | Replaced direct flag assignments with mark_disconnected() method calls across all error handlers |
| modbus_master_connection.py | Added mark_disconnected() method and enhanced ensure_connection() to check both connection states and properly clean up dead sockets |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from pymodbus.client import ModbusTcpClient | ||
| from pymodbus.exceptions import ConnectionException | ||
|
|
||
|
|
There was a problem hiding this comment.
[nitpick] Unnecessary blank line added after imports. The import section already has proper spacing.
| host=self.host, | ||
| port=self.port, | ||
| timeout=self.timeout | ||
| host=self.host, port=self.port, timeout=self.timeout |
There was a problem hiding this comment.
[nitpick] The formatting change from multi-line to single-line parameters reduces readability. The previous multi-line format was clearer and more consistent with Python style guidelines.
| host=self.host, port=self.port, timeout=self.timeout | |
| host=self.host, | |
| port=self.port, | |
| timeout=self.timeout |
- Add automatic restart logic with exponential backoff (never give up) - Fix cross-thread shutdown: call ServerStop() directly instead of via asyncio.run() - Add proper startup success detection using threading.Event - Remove excessive per-request logging (keep only errors and lifecycle messages) - Use ModbusTcpServer with serve_forever(background=True) for reliable bind detection - Add double-start protection and improved shutdown timeout handling - Add pylint disable comments for pre-existing code style issues Co-Authored-By: Thiago Alves <thiagoralves@gmail.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -1,355 +1,393 @@ | |||
| # pylint: disable=C0103,C0301,C0413,W0107,W0602,W0621,C0415 | |||
There was a problem hiding this comment.
[nitpick] The pylint disable comment uses numeric codes (C0103, C0301, etc.) instead of symbolic names (invalid-name, line-too-long, etc.). While both work, symbolic names are more readable and maintainable as they make it clear what rules are being disabled without needing to look up the codes.
| # pylint: disable=C0103,C0301,C0413,W0107,W0602,W0621,C0415 | |
| # pylint: disable=invalid-name,line-too-long,wrong-import-position,unnecessary-pass,global-variable-not-assigned,redefined-outer-name,import-outside-toplevel |
|
|
||
| def setValues(self, address, values): | ||
| """Discrete inputs are read-only, this method should not be called""" |
There was a problem hiding this comment.
[nitpick] While the comment explains the intent, the method docstring states 'this method should not be called'. Consider updating the docstring to clarify that writes are silently ignored rather than being an error condition, or add a more descriptive comment explaining why silent ignoring is the desired behavior over logging a warning.
| """Discrete inputs are read-only, this method should not be called""" | |
| """Discrete inputs are read-only; writes are silently ignored. | |
| This method is intentionally a no-op. Any attempt to write to discrete inputs | |
| will be ignored without warning or error. | |
| """ |
|
|
||
| def setValues(self, address, values): | ||
| """Input registers are read-only, this method should not be called""" |
There was a problem hiding this comment.
[nitpick] Similar to the discrete inputs class, the method docstring states 'this method should not be called' but the implementation silently ignores writes. Consider updating the docstring to match the actual behavior of silently ignoring writes rather than treating them as an error.
| """Input registers are read-only, this method should not be called""" | |
| """Input registers are read-only; writes are silently ignored.""" |
| running = False | ||
| server_loop = None # Reference to the server's event loop for cross-thread operations | ||
| server_started_event = threading.Event() # Signals successful server startup | ||
| server_error = None # Stores any startup error message |
There was a problem hiding this comment.
The global variables server_loop, server_started_event, and server_error are accessed from multiple threads without synchronization. While threading.Event is thread-safe, server_loop and server_error could experience race conditions when read/written from different threads (e.g., main thread checking server_error while server thread is setting it). Consider using a lock to protect access to server_error and server_loop, or document that the startup event serves as a memory barrier.
| server_error = None # Stores any startup error message | |
| server_error = None # Stores any startup error message | |
| server_state_lock = threading.Lock() # Protects access to server_loop and server_error |
|
|
||
|
|
||
| class ModbusConnectionManager: | ||
| class ModbusConnectionManager: # pylint: disable=too-many-instance-attributes |
There was a problem hiding this comment.
[nitpick] The pylint disable comment uses a symbolic name (too-many-instance-attributes) which is good practice. However, this conflicts with the approach in the modbus_slave file which uses numeric codes. Consider standardizing on symbolic names across both files for consistency and better maintainability.
Fix Modbus connection recovery for both master and slave plugins
Summary
This PR fixes connection recovery issues in both the Modbus master and slave plugins that were causing communication failures after network interruptions.
Modbus Master Plugin
Root cause: The
ensure_connection()method only checkedclient.connected(pymodbus internal socket state) but ignored theis_connectedflag that was being set toFalseon errors. This meant the flag was essentially dead code - written but never read for reconnection decisions. When a broken pipe occurred, pymodbus might still report the socket as "connected" even though it was dead.Fix:
ensure_connection()now checks BOTHis_connectedANDclient.connectedbefore considering the connection healthymark_disconnected()method to properly signal connection errors and log the eventmark_disconnected()instead of directly setting the flagModbus Slave Plugin
Issues addressed:
asyncio.run(ServerStop())created a new event loop instead of using the server's loopstart_loop()returnedTruebefore knowing if the server actually bound successfullygetValues/setValuescall logged individual coils/registersFix:
ServerStop()directly (it usesasyncio.run_coroutine_threadsafeinternally)threading.Eventwith 5-second timeoutStartAsyncTcpServertoModbusTcpServer.serve_forever(background=True)for reliable bind detectionReview & Testing Checklist for Human
start_loop()returnsFalse(notTruewith silent failure)stop_loop()and verify the server stops cleanly without hangingRecommended test plan:
Notes
start_loop/stop_loop). Manual testing is strongly recommended before merging.getValues/setValuesnaming).