Fix Modbus connection recovery for both master and slave plugins#42

Merged
thiagoralves merged 2 commits into development from devin/1765047212-fix-modbus-connection-recovery on Dec 7, 2025

Conversation

@devin-ai-integration devin-ai-integration Bot commented Dec 6, 2025

Fix Modbus connection recovery for both master and slave plugins

Summary

This PR fixes connection recovery issues in both the Modbus master and slave plugins that were causing communication failures after network interruptions.

Modbus Master Plugin

Root cause: The ensure_connection() method only checked client.connected (pymodbus internal socket state) but ignored the is_connected flag that was being set to False on errors. This meant the flag was essentially dead code - written but never read for reconnection decisions. When a broken pipe occurred, pymodbus might still report the socket as "connected" even though it was dead.

Fix:

  • ensure_connection() now checks BOTH is_connected AND client.connected before considering the connection healthy
  • Added mark_disconnected() method to properly signal connection errors and log the event
  • When reconnecting, explicitly close the existing client to clean up dead sockets before creating a new one
  • Updated all error handlers to use mark_disconnected() instead of directly setting the flag
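
The master-side fix described above could be sketched roughly as follows. This is not the merged code: `ConnectionManagerSketch` and the `client_factory` hook are hypothetical names introduced for illustration, and the client is assumed to expose `connect()`, `close()`, and a `connected` attribute in the way the description says the pymodbus client does.

```python
import logging

logger = logging.getLogger("modbus_master_sketch")


class ConnectionManagerSketch:
    """Hypothetical stand-in for the plugin's connection manager."""

    def __init__(self, client_factory):
        # client_factory is an assumed hook returning a new client object
        # that exposes connect(), close(), and a `connected` attribute.
        self.client_factory = client_factory
        self.client = None
        self.is_connected = False

    def mark_disconnected(self, reason=""):
        """Record that an I/O error invalidated the current connection."""
        if self.is_connected:
            logger.warning("Connection marked as disconnected: %s", reason)
        self.is_connected = False

    def ensure_connection(self):
        """Return True when a usable connection exists, reconnecting if needed."""
        # Healthy only if BOTH the plugin flag and the client state agree.
        if self.client is not None and self.is_connected and self.client.connected:
            return True

        # Close the old client first so a dead socket is never reused.
        if self.client is not None:
            try:
                self.client.close()
            except Exception as exc:  # closing a dead socket may itself fail
                logger.debug("Ignoring error while closing client: %s", exc)
            self.client = None

        client = self.client_factory()
        if client.connect():
            self.client = client
            self.is_connected = True
        else:
            self.is_connected = False
        return self.is_connected
```

An error handler would call `mark_disconnected("broken pipe")`, and the next `ensure_connection()` call would then discard the stale client even if it still reports `connected == True`.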

Modbus Slave Plugin

Issues addressed:

  • No restart logic - if the server crashed, it stayed dead permanently
  • Cross-thread shutdown problem - asyncio.run(ServerStop()) created a new event loop instead of using the server's loop
  • Misleading startup success - start_loop() returned True before knowing if the server actually bound successfully
  • Excessive logging - every getValues/setValues call logged individual coils/registers
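
The cross-thread shutdown problem can be reproduced with the standard library alone. The sketch below uses no pymodbus; `serve()` and `shutdown()` are hypothetical stand-ins for the server coroutine and its stop routine. It shows the stop request being scheduled on the server's own loop with `asyncio.run_coroutine_threadsafe`, rather than run on a fresh loop via `asyncio.run()`.

```python
import asyncio
import threading

loop_ready = threading.Event()
state = {"loop": None, "stop": None, "stopped_on": None}


async def serve():
    """Stand-in for a server coroutine that runs until asked to stop."""
    state["loop"] = asyncio.get_running_loop()
    state["stop"] = asyncio.Event()
    loop_ready.set()
    await state["stop"].wait()


async def shutdown():
    """Must run on the server's loop: asyncio primitives are not thread-safe."""
    state["stopped_on"] = asyncio.get_running_loop()
    state["stop"].set()


server_thread = threading.Thread(target=lambda: asyncio.run(serve()), daemon=True)
server_thread.start()
loop_ready.wait(timeout=5)

# asyncio.run(shutdown()) here would execute shutdown() on a brand-new loop
# in the calling thread. Scheduling it on the server's loop avoids that.
future = asyncio.run_coroutine_threadsafe(shutdown(), state["loop"])
future.result(timeout=5)
server_thread.join(timeout=5)
```

After the join, `state["stopped_on"]` is the same loop object as `state["loop"]`, which is the property the original `asyncio.run(...)` call could not provide.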

Fix:

  • Added automatic restart logic with exponential backoff (2s base, 30s max) - server will "never give up"
  • Fixed cross-thread shutdown by calling ServerStop() directly (it uses asyncio.run_coroutine_threadsafe internally)
  • Added proper startup success detection using threading.Event with 5-second timeout
  • Removed per-request debug logging, keeping only errors and lifecycle messages
  • Switched from StartAsyncTcpServer to ModbusTcpServer.serve_forever(background=True) for reliable bind detection
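
For reference, a restart loop with the stated limits (2s base, 30s max) might look like the sketch below. The doubling factor, the reset after a clean return, and the names `backoff_delay`, `run_with_restart`, `serve_once`, `should_run`, and `sleep` are assumptions made for illustration; the PR description gives only the base and the cap.

```python
def backoff_delay(attempt, base=2.0, maximum=30.0):
    """Delay in seconds before restart number `attempt` (0-based)."""
    # Assumed policy: double the delay on each failure, capped at `maximum`.
    return min(maximum, base * (2 ** attempt))


def run_with_restart(serve_once, should_run, sleep, base=2.0, maximum=30.0):
    """Call serve_once() repeatedly until should_run() returns False.

    A clean return from serve_once() resets the backoff; an exception
    increases it. `sleep` is injected so the loop can be tested without
    actually waiting.
    """
    attempt = 0
    while should_run():
        try:
            serve_once()
            attempt = 0
        except Exception:  # the server "never gives up", so retry on any error
            delay = backoff_delay(attempt, base, maximum)
            attempt += 1
            sleep(delay)
```

Under these assumptions the successive delays are 2, 4, 8, 16, 30, 30, ... seconds.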

Review & Testing Checklist for Human

  • Test master connection recovery: Simulate a network interruption and verify the master reconnects automatically without broken pipe spam
  • Test slave server restart: Kill the slave server process or cause a bind failure, verify it automatically restarts with backoff
  • Test slave startup failure detection: Configure an invalid port/address and verify start_loop() returns False (not True with silent failure)
  • Test slave graceful shutdown: Call stop_loop() and verify the server stops cleanly without hanging
  • Long-running test: Run both plugins for an extended period to ensure stability
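
For the startup failure detection item, the pattern being tested is roughly the one sketched here: the worker thread sets a `threading.Event` once the bind step has either succeeded or failed, and the caller waits on it with a timeout. `start_loop_sketch`, `bind`, and `serve` are hypothetical names; the real plugin's `start_loop()` may differ in signature and return value.

```python
import threading


def start_loop_sketch(bind, serve, timeout=5.0):
    """Run bind() then serve() in a thread; report whether bind() succeeded.

    Returns (ok, error, thread). `bind` and `serve` are hypothetical
    callables standing in for the server's bind step and serve loop.
    """
    started = threading.Event()
    outcome = {"error": None}

    def worker():
        try:
            bind()
        except OSError as exc:
            outcome["error"] = str(exc)
            started.set()  # wake the caller on failure too
            return
        started.set()
        serve()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    if not started.wait(timeout):
        return False, "startup timed out", thread
    return outcome["error"] is None, outcome["error"], thread
```

Because the error is stored before the event is set, the caller can read it safely after `started.wait()` returns.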

Recommended test plan:

  1. Start both Modbus master and slave plugins
  2. Verify normal communication works
  3. For master: disconnect the remote slave, observe "marked as disconnected" messages, reconnect and verify recovery
  4. For slave: cause a server error (e.g., port conflict), observe retry messages with increasing backoff, resolve the conflict and verify recovery
  5. Stop both plugins and verify clean shutdown

Notes

  • Link to Devin run: https://app.devin.ai/sessions/c0ab2357ad504b7ba3826ec9e9a16955
  • Requested by: Thiago Alves (@thiagoralves)
  • The slave plugin changes are significant (rewrite of start_loop/stop_loop). Manual testing is strongly recommended before merging.
  • Added pylint disable comments at module level for pre-existing code style issues (pymodbus API requires getValues/setValues naming).

- Fix ensure_connection() to check BOTH is_connected flag AND client.connected
- Add mark_disconnected() method to properly signal connection errors
- Close dead sockets before reconnecting to avoid broken pipe errors
- Update all error handlers to use mark_disconnected() instead of directly
  setting is_connected = False

This ensures the connection is properly torn down and re-established when
any communication error occurs (timeout, broken pipe, ModbusIOException, etc.)

Co-Authored-By: Thiago Alves <thiagoralves@gmail.com>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@thiagoralves thiagoralves requested a review from Copilot December 6, 2025 20:08

Copilot AI left a comment

Pull request overview

This PR fixes a critical bug in the Modbus master plugin where network interruptions caused continuous "broken pipe" errors instead of proper reconnection. The root cause was that ensure_connection() only checked the pymodbus client state but ignored the plugin's own is_connected flag that was being set to False on errors.

Key changes:

  • Enhanced ensure_connection() to check both is_connected flag AND client.connected before considering the connection healthy
  • Added mark_disconnected() method to centralize disconnect signaling with logging
  • Updated all error handlers to use the new mark_disconnected() method

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| modbus_master_plugin.py | Replaced direct flag assignments with mark_disconnected() method calls across all error handlers |
| modbus_master_connection.py | Added mark_disconnected() method and enhanced ensure_connection() to check both connection states and properly clean up dead sockets |

from pymodbus.client import ModbusTcpClient
from pymodbus.exceptions import ConnectionException


Copilot AI Dec 6, 2025

[nitpick] Unnecessary blank line added after imports. The import section already has proper spacing.

Suggested change: delete the extra blank line after the imports.

-    host=self.host,
-    port=self.port,
-    timeout=self.timeout
+    host=self.host, port=self.port, timeout=self.timeout

Copilot AI Dec 6, 2025

[nitpick] The formatting change from multi-line to single-line parameters reduces readability. The previous multi-line format was clearer and more consistent with Python style guidelines.

Suggested change
-    host=self.host, port=self.port, timeout=self.timeout
+    host=self.host,
+    port=self.port,
+    timeout=self.timeout

- Add automatic restart logic with exponential backoff (never give up)
- Fix cross-thread shutdown: call ServerStop() directly instead of via asyncio.run()
- Add proper startup success detection using threading.Event
- Remove excessive per-request logging (keep only errors and lifecycle messages)
- Use ModbusTcpServer with serve_forever(background=True) for reliable bind detection
- Add double-start protection and improved shutdown timeout handling
- Add pylint disable comments for pre-existing code style issues

Co-Authored-By: Thiago Alves <thiagoralves@gmail.com>
@devin-ai-integration devin-ai-integration Bot changed the title from "Fix Modbus master connection recovery after network interruption" to "Fix Modbus connection recovery for both master and slave plugins" Dec 7, 2025
@thiagoralves thiagoralves requested a review from Copilot December 7, 2025 14:47

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

@@ -1,355 +1,393 @@
# pylint: disable=C0103,C0301,C0413,W0107,W0602,W0621,C0415

Copilot AI Dec 7, 2025

[nitpick] The pylint disable comment uses numeric codes (C0103, C0301, etc.) instead of symbolic names (invalid-name, line-too-long, etc.). While both work, symbolic names are more readable and maintainable as they make it clear what rules are being disabled without needing to look up the codes.

Suggested change
- # pylint: disable=C0103,C0301,C0413,W0107,W0602,W0621,C0415
+ # pylint: disable=invalid-name,line-too-long,wrong-import-position,unnecessary-pass,global-variable-not-assigned,redefined-outer-name,import-outside-toplevel

def setValues(self, address, values):
    """Discrete inputs are read-only, this method should not be called"""

Copilot AI Dec 7, 2025

[nitpick] While the comment explains the intent, the method docstring states 'this method should not be called'. Consider updating the docstring to clarify that writes are silently ignored rather than being an error condition, or add a more descriptive comment explaining why silent ignoring is the desired behavior over logging a warning.

Suggested change
-    """Discrete inputs are read-only, this method should not be called"""
+    """Discrete inputs are read-only; writes are silently ignored.
+    This method is intentionally a no-op. Any attempt to write to discrete inputs
+    will be ignored without warning or error.
+    """

def setValues(self, address, values):
    """Input registers are read-only, this method should not be called"""

Copilot AI Dec 7, 2025

[nitpick] Similar to the discrete inputs class, the method docstring states 'this method should not be called' but the implementation silently ignores writes. Consider updating the docstring to match the actual behavior of silently ignoring writes rather than treating them as an error.

Suggested change
-    """Input registers are read-only, this method should not be called"""
+    """Input registers are read-only; writes are silently ignored."""

running = False
server_loop = None # Reference to the server's event loop for cross-thread operations
server_started_event = threading.Event() # Signals successful server startup
server_error = None # Stores any startup error message

Copilot AI Dec 7, 2025

The global variables server_loop, server_started_event, and server_error are accessed from multiple threads without synchronization. While threading.Event is thread-safe, server_loop and server_error could experience race conditions when read/written from different threads (e.g., main thread checking server_error while server thread is setting it). Consider using a lock to protect access to server_error and server_loop, or document that the startup event serves as a memory barrier.

Suggested change
- server_error = None # Stores any startup error message
+ server_error = None # Stores any startup error message
+ server_state_lock = threading.Lock() # Protects access to server_loop and server_error
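
One way the suggested lock could be applied is sketched below. `ServerState` and its methods are hypothetical and do not appear in the PR, which uses module-level globals; the point is only that both values are written and read under the same lock, so a reader never sees one updated without the other.

```python
import threading


class ServerState:
    """Hypothetical holder for the values shared between threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._loop = None
        self._error = None

    def set_loop(self, loop):
        """Store the server's event loop reference."""
        with self._lock:
            self._loop = loop

    def set_error(self, message):
        """Store the startup error message, or None to clear it."""
        with self._lock:
            self._error = message

    def snapshot(self):
        """Return (loop, error) read together under the lock."""
        with self._lock:
            return self._loop, self._error
```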

- class ModbusConnectionManager:
+ class ModbusConnectionManager: # pylint: disable=too-many-instance-attributes

Copilot AI Dec 7, 2025

[nitpick] The pylint disable comment uses a symbolic name (too-many-instance-attributes) which is good practice. However, this conflicts with the approach in the modbus_slave file which uses numeric codes. Consider standardizing on symbolic names across both files for consistency and better maintainability.

@thiagoralves thiagoralves merged commit 0e9c03c into development Dec 7, 2025
2 checks passed
@thiagoralves thiagoralves deleted the devin/1765047212-fix-modbus-connection-recovery branch December 7, 2025 16:37