RDKB-63116: Force kill stale dibbler-client process and prevent zombi…#36
RDKB-63116: Force kill stale dibbler-client process and prevent zombi…#36S-Parthiban-Selvaraj wants to merge 1 commit intodevelopfrom
Conversation
…e processes ## Problem When dibbler-client fails to terminate gracefully with SIGTERM within the timeout period, the process remains running. Subsequent attempts to start a new dibbler-client instance detect the stale process but only return its PID without killing it. This causes the new instance to fail to start while the old zombie process persists. ## Solution Implemented two-stage process termination strategy: 1. **Graceful Termination (SIGTERM)**: First attempt to terminate the process gracefully to allow it to clean up resources. Wait for completion with configurable timeout. 2. **Forceful Termination (SIGKILL)**: If SIGTERM fails to terminate the process within the timeout, send SIGKILL to forcefully terminate it. 3. **Proper Process Reaping**: Use collect_waiting_process() after SIGKILL to properly reap the process using waitpid(). This prevents zombie processes from accumulating in the process table. ## Changes Made ### dhcpv6_client_utils.c (start_dhcpv6_client) - When an existing dibbler-client instance is detected, forcefully kill it with SIGKILL - Call collect_waiting_process() to properly reap the killed process - Reset pid to 0 and continue to start a fresh instance ### dibbler_client_utils.c (stop_dibbler) - When SIGTERM timeout expires, send SIGKILL as fallback - Replace usleep() + stat() check with collect_waiting_process() - Proper zombie process handling ensures clean termination ## Scenarios Handled - Stale dibbler-client process blocking new instance startup - Process termination timeout recovery - Zombie process prevention - Interface state transitions without stale process interference
There was a problem hiding this comment.
Pull request overview
This PR implements a robust two-stage process termination strategy for dibbler-client to prevent stale and zombie processes. When dibbler-client fails to terminate gracefully with SIGTERM, the code now sends SIGKILL and properly reaps the process to prevent zombies. The PR also forces termination of any existing dibbler-client instance before starting a new one.
Changes:
- Enhanced process termination with SIGKILL fallback when SIGTERM times out
- Added proper zombie process reaping using collect_waiting_process() after SIGKILL
- Modified start_dhcpv6_client to forcefully kill existing instances before starting new ones
- Updated function documentation to reflect new behavior and return values
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| source/dhcp_client_utils/dibbler_client_utils.c | Added SIGKILL fallback and zombie reaping when SIGTERM timeout expires in stop_dibbler() |
| source/dhcp_client_utils/dhcpv6_client_utils.c | Modified start_dhcpv6_client() to forcefully kill existing instances and updated documentation for both start/stop functions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| DBG_PRINT("%s %d: Fail to open sysevent.\n", __FUNCTION__, __LINE__); | ||
| /* CID 189993 Improper use of negative value */ | ||
| return FAILURE; | ||
| return 0; |
There was a problem hiding this comment.
The return value change from FAILURE to 0 creates an inconsistency with the function's documented behavior and other failure paths. The function documentation states it returns 'pid (>0)...or 0 on failure', but this change makes it inconsistent with line 119 which still returns FAILURE for the same error condition (existing instance detection). Either both failure cases should return 0, or the documentation needs to clarify that different failure modes have different return codes.
| /* Collect the killed process to prevent zombie */ | ||
| if (collect_waiting_process(pid, 1000) != SUCCESS) | ||
| { | ||
| DBG_PRINT("%s %d: unable to collect pid %d after SIGKILL\n", __FUNCTION__, __LINE__, pid); |
There was a problem hiding this comment.
When collect_waiting_process() fails after SIGKILL, the code continues execution and attempts to start a new dibbler-client instance. This could lead to resource conflicts if the old process wasn't properly cleaned up. Consider returning a failure code or adding additional verification that the process is truly terminated before proceeding.
| DBG_PRINT("%s %d: unable to collect pid %d after SIGKILL\n", __FUNCTION__, __LINE__, pid); | |
| DBG_PRINT("%s %d: unable to collect pid %d after SIGKILL\n", __FUNCTION__, __LINE__, pid); | |
| /* Failed to confirm process termination; do not start a new client */ | |
| return 0; |
| if (collect_waiting_process(pid, 1000) != SUCCESS) | ||
| { | ||
| DBG_PRINT("%s %d: unable to collect pid %d after SIGKILL\n", __FUNCTION__, __LINE__, pid); | ||
| return FAILURE; | ||
| } |
There was a problem hiding this comment.
The timeout value of 1000 milliseconds for collect_waiting_process() is hardcoded in multiple locations. Consider defining this as a constant (e.g., PROCESS_COLLECT_TIMEOUT_MS) to improve maintainability and ensure consistency across the codebase.
…e processes
Problem
When dibbler-client fails to terminate gracefully with SIGTERM within the timeout period, the process remains running. Subsequent attempts to start a new dibbler-client instance detect the stale process but only return its PID without killing it. This causes the new instance to fail to start while the old zombie process persists.
Solution
Implemented two-stage process termination strategy:
Graceful Termination (SIGTERM): First attempt to terminate the process gracefully to allow it to clean up resources. Wait for completion with configurable timeout.
Forceful Termination (SIGKILL): If SIGTERM fails to terminate the process within the timeout, send SIGKILL to forcefully terminate it.
Proper Process Reaping: Use collect_waiting_process() after SIGKILL to properly reap the process using waitpid(). This prevents zombie processes from accumulating in the process table.
Changes Made
dhcpv6_client_utils.c (start_dhcpv6_client)
dibbler_client_utils.c (stop_dibbler)
Scenarios Handled