RDKB-63116: Force kill stale dibbler-client process and prevent zombi… by S-Parthiban-Selvaraj · Pull Request #36 · rdkcentral/miscellaneous-broadband

S-Parthiban-Selvaraj · 2026-01-14T14:58:15Z

…e processes

Problem

When dibbler-client fails to terminate gracefully with SIGTERM within the timeout period, the process remains running. Subsequent attempts to start a new dibbler-client instance detect the stale process but only return its PID without killing it. This causes the new instance to fail to start while the old zombie process persists.

Solution

Implemented two-stage process termination strategy:

Graceful Termination (SIGTERM): First attempt to terminate the process gracefully to allow it to clean up resources. Wait for completion with configurable timeout.
Forceful Termination (SIGKILL): If SIGTERM fails to terminate the process within the timeout, send SIGKILL to forcefully terminate it.
Proper Process Reaping: Use collect_waiting_process() after SIGKILL to properly reap the process using waitpid(). This prevents zombie processes from accumulating in the process table.

Changes Made

dhcpv6_client_utils.c (start_dhcpv6_client)

When an existing dibbler-client instance is detected, forcefully kill it with SIGKILL
Call collect_waiting_process() to properly reap the killed process
Reset pid to 0 and continue to start a fresh instance

dibbler_client_utils.c (stop_dibbler)

When SIGTERM timeout expires, send SIGKILL as fallback
Replace usleep() + stat() check with collect_waiting_process()
Proper zombie process handling ensures clean termination

Scenarios Handled

Stale dibbler-client process blocking new instance startup
Process termination timeout recovery
Zombie process prevention
Interface state transitions without stale process interference

…e processes ## Problem When dibbler-client fails to terminate gracefully with SIGTERM within the timeout period, the process remains running. Subsequent attempts to start a new dibbler-client instance detect the stale process but only return its PID without killing it. This causes the new instance to fail to start while the old zombie process persists. ## Solution Implemented two-stage process termination strategy: 1. **Graceful Termination (SIGTERM)**: First attempt to terminate the process gracefully to allow it to clean up resources. Wait for completion with configurable timeout. 2. **Forceful Termination (SIGKILL)**: If SIGTERM fails to terminate the process within the timeout, send SIGKILL to forcefully terminate it. 3. **Proper Process Reaping**: Use collect_waiting_process() after SIGKILL to properly reap the process using waitpid(). This prevents zombie processes from accumulating in the process table. ## Changes Made ### dhcpv6_client_utils.c (start_dhcpv6_client) - When an existing dibbler-client instance is detected, forcefully kill it with SIGKILL - Call collect_waiting_process() to properly reap the killed process - Reset pid to 0 and continue to start a fresh instance ### dibbler_client_utils.c (stop_dibbler) - When SIGTERM timeout expires, send SIGKILL as fallback - Replace usleep() + stat() check with collect_waiting_process() - Proper zombie process handling ensures clean termination ## Scenarios Handled - Stale dibbler-client process blocking new instance startup - Process termination timeout recovery - Zombie process prevention - Interface state transitions without stale process interference

Copilot

Pull request overview

This PR implements a robust two-stage process termination strategy for dibbler-client to prevent stale and zombie processes. When dibbler-client fails to terminate gracefully with SIGTERM, the code now sends SIGKILL and properly reaps the process to prevent zombies. The PR also forces termination of any existing dibbler-client instance before starting a new one.

Changes:

Enhanced process termination with SIGKILL fallback when SIGTERM times out
Added proper zombie process reaping using collect_waiting_process() after SIGKILL
Modified start_dhcpv6_client to forcefully kill existing instances before starting new ones
Updated function documentation to reflect new behavior and return values

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
source/dhcp_client_utils/dibbler_client_utils.c	Added SIGKILL fallback and zombie reaping when SIGTERM timeout expires in stop_dibbler()
source/dhcp_client_utils/dhcpv6_client_utils.c	Modified start_dhcpv6_client() to forcefully kill existing instances and updated documentation for both start/stop functions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-20T16:31:39Z

source/dhcp_client_utils/dhcpv6_client_utils.c

        DBG_PRINT("%s %d: Fail to open sysevent.\n", __FUNCTION__, __LINE__);
 	/* CID 189993 Improper use of negative value */
-	return FAILURE;
+	return 0;


The return value change from FAILURE to 0 creates an inconsistency with the function's documented behavior and other failure paths. The function documentation states it returns 'pid (>0)...or 0 on failure', but this change makes it inconsistent with line 119 which still returns FAILURE for the same error condition (existing instance detection). Either both failure cases should return 0, or the documentation needs to clarify that different failure modes have different return codes.

Copilot · 2026-01-20T16:31:40Z

source/dhcp_client_utils/dhcpv6_client_utils.c

+            /* Collect the killed process to prevent zombie */
+            if (collect_waiting_process(pid, 1000) != SUCCESS)
+            {
+                DBG_PRINT("%s %d: unable to collect pid %d after SIGKILL\n", __FUNCTION__, __LINE__, pid);


When collect_waiting_process() fails after SIGKILL, the code continues execution and attempts to start a new dibbler-client instance. This could lead to resource conflicts if the old process wasn't properly cleaned up. Consider returning a failure code or adding additional verification that the process is truly terminated before proceeding.

Suggested change

DBG_PRINT("%s %d: unable to collect pid %d after SIGKILL\n", __FUNCTION__, __LINE__, pid);

DBG_PRINT("%s %d: unable to collect pid %d after SIGKILL\n", __FUNCTION__, __LINE__, pid);

/* Failed to confirm process termination; do not start a new client */

return 0;

Copilot · 2026-01-20T16:31:40Z

source/dhcp_client_utils/dibbler_client_utils.c

+        if (collect_waiting_process(pid, 1000) != SUCCESS)
+        {
+            DBG_PRINT("%s %d: unable to collect pid %d after SIGKILL\n", __FUNCTION__, __LINE__, pid);
+            return FAILURE;
+        }


The timeout value of 1000 milliseconds for collect_waiting_process() is hardcoded in multiple locations. Consider defining this as a constant (e.g., PROCESS_COLLECT_TIMEOUT_MS) to improve maintainability and ensure consistency across the codebase.

guto86 approved these changes Jan 14, 2026

View reviewed changes

LakshminarayananShenbagaraj approved these changes Jan 15, 2026

View reviewed changes

S-Parthiban-Selvaraj marked this pull request as ready for review January 20, 2026 16:30

S-Parthiban-Selvaraj requested review from a team as code owners January 20, 2026 16:30

Copilot AI review requested due to automatic review settings January 20, 2026 16:30

Copilot AI reviewed Jan 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RDKB-63116: Force kill stale dibbler-client process and prevent zombi…#36

RDKB-63116: Force kill stale dibbler-client process and prevent zombi…#36
S-Parthiban-Selvaraj wants to merge 1 commit intodevelopfrom
bugfix/dibbler-restart-conflict

S-Parthiban-Selvaraj commented Jan 14, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 20, 2026

Uh oh!

Copilot AI Jan 20, 2026

Uh oh!

Copilot AI Jan 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

S-Parthiban-Selvaraj commented Jan 14, 2026

Problem

Solution

Changes Made

dhcpv6_client_utils.c (start_dhcpv6_client)

dibbler_client_utils.c (stop_dibbler)

Scenarios Handled

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants