RDKB-63377 : Add patch for SH Logic for reap hung child #128
base: support/2025q4
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -313,12 +313,141 @@ self_heal_meshAgent_hung() { | |||||||||||||||||||||||||||||||
| fi | ||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| # This is a workaround till fork calls are removed from t2 | ||||||||||||||||||||||||||||||||
| # Purpose of this selfheal is to kill t2 telemetry2_0 childs if it is : | ||||||||||||||||||||||||||||||||
| # 1] running for more than 120 sec | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| detect_and_kill_locked_pids() { | ||||||||||||||||||||||||||||||||
| local name="$1" THRESH="${2:-120}" | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
|
Comment on lines +320 to +322
|
||||||||||||||||||||||||||||||||
| if [ -z "$name" ]; then | ||||||||||||||||||||||||||||||||
| return 2 | ||||||||||||||||||||||||||||||||
| fi | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| local pids=$(pidof "$name") | ||||||||||||||||||||||||||||||||
| if [ -z "$pids" ]; then | ||||||||||||||||||||||||||||||||
| return 0 | ||||||||||||||||||||||||||||||||
| fi | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| local pid_count | ||||||||||||||||||||||||||||||||
| pid_count=$(set -- $pids; echo $#) | ||||||||||||||||||||||||||||||||
| if [ "$pid_count" -le 1 ]; then | ||||||||||||||||||||||||||||||||
| return 0 | ||||||||||||||||||||||||||||||||
| fi | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| echo_t "[RDKB_SELFHEAL_T2] Multiple telemetry pids are running $pids" | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| # 1. CLK_TCK (USER_HZ) & Uptime | ||||||||||||||||||||||||||||||||
| # USER_HZ is almost always 100 on Linux regardless of CONFIG_HZ | ||||||||||||||||||||||||||||||||
| local hz=1000 | ||||||||||||||||||||||||||||||||
| if [ -r /proc/config.gz ]; then | ||||||||||||||||||||||||||||||||
| local detected_hz=$(zcat /proc/config.gz 2>/dev/null | grep "^CONFIG_HZ=" | cut -d= -f2) | ||||||||||||||||||||||||||||||||
| if [ -n "$detected_hz" ]; then | ||||||||||||||||||||||||||||||||
| hz=$detected_hz | ||||||||||||||||||||||||||||||||
| fi | ||||||||||||||||||||||||||||||||
| fi | ||||||||||||||||||||||||||||||||
|
Comment on lines +342 to +348
|
||||||||||||||||||||||||||||||||
| - local hz=1000 | |
| - if [ -r /proc/config.gz ]; then | |
| - local detected_hz=$(zcat /proc/config.gz 2>/dev/null | grep "^CONFIG_HZ=" | cut -d= -f2) | |
| - if [ -n "$detected_hz" ]; then | |
| - hz=$detected_hz | |
| - fi | |
| - fi | |
| + local hz | |
| + if command -v getconf >/dev/null 2>&1; then | |
| + hz=$(getconf CLK_TCK 2>/dev/null) | |
| + fi | |
| + # Fallback to a safe default if getconf is unavailable or returns an invalid value | |
| + if ! echo "$hz" | grep -Eq '^[0-9]+$'; then | |
| + hz=100 | |
| + fi |
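
For context on why the tick rate matters: the start time in `/proc/<pid>/stat` (field 22) is reported in clock ticks of `CLK_TCK`, not `CONFIG_HZ`, so dividing by the wrong value skews the computed age. The following is a minimal sketch, assuming the self-heal derives process age from that field and from `/proc/uptime`; the diff shown here does not confirm this. The helper name `proc_age_seconds` and its optional `proc_root` argument are hypothetical and exist only for illustration.

```shell
# Hypothetical helper (not part of the PR): print a process's age in seconds.
# Usage: proc_age_seconds PID [HZ] [PROC_ROOT]
proc_age_seconds() {
    local pid="$1" hz="${2:-100}" proc_root="${3:-/proc}"
    local line rest start_ticks uptime idle up_s

    [ -r "$proc_root/$pid/stat" ] || return 1
    read -r line < "$proc_root/$pid/stat" || return 1

    # Field 2 (comm) may contain spaces, so drop everything up to the last ") ".
    rest=${line##*) }

    # With "pid (comm)" removed, field 3 (state) is $1, so field 22 is ${20}.
    set -- $rest
    start_ticks=${20}

    # /proc/uptime holds "<uptime seconds> <idle seconds>"; keep whole seconds.
    read -r uptime idle < "$proc_root/uptime" || return 1
    up_s=${uptime%%.*}

    echo $(( $up_s - $start_ticks / $hz ))
}
```

A caller could compare the result against the 120-second threshold, for example `[ "$(proc_age_seconds "$pid" "$(getconf CLK_TCK)")" -gt 120 ]`.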
Copilot
AI
Feb 6, 2026
The "multi-threaded" parent heuristic runs `ls | wc -l` for each PID, which is relatively expensive and can be avoided by reading the `Threads:` field from `/proc/<pid>/status` (or similar) without spawning multiple processes. This will reduce overhead in the periodic self-heal loop.
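
One way to follow this suggestion is sketched below, using only shell built-ins so that no extra process is spawned per PID. The helper name `get_thread_count` and its optional `proc_root` argument are hypothetical and not taken from the PR.

```shell
# Hypothetical helper (not part of the PR): print the thread count of a PID
# by reading the "Threads:" line of its status file.
# Usage: get_thread_count PID [PROC_ROOT]
get_thread_count() {
    local pid="$1" proc_root="${2:-/proc}" key value

    [ -r "$proc_root/$pid/status" ] || return 1

    # Each status line is "Key:<tab>value"; read splits on the whitespace.
    while read -r key value; do
        if [ "$key" = "Threads:" ]; then
            echo "$value"
            return 0
        fi
    done < "$proc_root/$pid/status"

    return 1
}
```

A parent check could then read `[ "$(get_thread_count "$pid")" -gt 1 ]` instead of counting entries under `/proc/<pid>/task`.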
Copilot
AI
Feb 6, 2026
The per-PID `sleep 5`/`sleep 1` inside the loop can block the health monitor for a long time if multiple child PIDs exist (e.g., N children => ~6N seconds). Consider a bounded overall timeout, shorter waits, or reaping in a way that doesn't delay the rest of `self_heal_t2()` for extended periods.
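
A bounded variant could look like the sketch below: every target receives `SIGTERM` up front, all of them share a single grace period, and only the survivors receive `SIGKILL`. The total wait is then roughly the grace period rather than a multiple of the PID count. The helper name `reap_pids_bounded` and its argument order are hypothetical; the PR's actual kill loop is not shown on this page.

```shell
# Hypothetical helper (not part of the PR): stop a list of PIDs with one
# shared grace period instead of a fixed sleep per PID.
# Usage: reap_pids_bounded GRACE_SECONDS PID...
reap_pids_bounded() {
    local grace="$1" waited=0 pid alive
    shift

    # Ask every target to exit first so they shut down in parallel.
    for pid in "$@"; do
        kill -TERM "$pid" 2>/dev/null || true
    done

    # Poll once per second until all are gone or the grace period ends.
    # Note: kill -0 still succeeds for a zombie until its parent reaps it.
    while [ "$waited" -lt "$grace" ]; do
        alive=0
        for pid in "$@"; do
            kill -0 "$pid" 2>/dev/null && alive=1
        done
        [ "$alive" -eq 0 ] && return 0
        sleep 1
        waited=$((waited + 1))
    done

    # Force-kill whatever is still present after the grace period.
    for pid in "$@"; do
        kill -0 "$pid" 2>/dev/null && kill -KILL "$pid" 2>/dev/null
    done
    return 0
}
```

With a grace period of 5, for instance, ten hung children would delay the caller by about 5 seconds in the worst case.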
Comment says "childs"; use "children" for correct grammar (also in the following line where the purpose is described).