src/edge-node.ts: 24 changes (22 additions, 2 deletions)
@@ -7,6 +7,7 @@ const nodeId = process.env.EDGEMESH_NODE_ID ?? `node-${Math.random().toString(36
 const bootstrapToken = process.env.EDGEMESH_BOOTSTRAP_TOKEN ?? "bootstrap-dev";
 const heartbeatMs = Number(process.env.EDGEMESH_HEARTBEAT_MS ?? 3000);
 const pollMs = Number(process.env.EDGEMESH_POLL_MS ?? 1500);
+const MAX_BACKOFF_MS = Number(process.env.EDGEMESH_MAX_BACKOFF_MS ?? 30_000);
 
 let nodeJwt: string | null = null;
 
@@ -156,6 +157,10 @@ async function main() {
     });
   }, heartbeatMs);
 
+  // Exponential backoff state for error handling
+  let backoffMs = pollMs;
+  let consecutiveErrors = 0;
+
   while (true) {
     try {
       const task = await claimTask();
@@ -168,14 +173,29 @@ async function main() {
       const result = await executeTask(task);
       await submitResult(result);
       console.log(`[edge-node:${nodeId}] task completed`, task.taskId, result.ok ? "ok" : "failed");
+
+      // Reset backoff on successful task execution
+      backoffMs = pollMs;
+      consecutiveErrors = 0;
Comment on lines +177 to +179
P2: Reset backoff after any successful poll

The new backoff state is only reset after a task is fully executed, so successful claimTask() polls that return no task do not clear backoffMs or consecutiveErrors. After a temporary outage, if the node then idles on an empty queue, backoffMs stays at its escalated value: the next isolated error is treated as a continuation of the old error streak and can sleep for up to MAX_BACKOFF_MS. That delays recovery and reduces polling responsiveness even though many successful polls occurred in between.
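A minimal sketch of the suggested fix, assuming claimTask() resolves to a falsy value when the queue is empty (the branch that handles that case is in the collapsed part of the diff and is not shown). `PollBackoff`, `recordSuccess` and `recordFailure` are hypothetical names, not part of edge-node.ts; the arithmetic matches the PR (double the delay, cap at the maximum, add 0-1000 ms jitter), but the reset happens in one place that every successful poll goes through.

```typescript
// Sketch only: PollBackoff is a hypothetical helper, not part of edge-node.ts.
// Same arithmetic as the PR, but with a single reset point for any successful poll.
class PollBackoff {
  private readonly baseMs: number;
  private readonly maxMs: number;
  private delayMs: number;
  private errors = 0;

  constructor(baseMs: number, maxMs: number) {
    this.baseMs = baseMs;
    this.maxMs = maxMs;
    this.delayMs = baseMs;
  }

  // Call whenever claimTask() resolves without throwing, even if it returned no task.
  recordSuccess(): void {
    this.delayMs = this.baseMs;
    this.errors = 0;
  }

  // Call from the catch block; returns how long to sleep before the next poll
  // and doubles the stored delay (capped at maxMs) for the following failure.
  recordFailure(jitterMs: number = Math.random() * 1000): number {
    this.errors++;
    const sleepMs = Math.min(this.delayMs + jitterMs, this.maxMs);
    this.delayMs = Math.min(this.delayMs * 2, this.maxMs);
    return sleepMs;
  }

  get consecutiveErrors(): number {
    return this.errors;
  }
}

// Assumed placement in main() (claimTask, sleep, pollMs, MAX_BACKOFF_MS as in edge-node.ts;
// the empty-queue branch is an assumption):
//
//   const backoff = new PollBackoff(pollMs, MAX_BACKOFF_MS);
//   while (true) {
//     try {
//       const task = await claimTask();
//       backoff.recordSuccess(); // the poll itself succeeded
//       if (!task) { await sleep(pollMs); continue; }
//       // ...execute and submit as before...
//     } catch (err) {
//       const sleepMs = backoff.recordFailure();
//       console.error(`[edge-node:${nodeId}] loop error (attempt ${backoff.consecutiveErrors})`, err);
//       await sleep(sleepMs);
//     }
//   }
```

One trade-off: resetting right after claimTask() means a failure in executeTask() or submitResult() always starts again from the base delay. If those failures should escalate too, call recordSuccess() only in the empty-queue branch and after a completed task instead.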

     } catch (err) {
+      consecutiveErrors++;
+
       if (String(err).includes("HTTP 401")) {
         console.warn(`[edge-node:${nodeId}] 401 on task loop, re-authenticating`);
         await reAuth().catch((e) => console.error(`[edge-node:${nodeId}] re-auth failed`, e));
       } else {
-        console.error(`[edge-node:${nodeId}] loop error`, err);
+        console.error(`[edge-node:${nodeId}] loop error (attempt ${consecutiveErrors})`, err);
       }
-      await sleep(pollMs);
+
+      // Exponential backoff with jitter to prevent thundering herd
+      const jitter = Math.random() * 1000; // 0-1000ms jitter
+      const nextBackoff = Math.min(backoffMs * 2, MAX_BACKOFF_MS);
+      const sleepTime = Math.min(backoffMs + jitter, MAX_BACKOFF_MS);
+
+      console.log(`[edge-node:${nodeId}] backing off for ${Math.round(sleepTime)}ms`);
+      await sleep(sleepTime);
+
+      backoffMs = nextBackoff;
     }
   }
 }
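The backoff arithmetic in the catch block can be reproduced as a pure function to inspect the schedule it produces. A minimal sketch with the jitter values passed in explicitly so the result is deterministic; `backoffSchedule` is a hypothetical name, not part of edge-node.ts, and the defaults used below (pollMs = 1500, MAX_BACKOFF_MS = 30_000) are the fallbacks from the diff.

```typescript
// Mirrors the catch-block arithmetic: sleep = min(backoff + jitter, max),
// then backoff = min(backoff * 2, max). One entry per consecutive error.
function backoffSchedule(pollMs: number, maxBackoffMs: number, jitters: number[]): number[] {
  let backoffMs = pollMs;
  const sleeps: number[] = [];
  for (const jitter of jitters) {
    const nextBackoff = Math.min(backoffMs * 2, maxBackoffMs);
    sleeps.push(Math.min(backoffMs + jitter, maxBackoffMs));
    backoffMs = nextBackoff;
  }
  return sleeps;
}

// Seven consecutive errors with zero jitter:
// sleeps are 1500, 3000, 6000, 12000, 24000, 30000, 30000
console.log(backoffSchedule(1500, 30_000, [0, 0, 0, 0, 0, 0, 0]));
```

With the maximum jitter of 999 ms the first five sleeps become 2499 through 24999 ms, but from the sixth error on the result is still exactly 30000 ms: once backoffMs reaches MAX_BACKOFF_MS, Math.min clips the jitter away, so nodes in a long error streak retry in lockstep. Capping before adding jitter (Math.min(backoffMs, MAX_BACKOFF_MS) + jitter) would keep the spread at the ceiling, at the cost of exceeding the configured maximum by up to 1 s.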