Make batch workloads first-class #1098
		phase = "Succeeded"
	case job.Status.Failed > 0:
		phase = "Failed"
	}
Job run phase misclassified
Medium Severity
jobRunInfo treats status.succeeded or status.failed pod counts as terminal success or failure even when JobComplete / JobFailed conditions are not true. Retrying or partially complete CronJob runs can show Failed/Succeeded and active: false while the Job controller is still running.
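One way to avoid the misclassification is to treat only the Job's terminal conditions as final and to fall back to "Running" while pod counts are still changing. The sketch below is not the PR's code: JobStatus and JobCondition are simplified local stand-ins for the batch/v1 types, and jobPhase and the "Pending" phase name are assumptions made for illustration.

```go
package main

import "fmt"

// JobCondition and JobStatus are simplified, hypothetical stand-ins for the
// batch/v1 types, kept local so the sketch compiles without client-go.
type JobCondition struct {
	Type   string // "Complete" or "Failed" in batch/v1
	Status string // "True", "False", or "Unknown"
}

type JobStatus struct {
	Active     int32
	Succeeded  int32
	Failed     int32
	Conditions []JobCondition
}

// jobPhase reports a terminal phase only when a terminal condition is True.
// Pod counts alone never end the run, so a retrying Job stays "Running".
func jobPhase(s JobStatus) (phase string, active bool) {
	for _, c := range s.Conditions {
		if c.Status != "True" {
			continue
		}
		switch c.Type {
		case "Complete":
			return "Succeeded", false
		case "Failed":
			return "Failed", false
		}
	}
	if s.Active > 0 || s.Succeeded > 0 || s.Failed > 0 {
		return "Running", true
	}
	return "Pending", true // assumed name for "no pods observed yet"
}

func main() {
	// Two pods failed, one retry is still running: not terminal yet.
	retrying := JobStatus{Active: 1, Failed: 2}
	phase, active := jobPhase(retrying)
	fmt.Println(phase, active)
}
```

With this shape, a CronJob run that has failed pods but no JobFailed condition keeps active: true until the Job controller gives up.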
Reviewed by Cursor Bugbot for commit 617df7c.
	if !shouldWaitForPodsInLogStream(kind, metadata) {
		sendSSEEvent(w, flusher, "end", workloadLogEndPayload(metadata))
		return
	}
SSE ends on empty pods
Medium Severity
During workload log streaming, rediscovery now ends the SSE stream whenever no pods match and shouldWaitForPodsInLogStream is false. Deployments and StatefulSets can briefly have zero matching pods during rollouts or scale events, so live tails stop instead of reconnecting when pods return.
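A possible policy that keeps live tails open for long-running kinds is to end the stream on an empty pod set only when the workload is a batch kind whose run has already finished. The sketch below is an assumption about how such a check could look, not the behavior of shouldWaitForPodsInLogStream; isBatchKind, endStreamOnEmptyPods, and the runFinished flag are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// isBatchKind reports whether a workload kind runs to completion.
// The kind strings are assumptions for this sketch.
func isBatchKind(kind string) bool {
	switch strings.ToLower(kind) {
	case "job", "workflow":
		return true
	}
	return false
}

// endStreamOnEmptyPods is a hypothetical policy helper. It ends the stream
// only for a finished batch run; Deployments, StatefulSets, and unfinished
// runs keep waiting so the tail can resume once pods reappear.
func endStreamOnEmptyPods(kind string, runFinished bool) bool {
	return isBatchKind(kind) && runFinished
}

func main() {
	fmt.Println(endStreamOnEmptyPods("Deployment", false)) // rollout gap: keep waiting
	fmt.Println(endStreamOnEmptyPods("Job", true))         // finished run: send "end"
}
```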
Reviewed by Cursor Bugbot for commit 617df7c.
	if job.Spec.Selector == nil {
		return nil, fmt.Errorf("job %s/%s has no pod selector", namespace, name)
	}
	return job.Spec.Selector, nil
Job logs require pod selector
Medium Severity
Job workload logs resolve pods only via job.spec.selector and error when it is nil. Pods for a Job are routinely labeled with batch.kubernetes.io/job-name (as elsewhere in this repo for hook Jobs), so logs can fail while kubectl logs job/... still works.
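A fallback along the lines the comment suggests could synthesize a selector from the batch.kubernetes.io/job-name label when spec.selector is absent. This is a minimal sketch under assumptions: Job and LabelSelector are simplified local types rather than the batch/v1 and metav1 ones, and jobPodSelector is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
)

// LabelSelector and Job are simplified, hypothetical stand-ins for the
// metav1 and batch/v1 types, kept local so the sketch is self-contained.
type LabelSelector struct {
	MatchLabels map[string]string
}

type Job struct {
	Namespace string
	Name      string
	Selector  *LabelSelector
}

// jobNameLabel is the label the Job controller puts on the pods it creates.
const jobNameLabel = "batch.kubernetes.io/job-name"

// jobPodSelector prefers the Job's own selector and falls back to the
// job-name label instead of returning an error when the selector is nil.
func jobPodSelector(job Job) (*LabelSelector, error) {
	if job.Selector != nil {
		return job.Selector, nil
	}
	if job.Name == "" {
		return nil, errors.New("job has neither a pod selector nor a name")
	}
	return &LabelSelector{
		MatchLabels: map[string]string{jobNameLabel: job.Name},
	}, nil
}

func main() {
	sel, err := jobPodSelector(Job{Namespace: "default", Name: "backup"})
	if err != nil {
		panic(err)
	}
	fmt.Println(sel.MatchLabels)
}
```

Clusters that predate the prefixed label may only carry the legacy job-name label, so a real implementation might need to try both.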
Reviewed by Cursor Bugbot for commit 617df7c.
Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.
There are 5 total unresolved issues (including 3 from previous reviews).
Reviewed by Cursor Bugbot for commit c353074.
export function pickDefaultRun(runs: WorkloadRun[]): WorkloadRun | undefined {
  return runs.find((run) => run.active) ?? newestRun(runs)
}
Default run ignores failures
Medium Severity
When no run is active, pickDefaultRun picks the newest run by timestamp only. That disagrees with /runs, which sorts failed/error runs ahead of newer successes, so scheduled log viewers and batch execution UI can open succeeded runs while a retained failure still exists.
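The active-then-failed-then-newest policy described for /runs can be expressed as a single scan. The sketch below is written in Go for consistency with the server-side snippets, although the function under review is TypeScript. The WorkloadRun fields, the "Failed"/"Error" phase strings, and runRank are assumptions for illustration, not the repository's definitions.

```go
package main

import (
	"fmt"
	"time"
)

// WorkloadRun is a hypothetical, trimmed-down shape of a normalized run.
type WorkloadRun struct {
	Name      string
	Phase     string
	Active    bool
	StartedAt time.Time
}

// runRank orders runs: active first, then failed/error, then everything else.
func runRank(r WorkloadRun) int {
	switch {
	case r.Active:
		return 0
	case r.Phase == "Failed" || r.Phase == "Error":
		return 1
	default:
		return 2
	}
}

// pickDefaultRun returns the best-ranked run, breaking ties by newest start.
func pickDefaultRun(runs []WorkloadRun) (WorkloadRun, bool) {
	if len(runs) == 0 {
		return WorkloadRun{}, false
	}
	best := runs[0]
	for _, r := range runs[1:] {
		ra, rb := runRank(r), runRank(best)
		if ra < rb || (ra == rb && r.StartedAt.After(best.StartedAt)) {
			best = r
		}
	}
	return best, true
}

func main() {
	now := time.Now()
	runs := []WorkloadRun{
		{Name: "nightly-3", Phase: "Succeeded", StartedAt: now},
		{Name: "nightly-2", Phase: "Failed", StartedAt: now.Add(-24 * time.Hour)},
	}
	if run, ok := pickDefaultRun(runs); ok {
		fmt.Println(run.Name, run.Phase) // the retained failure wins over the newer success
	}
}
```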
Additional Locations (1)
Reviewed by Cursor Bugbot for commit c353074.
		b.LastScheduledAt = run.ScheduledAt
	}
	b.Message = run.Message
}
Batch latest run by time
Medium Severity
applyRunToBatch sets latestRunPhase from the newest timestamp among retained runs instead of applying the same active-then-failed-then-newest policy as sortRuns. A newer success can hide an older retained failure in Applications health and batch chips.
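One way to keep the batch summary consistent with the run list is to take the phase from the head of the policy-ordered runs while still tracking the newest schedule time separately. The following is a sketch under assumptions: Run and BatchSummary are hypothetical local types that borrow the LatestRunPhase, LastScheduledAt, and ScheduledAt names from the snippet, and this sortRuns is a stand-in for the repository's function, not a copy of it.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Run and BatchSummary are hypothetical shapes for this sketch.
type Run struct {
	Phase       string
	Active      bool
	ScheduledAt time.Time
}

type BatchSummary struct {
	LatestRunPhase  string
	LastScheduledAt time.Time
}

func rank(r Run) int {
	switch {
	case r.Active:
		return 0
	case r.Phase == "Failed" || r.Phase == "Error":
		return 1
	}
	return 2
}

// sortRuns orders runs active first, then failed/error, then newest first.
func sortRuns(runs []Run) {
	sort.SliceStable(runs, func(i, j int) bool {
		if ri, rj := rank(runs[i]), rank(runs[j]); ri != rj {
			return ri < rj
		}
		return runs[i].ScheduledAt.After(runs[j].ScheduledAt)
	})
}

// summarize takes the phase from the policy-ordered head, so a retained
// failure is not hidden, and records the newest schedule time independently.
func summarize(runs []Run) BatchSummary {
	var b BatchSummary
	if len(runs) == 0 {
		return b
	}
	ordered := append([]Run(nil), runs...) // copy: do not reorder the caller's slice
	sortRuns(ordered)
	b.LatestRunPhase = ordered[0].Phase
	for _, r := range runs {
		if r.ScheduledAt.After(b.LastScheduledAt) {
			b.LastScheduledAt = r.ScheduledAt
		}
	}
	return b
}

func main() {
	now := time.Now()
	b := summarize([]Run{
		{Phase: "Succeeded", ScheduledAt: now},
		{Phase: "Failed", ScheduledAt: now.Add(-time.Hour)},
	})
	fmt.Println(b.LatestRunPhase)
}
```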
Additional Locations (2)
Reviewed by Cursor Bugbot for commit c353074.


Summary
Jobs, CronJobs, Argo Workflows, and CronWorkflows are now treated as first-class batch workloads across Radar, not just as generic resources with pod logs bolted on.
This keeps the earlier run-aware logs work and adds the broader product surface:
Design Notes
Testing
- npx tsc --noEmit from web/
- go test ./internal/server
- (cd pkg && go test ./topology)
- git diff --check
- make build
Visual Testing
Ran against real clusters:
- radar-test-nonprod, namespace radar-batch-visual: native Jobs/CronJobs including running, failed, completed, retained history, suspended/no-run schedule, Applications, and topology.
- gke_koalabackend_us-east1-b_nonprod-cluster-us-east1: Argo Workflow list/fullscreen and empty CronWorkflow list.
Artifacts are under .playwright-mcp/visual-test/20260704-171002/, including the final post-fix captures:
- job-running-fullscreen-1920-fixed.png
- cronjob-runs-fullscreen-1920-fixed.png
Fresh console check after the final build: 0 errors, 0 warnings. Only debug/log messages remained.
Known Gap
No reachable cluster had a live CronWorkflow instance, so CronWorkflow instance rendering is implemented and type-checked but visually covered only at the empty-list level. I did not create a CronWorkflow in nonprod because that can trigger controller-created Workflows.
Note
Medium Risk
Touches log streaming, RBAC on new Argo/batch APIs, and applications/topology aggregation; behavior is mostly additive but incorrect run or selector logic could surface wrong pods or empty logs.
Overview
Jobs and Argo Workflows can use the same aggregated workload-log path as Deployments, with pod selectors resolved from Job specs or Argo’s workflows.argoproj.io/workflow label. MCP get_workload_logs and docs accept job/workflow kinds. When pods are missing, responses include emptyReason, guidance, and kubectl/argo command hints, including terminal finished runs and archive-log awareness for Workflows.
Scheduled batch gets GET /workloads/.../runs for CronJobs (owned Jobs) and CronWorkflows (labeled Workflows), returning normalized WorkloadRun objects with phases, progress, and pod/step counts. The web app adds run pickers (ScheduledWorkloadLogsViewer, BatchExecutionView), shows batch signals on Applications, and wires logs/topology/resources for Workflow/CronWorkflow kinds.
Applications API now ingests standalone Jobs, Workflows, and CronWorkflows with optional batch summaries rolled up from child runs; topology adds Workflow/CronWorkflow nodes and pod ownership edges.
Reviewed by Cursor Bugbot for commit c353074. Bugbot is set up for automated code reviews on this repo.
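To illustrate the empty-pods response described in the overview, the sketch below builds a payload with an emptyReason and command hints for Job and Workflow kinds. Only the emptyReason field name and the workflows.argoproj.io/workflow label come from the description; the hints field, the reason values, and describeEmptyLogs are hypothetical and may not match the actual API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// emptyLogs is a hypothetical response fragment; only the emptyReason
// field name is taken from the PR description.
type emptyLogs struct {
	EmptyReason string   `json:"emptyReason"`
	Hints       []string `json:"hints"`
}

// describeEmptyLogs explains why no pods matched and suggests CLI commands
// the user can run directly. The reason strings are assumed values.
func describeEmptyLogs(kind, namespace, name string, finished bool) emptyLogs {
	out := emptyLogs{EmptyReason: "no_matching_pods"}
	if finished {
		out.EmptyReason = "run_finished_pods_gone"
	}
	switch strings.ToLower(kind) {
	case "job":
		out.Hints = []string{
			fmt.Sprintf("kubectl logs job/%s -n %s", name, namespace),
		}
	case "workflow":
		out.Hints = []string{
			fmt.Sprintf("argo logs %s -n %s", name, namespace),
			fmt.Sprintf("kubectl get pods -n %s -l workflows.argoproj.io/workflow=%s", namespace, name),
		}
	}
	return out
}

func main() {
	body, err := json.Marshal(describeEmptyLogs("workflow", "batch", "etl-x7k2p", true))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```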