API: Add approvalPolicy to TaskSpec for human-in-the-loop gating in task pipelines #816

@kelos-bot

🤖 Kelos Strategist Agent @gjkim42

Problem

Kelos task pipelines (dependsOn) currently run fully autonomously — once an upstream task succeeds, all downstream dependents start immediately. There is no mechanism to pause a pipeline for human review before high-impact downstream tasks proceed.

This is a critical gap for production adoption. Real-world agent workflows frequently require human checkpoints:

  • Code review gate: Agent scaffolds a feature → human reviews the PR → agent writes tests and merges
  • Deployment gate: Agent creates a hotfix PR → human approves → agent merges and triggers deploy
  • Security gate: Agent proposes dependency upgrades → security team approves → agent applies changes across repos
  • Compliance gate: Agent generates a data migration → DBA reviews → agent executes migration

Today, the only workaround is to split these into separate, manually triggered TaskSpawners, which loses pipeline context and dependency output passing ({{.Deps}}).

Proposal

Add an approvalPolicy field to TaskSpec that causes a task to enter a new AwaitingApproval phase after the agent completes successfully, holding downstream dependents until approval is granted.

New TaskPhase

const (
    // TaskPhaseAwaitingApproval means the agent succeeded but downstream
    // dependents are blocked pending human approval.
    TaskPhaseAwaitingApproval TaskPhase = "AwaitingApproval"
)

New API Fields

// ApprovalPolicy defines how a task awaits and receives human approval
// before unblocking downstream dependents.
type ApprovalPolicy struct {
    // Mode specifies how approval is delivered.
    // "annotation": approve by setting an annotation on the Task (e.g., with kubectl annotate)
    // "githubComment": approve via a comment on the associated GitHub item
    // +kubebuilder:validation:Enum=annotation;githubComment
    // +kubebuilder:default=annotation
    Mode string `json:"mode,omitempty"`

    // ApproveCommand is the comment text that grants approval (e.g., "/approve").
    // Only used when mode is "githubComment".
    // +optional
    ApproveCommand string `json:"approveCommand,omitempty"`

    // RejectCommand is the comment text that rejects and fails the task (e.g., "/reject").
    // Only used when mode is "githubComment".
    // +optional
    RejectCommand string `json:"rejectCommand,omitempty"`

    // TimeoutSeconds is the maximum time to wait for approval before
    // auto-failing the task. Zero or unset means wait indefinitely.
    // +optional
    // +kubebuilder:validation:Minimum=0
    TimeoutSeconds *int32 `json:"timeoutSeconds,omitempty"`
}
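
As a rough illustration of the TimeoutSeconds semantics described in the field comment, the check below treats zero as "wait indefinitely". The helper name approvalTimedOut is hypothetical, and two things are assumptions rather than part of the proposal: that a nil value behaves like zero, and that the controller can tell how long the task has been in AwaitingApproval.

```go
package main

import "time"

// approvalTimedOut is a hypothetical helper, not part of Kelos.
// waited is how long the task has been in AwaitingApproval (assumed to be
// derivable from a timestamp recorded in the task status).
func approvalTimedOut(waited time.Duration, timeoutSeconds *int32) bool {
	// Assumption: nil is treated like zero, i.e. wait indefinitely.
	if timeoutSeconds == nil || *timeoutSeconds == 0 {
		return false
	}
	limit := time.Duration(*timeoutSeconds) * time.Second
	return waited >= limit
}
```

A caller would typically compute waited as time.Since(enteredAt), where enteredAt is the moment the task entered the gate.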

Added to TaskSpec:

type TaskSpec struct {
    // ... existing fields ...

    // ApprovalPolicy, when set, causes the task to enter the AwaitingApproval
    // phase after the agent succeeds instead of transitioning immediately
    // to Succeeded. Downstream dependents remain blocked until approval
    // is granted. If the agent fails, the task transitions directly to Failed
    // (approval is only requested on success).
    // +optional
    ApprovalPolicy *ApprovalPolicy `json:"approvalPolicy,omitempty"`
}
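
The kubebuilder markers cover the enum and the minimum, but the cross-field rules in the comments would need a separate check. The following is a minimal sketch of such a check. The function validateApprovalPolicy is hypothetical, and the rule that githubComment mode requires a non-empty approveCommand is an assumption (the proposal does not say whether a default command exists).

```go
package main

import (
	"errors"
	"fmt"
)

// ApprovalPolicy mirrors the proposed API type (markers and JSON tags omitted).
type ApprovalPolicy struct {
	Mode           string
	ApproveCommand string
	RejectCommand  string
	TimeoutSeconds *int32
}

// validateApprovalPolicy is a hypothetical helper, not part of Kelos.
func validateApprovalPolicy(p *ApprovalPolicy) error {
	if p == nil {
		return nil // no approval gate configured
	}
	mode := p.Mode
	if mode == "" {
		mode = "annotation" // matches the proposed kubebuilder default
	}
	switch mode {
	case "annotation":
		// The command fields are documented as unused in this mode.
	case "githubComment":
		// Assumption: an approve command must be set explicitly.
		if p.ApproveCommand == "" {
			return errors.New("approveCommand is required when mode is githubComment")
		}
		if p.ApproveCommand == p.RejectCommand {
			return errors.New("approveCommand and rejectCommand must differ")
		}
	default:
		return fmt.Errorf("unsupported approval mode %q", mode)
	}
	if p.TimeoutSeconds != nil && *p.TimeoutSeconds < 0 {
		return errors.New("timeoutSeconds must be >= 0")
	}
	return nil
}
```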

Controller Behavior

  1. Agent completes successfully → task enters AwaitingApproval (not Succeeded)
  2. status.outputs and status.results are captured normally (available for inspection)
  3. Downstream dependsOn tasks remain in the Waiting phase (the existing checkDependencies already handles this, since it unblocks only on Succeeded)
  4. Approval received → task transitions to Succeeded → dependents unblock
  5. Rejection received → task transitions to Failed → dependents fail with "dependency failed"
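
The transitions in the list above can be sketched as two small pure functions. This is only an illustration: the Decision type and both function names are hypothetical, and the constant TaskPhaseFailed is assumed to follow the same naming as TaskPhaseSucceeded.

```go
package main

// TaskPhase values used by the sketch. AwaitingApproval is the proposed
// addition; TaskPhaseFailed is an assumed name for the existing Failed phase.
type TaskPhase string

const (
	TaskPhaseAwaitingApproval TaskPhase = "AwaitingApproval"
	TaskPhaseSucceeded        TaskPhase = "Succeeded"
	TaskPhaseFailed           TaskPhase = "Failed"
)

// Decision is a hypothetical type for the outcome of an approval check.
type Decision int

const (
	DecisionPending Decision = iota
	DecisionApproved
	DecisionRejected
)

// phaseAfterAgentExit covers step 1 and the failure path: a gated task that
// succeeds parks in AwaitingApproval, and a failed agent skips the gate.
func phaseAfterAgentExit(agentSucceeded, gated bool) TaskPhase {
	if !agentSucceeded {
		return TaskPhaseFailed
	}
	if gated {
		return TaskPhaseAwaitingApproval
	}
	return TaskPhaseSucceeded
}

// phaseAfterDecision covers steps 4 and 5 for a task in AwaitingApproval.
// Tasks in any other phase are left unchanged.
func phaseAfterDecision(current TaskPhase, d Decision) TaskPhase {
	if current != TaskPhaseAwaitingApproval {
		return current
	}
	switch d {
	case DecisionApproved:
		return TaskPhaseSucceeded
	case DecisionRejected:
		return TaskPhaseFailed
	default:
		return current // still pending
	}
}
```

Keeping the transition logic free of I/O like this would make it straightforward to unit test independently of the reconciler.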

Annotation mode (default, simplest):

kubectl annotate task scaffold kelos.dev/approved=true
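
On the controller side, detecting this annotation could be as simple as the sketch below. The helper name isApproved is hypothetical, and the strict match on the literal value "true" is an assumption. The proposal does not define how rejection is expressed in annotation mode, so the sketch handles approval only.

```go
package main

// approvedAnnotation is the key used in the kubectl command above.
const approvedAnnotation = "kelos.dev/approved"

// isApproved is a hypothetical helper that reports whether a Task's
// annotations grant approval. Reading from a nil map is safe in Go and
// yields the empty string, so Tasks without annotations are not approved.
func isApproved(annotations map[string]string) bool {
	// Assumption: only the exact value "true" counts as approval.
	return annotations[approvedAnnotation] == "true"
}
```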

GitHub comment mode (for PR/issue-driven workflows):
The spawner's GitHub polling loop checks for approval comments on the associated GitHub item, similar to the existing commentPolicy mechanism.
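
A minimal sketch of the comment-matching step follows. The function decisionFromComment and the Decision type are hypothetical, and the matching rule (a command must appear alone on a line) is an assumption, since the proposal does not specify it. Authorization of the commenter is deliberately left out.

```go
package main

import "strings"

// Decision is a hypothetical type for the outcome of an approval check.
type Decision string

const (
	DecisionPending  Decision = "pending"
	DecisionApproved Decision = "approved"
	DecisionRejected Decision = "rejected"
)

// decisionFromComment scans a comment body line by line and returns the
// first decision it finds. Assumption: a command only counts when it is the
// entire content of a line, ignoring surrounding whitespace.
func decisionFromComment(body, approveCommand, rejectCommand string) Decision {
	for _, line := range strings.Split(body, "\n") {
		cmd := strings.TrimSpace(line)
		if cmd == "" {
			continue // also prevents an unset command from matching blank lines
		}
		if cmd == approveCommand {
			return DecisionApproved
		}
		if cmd == rejectCommand {
			return DecisionRejected
		}
	}
	return DecisionPending
}
```

Requiring the command on its own line avoids treating a sentence such as "do not /lgtm this yet" as approval.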

Example: Feature Pipeline with Review Gate

# Stage 1: Agent scaffolds the feature
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: scaffold
spec:
  type: claude-code
  credentials:
    type: oauth
    secretRef:
      name: claude-credentials
  workspaceRef:
    name: my-workspace
  branch: feature/auth
  # Gate: human must approve before tests are written
  approvalPolicy:
    mode: annotation
    timeoutSeconds: 86400  # 24h timeout
  prompt: |
    Scaffold a user authentication module with login and registration endpoints.
    Create the code, commit, and push. Open a draft PR for review.
---
# Stage 2: Only runs after human approves stage 1
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: write-tests
spec:
  type: claude-code
  credentials:
    type: oauth
    secretRef:
      name: claude-credentials
  workspaceRef:
    name: my-workspace
  branch: feature/auth
  dependsOn:
    - scaffold
  prompt: |
    Write comprehensive tests for the auth module on branch
    {{index .Deps "scaffold" "Results" "branch"}}.
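
The prompt expression above uses Go template syntax, so its behavior can be illustrated with the standard text/template package. This is a sketch under assumptions: the function renderPrompt is hypothetical, and the nested map shape for .Deps is inferred from the expression rather than taken from the Kelos source.

```go
package main

import (
	"strings"
	"text/template"
)

// renderPrompt is a hypothetical helper that renders a prompt with
// dependency data exposed under .Deps.
// Assumed shape: task name -> section ("Results") -> key -> value.
func renderPrompt(prompt string, deps map[string]map[string]map[string]string) (string, error) {
	tmpl, err := template.New("prompt").Parse(prompt)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	data := map[string]interface{}{"Deps": deps}
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}
```

With deps set to {"scaffold": {"Results": {"branch": "feature/auth"}}}, the expression resolves to feature/auth. Because the value is only read once the upstream task reaches Succeeded, a gated task's results would reach this template only after approval.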

Example: TaskSpawner with GitHub Comment Approval

apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: reviewed-fixes
spec:
  when:
    githubIssues:
      labels: ["bug", "approved-for-agent"]
  taskTemplate:
    type: claude-code
    credentials:
      type: api-key
      secretRef:
        name: anthropic-key
    workspaceRef:
      name: my-workspace
    branch: "kelos-fix-{{.Number}}"
    approvalPolicy:
      mode: githubComment
      approveCommand: "/lgtm"
      rejectCommand: "/reject"
    promptTemplate: |
      Fix the bug described in issue #{{.Number}}: {{.Title}}
      {{.Body}}
      Create a PR. A human will review and comment /lgtm to proceed.

Why This Matters

  1. Production safety: Agents can prepare changes, but humans retain control over when they're applied
  2. Incremental trust: Teams can start with approval gates on every stage, then remove them as they gain confidence in their agent workflows
  3. Compliance: Regulated industries require human sign-off before code reaches production
  4. Natural fit: Builds on existing dependsOn mechanics and checkDependencies controller logic — the controller already blocks on non-Succeeded phases, so AwaitingApproval slots in naturally
  5. Composable: Works with all existing features (TTL, branch locking, prompt templates, output passing)

Implementation Notes

  • The checkDependencies function in task_controller.go:684 already unblocks only on TaskPhaseSucceeded, so AwaitingApproval would block dependents without changes to that logic
  • For annotation mode: the controller watches for annotation changes on Tasks (already reconciles on Task updates)
  • For githubComment mode: approval could reuse the existing GitHubCommentPolicy authorization framework (allowedUsers, allowedTeams, minimumPermission) to control who can approve
  • kelos get tasks should display the AwaitingApproval phase clearly, and kelos logs should hint at how to approve
  • The existing BranchLocker continues to hold the lock during AwaitingApproval so no other task modifies the branch
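
To make the first note concrete, here is a simplified stand-in for the dependency check. It is not the actual checkDependencies code, whose signature is not shown in this issue. The function dependenciesReady is hypothetical, and the TaskPhaseWaiting and TaskPhaseFailed constant names are assumptions.

```go
package main

// TaskPhase values used by the sketch. Only TaskPhaseSucceeded and
// TaskPhaseAwaitingApproval are named in this proposal; the other constant
// names are assumed.
type TaskPhase string

const (
	TaskPhaseWaiting          TaskPhase = "Waiting"
	TaskPhaseAwaitingApproval TaskPhase = "AwaitingApproval"
	TaskPhaseSucceeded        TaskPhase = "Succeeded"
	TaskPhaseFailed           TaskPhase = "Failed"
)

// dependenciesReady is a hypothetical, simplified model of the check.
// It returns ready=true only when every dependency has Succeeded, and
// reports the name of a failed dependency if there is one.
func dependenciesReady(deps map[string]TaskPhase) (ready bool, failedDep string) {
	ready = true
	for name, phase := range deps {
		switch phase {
		case TaskPhaseSucceeded:
			// Dependency satisfied.
		case TaskPhaseFailed:
			return false, name
		default:
			// Any other phase, including AwaitingApproval, keeps the
			// dependent blocked without special-casing the new phase.
			ready = false
		}
	}
	return ready, ""
}
```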

Alternatives Considered

/kind feature
