shardTaskPatches bypasses type validation and is inconsistent with the TaskTemplate.process.args schema, which leads to deletion failure for such BatchSandbox instances #1019

@vernon-h

Background

The BatchSandbox CRD defines ProcessTask, whose command and args fields are string arrays, and defines ShardTaskPatches as []runtime.RawExtension.

type ProcessTask struct {
    Command []string `json:"command"`
    Args    []string `json:"args,omitempty"`
    Env     []corev1.EnvVar `json:"env,omitempty"`
}

// ...
// ...
ShardTaskPatches []runtime.RawExtension `json:"shardTaskPatches,omitempty"`

The batch sandbox example CR contains:

  taskTemplate:
    spec:
      process:
        command:
        - sleep
        args:
        - infinite
  # ...
  shardTaskPatches:
  - spec:
      process:
        args:
        - 3600
  # ...

Problem

shardTaskPatches bypasses CRD type validation

Because shardTaskPatches is typed as []runtime.RawExtension, it accepts arbitrary JSON/YAML payloads, so Kubernetes cannot enforce the same schema validation rules that it applies to TaskSpec.

As a result, the API server accepts the following parameter without any verification:

shardTaskPatches:
- spec:
    process:
      args:
      - 3600

This happens even though the field the patch ultimately targets is:

Args []string

The type mismatch is only discovered later when the controller decodes or merges the patch into a TaskTemplateSpec. This creates inconsistent validation behavior:

taskTemplate.spec.process.args
    -> validated as []string

shardTaskPatches[].spec.process.args
    -> accepts arbitrary JSON values
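
The contrast above can be reproduced outside the controller with plain encoding/json. The following is a minimal sketch, not the controller's actual code: json.RawMessage stands in for runtime.RawExtension, and the patch type and both helper functions are hypothetical names introduced only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// processTask mirrors the ProcessTask fields quoted in this issue (Env omitted for brevity).
type processTask struct {
	Command []string `json:"command"`
	Args    []string `json:"args,omitempty"`
}

// patch is a hypothetical typed view of one shardTaskPatches entry.
type patch struct {
	Spec struct {
		Process processTask `json:"process"`
	} `json:"spec"`
}

// decodeRaw stands in for runtime.RawExtension: it only requires well-formed JSON.
func decodeRaw(data []byte) error {
	var raw json.RawMessage
	return json.Unmarshal(data, &raw)
}

// decodeTyped decodes the same bytes into the typed struct,
// which is the point where the []string mismatch surfaces.
func decodeTyped(data []byte) error {
	var p patch
	return json.Unmarshal(data, &p)
}

func main() {
	// JSON form of the patch from the example CR: YAML "- 3600" becomes the number 3600.
	payload := []byte(`{"spec":{"process":{"args":[3600]}}}`)
	fmt.Println("raw decode error:  ", decodeRaw(payload))
	fmt.Println("typed decode error:", decodeTyped(payload))
}
```

The raw decode succeeds while the typed decode returns an error, which matches the late failure described here. Quoting the value as "3600" makes both decodes succeed.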

Expected Behavior

Make shardTaskPatches schema-aware and validate it against TaskSpec. This would ensure invalid payloads such as:

args:
- 3600

are rejected during CR admission instead of failing later during reconciliation.
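
One possible shape for such a check, for example inside a validating webhook, is to decode every patch into a typed struct before the object is admitted. This is only a sketch under assumptions: taskSpecPatch and validateShardTaskPatches are hypothetical names, the struct covers only the fields quoted in this issue, and the patches are passed as raw bytes rather than runtime.RawExtension to keep the example free of Kubernetes dependencies.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// processTask mirrors the ProcessTask fields quoted in this issue (Env omitted for brevity).
type processTask struct {
	Command []string `json:"command"`
	Args    []string `json:"args,omitempty"`
}

// taskSpecPatch is a hypothetical typed view of one shardTaskPatches entry.
type taskSpecPatch struct {
	Spec struct {
		Process *processTask `json:"process,omitempty"`
	} `json:"spec"`
}

// validateShardTaskPatches decodes each raw patch into the typed struct and
// collects one error per invalid entry, prefixed with its index.
func validateShardTaskPatches(patches [][]byte) error {
	var errs []error
	for i, raw := range patches {
		var p taskSpecPatch
		if err := json.Unmarshal(raw, &p); err != nil {
			errs = append(errs, fmt.Errorf("shardTaskPatches[%d]: %w", i, err))
		}
	}
	// errors.Join (Go 1.20+) returns nil when no errors were collected.
	return errors.Join(errs...)
}

func main() {
	patches := [][]byte{
		[]byte(`{"spec":{"process":{"args":["3600"]}}}`), // valid: string element
		[]byte(`{"spec":{"process":{"args":[3600]}}}`),   // invalid: numeric element
	}
	if err := validateShardTaskPatches(patches); err != nil {
		fmt.Println("rejected at admission:", err)
	}
}
```

A stricter variant could also reject unknown fields, but that would require the typed struct to cover the full TaskSpec, which is not shown in this issue.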

Follow-up Problems

CR Deletion Failure Caused by the Finalizer Mechanism

Every batch sandbox CR has a finalizer added to it to protect cascading resources. In the situation above, this creates a serious risk: the CR can no longer be deleted.

The flow of batch sandbox interacting with finalizer:

Add the finalizer to BatchSandbox1 // passes, because shardTaskPatches is not run through json.Marshal() at this step
  ⬇
Merge shardTaskPatches into task spec // fails, due to json.Marshal(shardTaskPatches)
  ⬇
Delete BatchSandbox1 // fails, because the finalizer is never cleared
  ⬇
BatchSandbox1 is stuck in Terminating

We have to clear the finalizer manually; only then is the resource, which is already marked with a deletionTimestamp, actually deleted.
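
The flow above can be modelled with a small in-memory sketch. It is not the controller's real reconcile loop: the batchSandbox struct, the finalizer name, and the ordering of steps inside reconcile are assumptions chosen to match the flow as described, where the merge error is returned before the finalizer can be released.

```go
package main

import (
	"errors"
	"fmt"
)

// finalizerName is a hypothetical finalizer name used only in this sketch.
const finalizerName = "example.io/batch-sandbox"

// batchSandbox is a simplified stand-in for the real CR.
type batchSandbox struct {
	Finalizers []string
	Deleting   bool // stands in for a non-nil metadata.deletionTimestamp
	PatchValid bool // whether shardTaskPatches can be merged into the task spec
}

var errMerge = errors.New("merge shardTaskPatches into task spec failed")

// reconcile models the assumed order: add finalizer, merge patches, then handle deletion.
func reconcile(bs *batchSandbox) error {
	if !bs.Deleting && len(bs.Finalizers) == 0 {
		bs.Finalizers = append(bs.Finalizers, finalizerName)
	}
	if !bs.PatchValid {
		return errMerge // returns before the deletion branch is reached
	}
	if bs.Deleting {
		bs.Finalizers = nil // cleanup finished, release the object
	}
	return nil
}

func main() {
	bs := &batchSandbox{PatchValid: false}
	fmt.Println("first reconcile:", reconcile(bs))
	bs.Deleting = true // the user deletes the CR
	fmt.Println("reconcile during deletion:", reconcile(bs))
	fmt.Println("finalizers still present:", len(bs.Finalizers))
	bs.Finalizers = nil // manual workaround: clear the finalizer by hand
	fmt.Println("finalizers after manual clear:", len(bs.Finalizers))
}
```

In this model an object with a valid patch releases its finalizer on the reconcile that follows deletion, while an object with an invalid patch keeps it until it is cleared by hand.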
