Workflow got stuck due to condition expression evaluation error #14031

Closed
3 of 4 tasks
jingkkkkai opened this issue Dec 25, 2024 · 0 comments · Fixed by #14032
Labels
area/controller (Controller issues, panics), area/hooks, type/bug, type/regression (Regression from previous behavior, a specific type of bug)

Contributor

jingkkkkai commented Dec 25, 2024

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

Because of #13986, we are upgrading Argo Workflows to v3.6.2.
After the upgrade, workflows that succeeded on v3.4.8 get stuck on v3.6.2,
and there are no error logs in either workflow-controller or argoexec.
[Screenshot, 2024-12-25 8:10:58 PM]

After some investigation, I found that the issue occurs in the hook's condition expression:

hooks:
  exit:
    template: on-exit
    expression: steps["prepare"].outputs != null

The library used for expression validation changed in both v3.5.13 and v3.6.2:

# v3.4.8
"github.com/antonmedv/expr"

# v3.5.13, v3.6.2
"github.com/expr-lang/expr"

The new expression validator does not recognize null and returns an error.
(The old validator recognized both null and nil.)
(The error is not logged by workflow-controller.)

err: unknown name null (1:29)\n | steps[\"prepare\"].outputs != null\n | ............................^

The steps template handler then only sets completed = false for the node instead of marking it as errored:

# argo-workflows/workflow/controller/exit_handler.go
...
if exitHook.Expression != "" {
	execute, err = argoexpr.EvalBool(exitHook.Expression, env.GetFuncMap(template.EnvMap(woc.globalParams.Merge(scope.getParameters()))))
	if err != nil {
		return true, nil, err
	}
}

---

# argo-workflows/workflow/controller/steps.go
...
if childNode.Completed() {
	hasOnExitNode, onExitNode, err := woc.runOnExitNode(ctx, step.GetExitHook(woc.execWf.Spec.Arguments), childNode, stepsCtx.boundaryID, stepsCtx.tmplCtx, "steps."+step.Name, stepsCtx.scope)
	if hasOnExitNode && (onExitNode == nil || !onExitNode.Fulfilled() || err != nil) {
		// The onExit node is either not complete or has errored out, return.
		completed = false
	}
}

if !completed {
	return node, nil
}

I think this is what causes the workflow to get stuck.
Can anyone help with this issue?
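
To make the suspected control flow easier to follow, here is a minimal, self-contained sketch of it. This is not the controller's actual code: evalBool, runOnExitNode, and reconcileStep are hypothetical stand-ins that use only the standard library, and evalBool only mimics the "unknown name null" error instead of calling the real expression library. The non-error path is simplified to "the hook ran and was fulfilled", which is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// evalBool is a hypothetical stand-in for the expression evaluator.
// It rejects the identifier "null", mimicking the error reported above.
func evalBool(expression string) (bool, error) {
	if strings.Contains(expression, "null") {
		return false, errors.New("unknown name null")
	}
	return true, nil
}

// runOnExitNode mirrors the shape of the exit-hook call: when the
// expression cannot be evaluated, it reports that a hook exists,
// that it is not fulfilled, and returns the evaluation error.
func runOnExitNode(expression string) (hasOnExitNode, fulfilled bool, err error) {
	if expression != "" {
		if _, err := evalBool(expression); err != nil {
			return true, false, err
		}
	}
	// Assumption for this sketch: without an error, the hook ran and finished.
	return true, true, nil
}

// reconcileStep models one controller pass over an already completed child node.
func reconcileStep(expression string) (completed bool, err error) {
	completed = true
	hasOnExitNode, fulfilled, hookErr := runOnExitNode(expression)
	if hasOnExitNode && (!fulfilled || hookErr != nil) {
		// The hook error is folded into "not completed yet".
		completed = false
	}
	if !completed {
		// hookErr is dropped here, so the caller sees no failure.
		return false, nil
	}
	return true, nil
}

func main() {
	// Every pass sees the same evaluation error, so the state never changes.
	for pass := 1; pass <= 3; pass++ {
		completed, err := reconcileStep(`steps["prepare"].outputs != null`)
		fmt.Printf("pass %d: completed=%v err=%v\n", pass, completed, err)
	}
}
```

In this sketch every pass reports completed=false with a nil error, which matches what we observe: the node is never marked as failed, nothing is logged, and the workflow waits indefinitely.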

Version(s)

v3.5.13, v3.6.2

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: invalid-expression
spec:
  arguments:
    parameters:
    - name: message
      value: "prepare message"
  entrypoint: main
  templates:
    - name: main
      inputs:
        parameters:
          - name: message
      steps:
        - - name: prepare
            when: '{{=inputs.parameters.message != ""}}'
            template: prepare
            arguments:
              parameters:
                - name: message
                  value: '{{inputs.parameters.message}}'
        - - name: execute-script
            template: execute-target-script
            hooks:
              exit:
                template: on-exit
                expression: steps["prepare"].outputs != null
                arguments:
                  parameters:
                    - name: message
                      value: '{{steps.prepare.outputs.parameters.message}}'
    - name: execute-target-script
      script:
        image: 'python:alpine3.6'
        command: [ 'python' ]
        source: 'print("hello argo")'
    - name: prepare
      inputs:
        parameters:
          - name: message
      outputs:
        parameters:
          - name: message
            valueFrom:
              path: /tmp/message
      script:
        image: debian:9.4
        command: [ bash ]
        source: |
          echo '{{inputs.parameters.message}}'
          echo '{{inputs.parameters.message}}' > /tmp/message
    - name: on-exit
      inputs:
        parameters:
          - name: message
      script:
        image: debian:9.4
        command: [ bash ]
        source: |
          echo '{{inputs.parameters.message}}'

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded