waiting for completion of hook and hook never succeeds #6880
Comments
I terminated the app sync and re-synced it, and the sync now succeeds. But this can't keep happening: when it does, our CI/CD runs fail, and so does the automation we built to install apps via the argoCD CLI. |
I suspect this is fixed by #6294 . The fix is available in https://github.com/argoproj/argo-cd/releases/tag/v2.0.3 . Can you try upgrading please? |
Sure, thanks. We recently upgraded our develop branch to 2.0.5, and this happened on our prod build, which is on 2.0.1. I will see if this can repro on our dev branch. Thanks! |
@alexmt - We are using the below version of ArgoCD and seeing the same issue with Contour helm. Application is waiting for PreSync Job to complete whereas on a cluster I can see the job is completed. { |
I have the same problem in version |
I have the same problem on the 2.3.0 RC1 as well |
The PreSync hook, PostSync hook, and "Syncing" (while No Operation Running) are the only long pending major issues in ArgoCD at the moment. |
Hello. I am still seeing this in v2.2.4. PreSync hook is scheduled, Job starts, runs to completion, Argo sits there spinning "Progressing" until terminated. To work around it, we are terminating the op and using 'sync --strategy=apply' (disabling the hook) and running our job out of band. Kube events during the sync confirm the job success. I no longer see the job/pod (per those events) if I check the namespace directly.
Let me know if I can provide any diagnostics to help. |
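The workaround described in the comment above can be sketched as two CLI calls. This is a minimal sketch, not a fix: the application name is a placeholder, and `run` is a hypothetical helper added here so the commands can be previewed with `DRY_RUN=1` before touching a real cluster.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Terminate the stuck operation, then re-sync with the "apply" strategy,
# which (per the comment above) disables the hook. The hook's job then has
# to be run out of band.
sync_without_hooks() {
  app=$1
  run argocd app terminate-op "$app"
  run argocd app sync "$app" --strategy apply
}
```

Calling `DRY_RUN=1 sync_without_hooks my-app` prints both commands without executing them; dropping `DRY_RUN` runs them against whatever server the `argocd` CLI is logged in to.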
We face the same issue in 2.2.5 as well. |
Does it work with 2.0.3 or 2.2.2? |
I can confirm the error was fixed on 2.0.3. We recently upgraded to 2.3.3 and we are experiencing the error again. |
We started experiencing this issue after upgrading to |
We're seeing a similar issue on the SyncFail hook, which means we can't actually terminate the sync action. The job doesn't exist in the target namespace, and we've tried to trick Argo by creating a job with the same name, namespace, and annotations as we'd expect to see, with a simple echo "done" action, but nothing is helping. ArgoCD version: {"Version":"v2.3.4+ac8b7df","BuildDate":"2022-05-18T11:41:37Z","GitCommit":"ac8b7df9467ffcc0920b826c62c4b603a7bfed24","GitTreeState":"clean","GoVersion":"go1.17.10","Compiler":"gc","Platform":"linux/amd64","KsonnetVersion":"v0.13.1","KustomizeVersion":"v4.4.1 2021-11-11T23:36:27Z","HelmVersion":"v3.8.0+gd141386","KubectlVersion":"v0.23.1","JsonnetVersion":"v0.18.0"} |
To add some information here, we are running into the same issue ("waiting for completion of hook" when the hook has already completed), and it happens when we are attempting to sync to a revision that is not the targetRevision for the app. When we sync an app with hooks to the same revision as the targetRevision, we do not run into this.
We are running 2 application-controller replicas in HA setup as per https://argo-cd.readthedocs.io/en/stable/operator-manual/high_availability/ . I have verified we do not have a leftover instance of argo before it used stateful-sets. |
i had a similar problem when i was configuring resource inclusions, i wrote down what happened here: #10756 (comment) |
I am still seeing this with |
We resolved this symptom on v2.4.12+41f54aa for Apps that had many Pods by adding a resource exclusion along these lines to our config map:
Prior to this, pre-sync job hooks would never complete in the ArgoCD UI even though they had actually completed in Kubernetes. Sometimes invalidating the cluster cache would help Argo recognize that the job was completed, but most of the time it did not. We believe the timeouts were caused by the need to enumerate an excessive number of entities, which simply never finished before the next status refresh occurred. We do not use the ArgoCD UI to view the status of Pods, so this solution is fine for us. A bonus for us is that the UI is much more robust now as well 🙂 |
We had this issue, and it was related to a customer's Job failing to initialise due to a bad secret mount. You can validate this by checking the events in the namespace where the job is being spun up to see if it's failing to create. |
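One way to run the check suggested above is sketched below. The namespace and Job name are placeholders, and `run` is a hypothetical dry-run helper added for illustration; the `kubectl` subcommands themselves are standard.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Show recent events in the hook's namespace (oldest first), then the Job's
# own description, to spot pod creation or volume mount failures.
inspect_hook_job() {
  ns=$1
  job=$2
  run kubectl get events -n "$ns" --sort-by=.lastTimestamp
  run kubectl describe job "$job" -n "$ns"
}
```

For example, `inspect_hook_job my-namespace pre-hook` lists the namespace events and then describes the Job. Events are only retained for a limited time, so this is most useful while the sync is still stuck.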
Hello Argo community :) I am fairly familiar with ArgoCD codebase and API, and I'd happily try to repay you for building such an awesome project by trying to have a stab at this issue, if there are no objections? |
I will highly appreciate..! |
I would also highly appreciate that! |
I'm seeing this issue with |
I am also seeing this issue when installing kubevela with argocd with the version v2.6.1+3f143c9
|
We also had this issue and it was resolved once we set Instructions here: https://argo-cd.readthedocs.io/en/stable/operator-manual/high_availability/#argocd-application-controller
|
I rolled back from I also tried @micke's recommendation (changing the |
In my case i am only installing the application on a single cluster. That is the only application that is failing
|
I just figured out what was causing Argo to freeze on the hook. In my case the specific hook had patches:
- target:
    name: pre-hook
    kind: Job
  path: patches/hook.yaml
# patches/hook.yaml
- path: "/spec/ttlSecondsAfterFinished"
  op: remove
Afterwards the chart finally went through! It's still a bug that should be addressed, I'm just sharing this for others to work around it. |
I had this problem when I had a CR whose CRD had not been created yet, plus a Job with a Sync hook. Argo CD couldn't apply the custom resource because there was no CRD yet, and the hook started and then disappeared. My guess is that Argo retries syncing the CR and somehow restarts the hook as well, so I changed the hook to PostSync. |
Encountered similar behavior as described in this issue while upgrading from v2.5.12+9cd67b1 to v2.6.3+e05298b |
still an issue. hoping for a fix. |
Same issue with Karpenter helm chart 3.7.1, stuck on: any workarounds you can share? |
While I agree that this argocd issue should be addressed, this is primarily due to Karpenter's (frankly rubbish) implementation. I have been battling errors in the 1.0.0 upgrade for the past two days. Warning: removing Argo from the mix doesn't make things much better. There are workarounds documented in the issues in the karpenter repo. |
Fixes an argocd issue where helm hooks never finish syncing when they have ttlSecondsAfterFinished set to 0. See related argocd issue: argoproj/argo-cd#6880 Suggested workaround as implemented by argocd team: argoproj/argo-helm#2861
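To see whether a given hook Job falls into the case described above, its `ttlSecondsAfterFinished` value can be read straight from the cluster. A minimal sketch, assuming placeholder namespace and Job names; `run` is a hypothetical dry-run helper, not part of any tool.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print spec.ttlSecondsAfterFinished for one Job.
# Empty output means the field is not set on that Job.
job_ttl() {
  ns=$1
  job=$2
  run kubectl get job "$job" -n "$ns" -o "jsonpath={.spec.ttlSecondsAfterFinished}"
}
```

A Job with the TTL set to 0 may already have been deleted by the time this runs, in which case `kubectl` reports that the Job was not found; checking the rendered chart manifest is the fallback.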
Had the same issue, although for me the service account was not being deleted with the same hooks. My job's ttlSecondsAfterFinished was > 0. Unsure if this is the actual root issue, but I added these annotations to my job/sa |
I have this problem almost every day in our pipeline, where we create a kind cluster and manage ArgoCD via the ArgoCD Helm chart (it gets stuck in the presync redis init job). If someone from Intuit, Red Hat, Akuity, ... can tell me how to debug this or which debug data to gather, I would be happy to gather everything needed to troubleshoot this problem. @jessesuen @crenshaw-dev ? It seems solving this problem would help lots of people in the community, and I would help as much as I can. |
+1 |
In my case, it was solved by deleting the sync option
```yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: ingress-nginx-admission-create
  namespace: ingress-nginx
  annotations:
    helm.sh/hook: pre-install,pre-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
spec:
  template:
    metadata:
      annotations:
        karpenter.sh/do-not-disrupt: 'true'
```
|
Same issue with |
Would overriding |
No, even if ttlSecondsAfterFinished is set to a value > 0, the Job remains in an incomplete state in Argo CD. |
Hello awesome argo community, I played around with this bug and implemented a proof-of-concept PR which proposes a fix for the stuck hook issue. I kindly ask for feedback from the community and maintainers, as I would very much like to upstream it. Issue - #21055 |
@dejanzele I would like to try this in my tests, how can I make a custom build with this PR included or do you already have an Image somewhere which I can use in my tests? |
The solution @dejanzele talked about, and the one being worked on in argoproj/gitops-engine#646, is to add a finalizer to all hook resources to prevent hook resources from being deleted before Argo CD can observe the final state. I'm curious what others think of this approach. |
Hey @jkleinlercher, @crenshaw-dev had good comments about generalizing the approach (also thanks for the finalizer idea <3 ), so I am in the process of refactoring it. After that, if you'd like a preview, you can build it from my fork. |
Hi,
We are seeing this issue quite often: app sync gets stuck in "waiting for completion of hook" and the hooks never complete.
As you can see, the application below got stuck in the secret creation phase, and somehow that secret never got created.
All unnecessary details are stripped out. This is how the secret is created and used by the job.
kubectl -n argocd logs argocd-server-768f46f469-j98h6 | grep xxx-migrations - No matching logs
kubectl -n argocd logs argocd-repo-server-57bdbf899c-9lxhr | grep xxx-migrations - No matching logs
kubectl -n argocd logs argocd-repo-server-57bdbf899c-7xvs7 | grep xxx-migrations - No matching logs
kubectl -n argocd logs argocd-server-768f46f469-tqp8p | grep xxx-migrations - No matching logs
[testadmin@server0 ~]$ kubectl -n argocd logs argocd-application-controller-0 | grep orchestrator-migrations
time="2021-08-02T02:16:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:16:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:19:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:19:26Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:22:17Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:22:17Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:22:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:25:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:25:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:28:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:28:26Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:31:25Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
time="2021-08-02T02:31:26Z" level=info msg="Resuming in-progress operation. phase: Running, message: waiting for completion of hook /Secret/xxx-migrations-0.0.19-private4.1784494" application=xxx
Environment:
ArgoCD Version: 2.0.1
Please let me know if any other info is required.