This repo helps to reproduce the problem reported in argoproj/argo-cd#10707.
In the steps below, ArgoCD is deployed to a local kind cluster. You need the kind binary for this. Get it here: https://github.com/kubernetes-sigs/kind/releases/
Clone this repo and enter the local repository:
git clone https://github.com/vx-github/vx-argocd-cert-bug.git
cd vx-argocd-cert-bug
Start the kind cluster by running the script:
./01_create-kind-cluster.sh
Deploy ArgoCD v2.4.18:
./02_deploy-argocd.sh v2.4.18
or deploy ArgoCD v2.5.5:
./02_deploy-argocd.sh v2.5.5
Wait for ArgoCD to become ready at https://localhost:30000/
Now we want to check what cert is served by ArgoCD. Run this script in a second terminal:
./03_view_argocd_cert.sh
Every second, some properties of the cert served by ArgoCD are printed. At first this will be a self-signed cert generated by ArgoCD, because the Kubernetes secret argocd-server-tls
does not exist yet.
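For readers who want to inspect the served cert by hand, here is a minimal sketch of such a polling loop. It is not the contents of 03_view_argocd_cert.sh, which may work differently; the function names are hypothetical, and the endpoint localhost:30000 is taken from the URL above.

```shell
#!/usr/bin/env bash
# Sketch only: the real 03_view_argocd_cert.sh may be implemented differently.

# Print subject, validity dates and serial of a PEM certificate read from stdin.
cert_summary() {
  openssl x509 -noout -subject -dates -serial
}

# Fetch the certificate served at host:port (default localhost:30000)
# and print its summary. The default endpoint is an assumption.
served_cert_summary() {
  local endpoint="${1:-localhost:30000}"
  openssl s_client -connect "$endpoint" -servername "${endpoint%%:*}" \
    </dev/null 2>/dev/null | cert_summary
}

# Poll once per second, printing a timestamp before each sample.
watch_served_cert() {
  while true; do
    date '+%Y-%m-%d %H:%M:%S'
    served_cert_summary "$@"
    sleep 1
  done
}
```

After sourcing this file, calling watch_served_cert in the second terminal prints one sample per second until interrupted with Ctrl-C.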
In the first terminal, start the script below:
./04_update-test-certs.sh
This script will create argocd-server-tls
after 60 seconds. After another 60 seconds it will update argocd-server-tls
with a new cert. It will do this 5 times.
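A single create-or-update step of this kind could look roughly like the sketch below. It is not the repo's script: the function names are hypothetical, the namespace argocd is an assumption, and the subject and 10-day validity are modeled on the example cert output shown further down in this README.

```shell
#!/usr/bin/env bash
# Sketch only: not the actual contents of the repo's cert update script.

# Generate a self-signed test cert "number N" as tls.crt/tls.key in a directory.
make_test_cert() {
  local number="$1" dir="$2"
  openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout "$dir/tls.key" -out "$dir/tls.crt" -days 10 \
    -subj "/C=NL/O=Argo CD Cert Test number ${number}/CN=localhost" 2>/dev/null
}

# Create or update the argocd-server-tls secret from that directory.
# The namespace defaults to "argocd", which is an assumption.
apply_test_cert() {
  local dir="$1" namespace="${2:-argocd}"
  kubectl -n "$namespace" create secret tls argocd-server-tls \
    --cert="$dir/tls.crt" --key="$dir/tls.key" \
    --dry-run=client -o yaml | kubectl -n "$namespace" apply -f -
}
```

Piping the dry-run output into kubectl apply makes the same command work for both the initial creation and the later updates of the secret.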
Pay attention to the second terminal. When 04_generate-apply-test-certs.sh
is run, the output should show the updated cert, but before long the self-signed certificate generated by ArgoCD is served again.
If 04_generate-apply-test-certs.sh
generates a second cert (number 2), it will not take long before old certs appear again (the self-signed cert generated by ArgoCD and cert 'number 1'). And so on: after every update, older certs still show up.
This test does not reflect how ArgoCD is really used, of course, but the same happens when cert-manager is used and argocd-server-tls
is updated periodically. Old (expired) certs pop up and cause unavailability, because the same cert is used when ArgoCD components communicate with each other, and they don't accept expired certs.
In ./results
some example output can be found for ArgoCD v2.5.2, v2.4.17 and v2.4.2.
When a (new) cert is placed in the Kubernetes secret argocd-server-tls
(updated at 2022-11-09 00:42:45), argocd-server reads the cert and uses it (view at 2022-11-09 00:42:48).
But not long after, it serves the older self-signed cert again (view at 2022-11-09 00:43:00), and shortly after that it serves the new 'number 1' cert again (view at 2022-11-09 00:43:01).
At 2022-11-09 00:43:46 (update) the 'number 2' cert is generated and applied. argocd-server doesn't start serving 'number 2' right away; at 2022-11-09 00:44:46 'number 3' is created, and it is served at 2022-11-09 00:45:06. At 2022-11-09 00:45:25 cert 'number 2' pops up...
This instability continues until argocd-server is restarted (new pod instance).
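One way to trigger such a restart is a rollout restart of the deployment. This is a sketch under assumptions: the namespace argocd and the deployment name argocd-server match the upstream install manifests, but they may differ in your setup, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: namespace and deployment name are assumptions.

# Replace the argocd-server pod(s) and wait for the new ones to become ready.
restart_argocd_server() {
  local namespace="${1:-argocd}"
  kubectl -n "$namespace" rollout restart deployment argocd-server &&
    kubectl -n "$namespace" rollout status deployment argocd-server --timeout=120s
}
```

Calling restart_argocd_server without arguments uses the assumed argocd namespace; pass another namespace as the first argument if needed.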
When the script 4_generate-apply-test-certs.sh
finishes, the cert in argocd-server-tls
is:
❯ k get secrets argocd-server-tls -o json | jq -r '.data."tls.crt"' | base64 -d | openssl x509 -noout -subject -dates -serial
subject=C = NL, O = Argo CD Cert Test number 5, CN = localhost
notBefore=Nov 8 23:32:34 2022 GMT
notAfter=Nov 18 23:32:34 2022 GMT
serial=1C629A48F40B9F4345A0D03D2E82874DEF8CDE7F
... but argocd-server still serves older certs: the self-signed cert from ArgoCD and the 'number 2' cert (view from 2022-11-09 00:46:47).
Another observation is that it takes a long time before the new cert is used by argocd-server (unlike a cert update in v2.4.2, where this is almost instant).
The unstable cert serving described above was not seen in v2.4.2, as can be seen in the view output and update output. Whenever a new cert is generated and applied (for example at 2022-11-09 00:16:39), it is served almost instantly by argocd-server (at 2022-11-09 00:16:40) and older certs are never served again.
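To check whether argocd-server is currently serving the cert stored in the secret, the two serial numbers can be compared. The sketch below reuses the kubectl/jq/openssl pipeline shown earlier; the function names are hypothetical, and the namespace argocd and endpoint localhost:30000 are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: namespace and endpoint defaults are assumptions.

# Print only the serial number of a PEM certificate read from stdin.
cert_serial() {
  openssl x509 -noout -serial | cut -d= -f2
}

# Serial of the cert stored in the argocd-server-tls secret.
secret_serial() {
  kubectl -n "${1:-argocd}" get secrets argocd-server-tls -o json \
    | jq -r '.data."tls.crt"' | base64 -d | cert_serial
}

# Serial of the cert currently served at host:port.
served_serial() {
  openssl s_client -connect "${1:-localhost:30000}" </dev/null 2>/dev/null \
    | cert_serial
}

# Report whether the served cert matches the one in the secret.
compare_serials() {
  local stored served
  stored=$(secret_serial) || return 2
  served=$(served_serial) || return 2
  if [ "$stored" = "$served" ]; then
    echo "match: $served"
  else
    echo "MISMATCH: secret=$stored served=$served"
    return 1
  fi
}
```

Because the stale certs show up only intermittently, a single comparison can report a match by chance; running compare_serials repeatedly gives a more reliable picture.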
Every time a new test was started, the current kind cluster was first deleted with:
kind delete cluster