feat: add IaC security scanning with Checkov#6
feat: add IaC security scanning with Checkov#6moghit-eou wants to merge 5 commits intoMedical-Informatics-Platform:mainfrom
Conversation
|
You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool. What Enabling Code Scanning Means:
For more information about GitHub Code Scanning, check out the documentation. |
|
Nice PR, thank you again @moghit-eou . |
|
Hi @moghit-eou, Now for the following, what was the logic behind excluding them fully? Is there an option for exceptions instead maybe?
Finally, Checkov now only scans this repo and not the dependencies it pulls. While this is a simple approach that can be remediated by adding checkov to other MIP repositories, what happens for other dependencies? I have to say that I don't yet have a solution for this but this is something we will need to think about and that is not blocking for this PR> |
|
Hi @jdaln, Hope the conference went well. On the Submariner failures: thanks for the context, good to know those should be cleaned up with the version upgrade. On the global skips: My initial approach was too broad. I used those three skips as a temporary way to unblock the first scan, but I intended to narrow them down properly. After a quick reviewing the repo and the relevant Checkov references, here is what I found: CKV_K8S_8 / CKV_K8S_9 are workload checks for liveness and readiness probes. In this repo, they are not meaningful on ArgoCD control resources such as CKV_K8S_35 is also a workload-level check: it evaluates how running containers consume secrets. For the same reason, it is not meaningful on ArgoCD control resources that do not define runtime containers or environment-variable based secret injection. The full scope of what this check targets is documented in the Checkov Kubernetes policy index. On exceptions instead of global skips: Checkov supports inline suppression at the individual resource level (suppression docs), so my proposed fix is to remove the global # checkov:skip=CKV_K8S_8: ArgoCD control resource - no runtime containers defined
# checkov:skip=CKV_K8S_9: ArgoCD control resource - no runtime containers defined
# checkov:skip=CKV_K8S_35: ArgoCD control resource - no runtime containers defined
apiVersion: argoproj.io/v1alpha1
kind: Application
...This way, real workload manifests such as On dependency scanning: you are right that Checkov currently only scans this repository, not the external Helm charts it references. I agree that is a separate gap worth tracking, and adding checks upstream in the MIP-related repositories seems like the cleanest direction. Does that approach make sense? Sorry, I can't provide more detailed approach I have an university exam this week so I have not had the chance to fully work through the implementation yet but I will take a closer look once that is done and update the PR in the weekend. |
|
@moghit-eou it did, thank you! And I have some takeaways in terms of runtime security that will be interesting to look at I believe :) . This approach is excellent. Let's wait for Google to communicate the results for the GSoC at the end of the month before moving further for now I would say. Good luck for your exams! |
|
Hi @jdaln, I went back and tested everything properly locally using Checkov. What I got wrong
Example: Checkov skipping ArgoCD resourcesoutput : This means my earlier claim that "CKV_K8S_8/9 are not meaningful on ArgoCD control resources" was only partially right, those checks do not fire on pure ArgoCD What I ran locally1. Full repository scancheckov -d . > all_checkov..txt 2.
|
| File | Failures |
|---|---|
patch-argocd-application-controller-resources.yaml |
16 |
patch-argocd-dex-server-resources.yaml |
16 |
patch-argocd-redis-ha-statefulset.yaml |
16 |
patch-argocd-repo-server-resources.yaml |
16 |
patch-argocd-server-resources.yaml |
16 |
patch-argocd-cmd-params-cm.yaml |
1 |
patch-argocd-cm.yaml |
1 |
patch-argocd-application-controller-clusterrole.yaml |
1 |
| Total number | 83 |
Some of the top failing rules relate to probes, image configuration, container security hardening, and RBAC permissions.
These files are Kustomize StrategicMergePatches, listed under
patchesStrategicMerge in argo-setup/patches/kustomization.yaml.
When Checkov scans these standalones files ,I think it may be treating each
file as a complete, standalone Deployment definition and flagging
everything that is "missing" (no probes, no security context, no image
tag, no namespace, etc.). That is likely why these 5 files each produce 16
failures.
The changed after kustomize build on argo-setup folder.
Overall: failures dropped from 83 (argo-setup standalone) → 69
(rendered output).
This could suggest that the probes for these components are defined in
the upstream ArgoCD HA manifest and only become visible once the patches
are merged into it, which would make the original failures on the raw
patch fragments false positives. Though I want to be careful here; I have
not confirmed this by reading the upstream manifest directly, so I may be
wrong about the exact reason.
For CKV_K8S_8 and CKV_K8S_9 specifically:
Before rendering, all five resources from the patch files failed these two
checks. After rendering, three of them now pass:
Deployment.argocd-mip-team.argocd-repo-server,
Deployment.argocd-mip-team.argocd-server, and
Deployment.argocd-mip-team.argocd-redis-ha-haproxy.
For CKV_K8S_35 (prefer secrets as files over environment variables):
this rule passes on every file in the overall scan (
all_checkov.txt — 0 failures, 9 passes total). But after kustomize build and
scanning the rendered output, it fails on 5 upstream ArgoCD resources.
Findings outside argo-setup (from all_checkov.txt)
The remaining 71 failures in the full repository scan come from:
| File | Failures | Notes |
|---|---|---|
common/nginx-ingress/test-ingress.yaml |
23 | Test deployment, no probes, no resource limits, default namespace and likely genuine findings on a non-production file |
deployments/.../exareme2-deployment-patch.yaml |
16 | Another Kustomize patch file , similar situation to argo-setup patches, may be false positives |
common/submariner/broker/copy-secret-hook.yaml |
16 | A Job resource missing resource limits, security context, etc. could be genuine |
common/nginx-ingress/manifests/nginx-public-deployment.yaml |
12 | Real workload , some findings may be genuinely worth addressing |
common/submariner/operator/kustomization.yaml |
1 | CKV_SECRET_6 , Base64 high entropy string (IPSec PSK) |
.github/workflows/lint-yaml.yml |
2 | CKV2_GHA_1 (missing top-level permissions) and CKV_GHA_2 (shell injection risk on git push) |
.github/workflows/check-main-revisions.yml |
1 | CKV2_GHA_1 (missing top-level permissions) |
I have not yet done a full analysis of each of these. Before touching
them, I want to understand which are genuine findings worth addressing and
which may be similar false positives caused by scanning partial Kustomize
documents.
Proposed changes to the workflow
-
Given all of the above,I currently think makes sense
is Replacing the globalskip_checkwithskip_path: argo-setup/patches, seems like a reasonable next step ,
though I am not fully certain this is the right approach yet. The idea
is to scope the exclusion to where the false positives are most likely
occurring -
This scopes the exclusion to where the false positives are actually
occurring the patch fragment files rather than suppressing those rules
globally across the entire repo. -
Before touching findings, I'll try to set up
actto test workflow changes locally
end-to-end.
Attached scan outputs
I uploaded the three raw scan outputs attached below / linked here for
reference. They contain the full check-by-check results that informed
everything above:
all_checkov.txt— on the full repositoryargo-setup_checkov.txt— on the argo-setup folder onlykustomize_patches_checkov.txt— After rendering through Kustomize
What this does
Adds a Checkov workflow that scans all Kubernetes, Helm, and Kustomize
files on every PR for security misconfigurations.
Known limitation ( security gap ): skip_check
CKV_K8S_8,CKV_K8S_9,CKV_K8S_35are globally skipped to avoid false positives on ArgoCDwhich are not Kubernetes workloads and have no containers. You can find all the Checkov's rules in this link
I'm still working through the codebase to identify exactly which ArgoCD files need to be modified and I will address this in a follow up PR.
soft_fail: true
mip-infra has never been scanned before that why the first run will surface existing
findings in current files. soft_fail=true makes issues visible without
blocking all open PRs immediately.