Skip to content

🐛 Fix: Handle empty chart directory correctly #289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jklippel
Copy link
Contributor

Currently the return of helmChartNamePath is not checked on errors. This means that the CSO tries to helm template the container which leads to cryptic error messages from helm.

This commit adds a check on errors on the call to helmChartNamePath and results in more helpful error messages by helm.

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #288

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

The log message after the patch is more helpful:

{"level":"ERROR","time":"2025-04-14T15:28:57.491Z","file":"controller/controller.go:324","message":"Reconciler error","controller":"clusterstackrelease","controllerGroup":"clusterstack.x-k8s.io","controllerKind":"ClusterStackRelease","ClusterStackRelease":{"name":"docker-example-1-32-v0-sha-uxh53fj","namespace":"default"},"namespace":"default","name":"docker-example-1-32-v0-sha-uxh53fj","reconcileID":"584d729e-a19f-4dbd-9c1d-5a45df7edc21","error":"failed to template and apply: failed to template clusterClass helm chart: failed to template clusterClass helm chart: helm chart matching the name filter docker-example-1-32-cluster-class-v0-sha.uxh53fj not found in /tmp/downloads/cluster-stacks/docker-example-1-32-v0-sha-uxh53fj","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/jakl/GolandProjects/cluster-stack-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/jakl/GolandProjects/cluster-stack-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/jakl/GolandProjects/cluster-stack-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222"}
{"level":"INFO","time":"2025-04-14T15:30:13.992Z","file":"fake/client.go:58","message":"WARNING: called DownloadReleaseAssets of fake assets client","controller":"clusterstackrelease","controllerGroup":"clusterstack.x-k8s.io","controllerKind":"ClusterStackRelease","ClusterStackRelease":{"name":"docker-ferrol-1-27-v2","namespace":"cluster"},"namespace":"cluster","name":"docker-ferrol-1-27-v2","reconcileID":"24134593-9a09-4a19-b8dd-0b5f92e09ee6"}

TODOs:

  • squash commits
  • include documentation
  • add unit tests

Copy link
Member

@janiskemper janiskemper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR!

We currently have a function CheckHelmCharts that should take care of this. I believe that we forgot to update it for the case of multi-stage cluster addons. I think the proper way of handling this would be to update that function in a way that it does its job.

Can you maybe look into this?

@matofeder matofeder requested a review from chess-knight April 15, 2025 09:10
@chess-knight
Copy link
Member

We currently have a function CheckHelmCharts that should take care of this. I believe that we forgot to update it for the case of multi-stage cluster addons. I think the proper way of handling this would be to update that function in a way that it does its job.

@janiskemper if I see it correctly, CheckHelmCharts is used only in clusteraddon controller right now, but this PR handles errors in the case of clusterclass

@janiskemper
Copy link
Member

@chess-knight true, my bad. However, my point is still the same. We have this function on purpose to check that everything is correct. Apparently, it doesn't do its job. We should fix that function rather than propagating the error in the other function

@jklippel jklippel force-pushed the fix/cs-apply-fails-with-no-error branch from e80d1a3 to 732b6fa Compare April 16, 2025 12:08
@jklippel
Copy link
Contributor Author

@janiskemper I added a call to helmChartNamePath to the validation method. This will show if something is wrong with the name in the helm chart. (Basically it is the same call that raises the error in this case.)

However I would like to keep the propagation of errors, because it is better to get an error message if it is already available than to drop it on purpose. If there are still some unexpected errors this will definitely help debuging.

@janiskemper
Copy link
Member

There is the function CheckHelmCharts that has the exact same purpose that you have in mind. Can you plz check how you can add it to that function? I don't mind if you additionally return it with the other function, that should just not be needed anymore

@jklippel jklippel force-pushed the fix/cs-apply-fails-with-no-error branch from 732b6fa to a5cd643 Compare April 16, 2025 13:37
@jklippel
Copy link
Contributor Author

ok, moved the validation call to CheckHelmCharts().
Unfortunately that did not solve the issue as the CheckHelmCharts() does not get called prior to the original error.
I added a call to CheckHelmCharts() to the Validate() method. Now things work as expected.
Is that what you had in mind @janiskemper ?

@janiskemper
Copy link
Member

alright! Thanks for the info. Then the problem is actually that the function was not called in the clusterstackrelease controller. We have this in the clusteraddon-controller:

	// Check for helm charts in the release assets. If they are not present, then something went wrong.
	if err := releaseAsset.CheckHelmCharts(); err != nil {
		msg := fmt.Sprintf("failed to validate helm charts: %s", err.Error())
		conditions.MarkFalse(
			clusterAddon,
			csov1alpha1.ClusterStackReleaseAssetsReadyCondition,
			csov1alpha1.IssueWithReleaseAssetsReason,
			clusterv1.ConditionSeverityError,
			"%s", msg,
		)
		record.Warn(clusterAddon, "ValidateHelmChartFailed", msg)
		return reconcile.Result{}, nil
	}
	```
	
	I think we should add that to respective place in the clusterstackrelease controller. WDYT?

@jklippel
Copy link
Contributor Author

yes, I add it here pkg/release/release.go:215 to the validate method.
Isn't that the correct place?

@janiskemper
Copy link
Member

My point was that we actually don't call the function in the first place in the clusterstackrelease controller. We should do that however. So can you add the above code snippet to the clusterstackrelease controller?

@jklippel jklippel force-pushed the fix/cs-apply-fails-with-no-error branch from a5cd643 to 092eb1c Compare April 17, 2025 14:46
@jklippel
Copy link
Contributor Author

ok, moved to internal/controller/clusterstackrelease_controller.go:155
That indeed seems to be the better place for this.
Thanks for the snippet!

Copy link
Member

@janiskemper janiskemper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jklippel
Copy link
Contributor Author

The test Test Code / Test Code (pull_request) fails as the clusterstack used does not have a cluster-addon-values.yaml.

If this patch is merged then this will no longer be optional.
Would that be correct or should having a cluster-addon-values.yaml stay optional?

@janiskemper
Copy link
Member

Oh I didn't realize. That file is definitely optional. The more recent versions (using multi-stage cluster addons), don't have that file AFAIK

@@ -159,6 +159,11 @@ func ensureMetadata(downloadPath, metadataFileName string) (Metadata, error) {
func (r *Release) CheckHelmCharts() error {
// check if the cluster class chart is present.
clusterClassChartName := r.clusterClassChartName()

if _, err := r.ClusterClassChartPath(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this the same as the one below?

Copy link
Contributor Author

@jklippel jklippel Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this the same as the one below?

yes, indeed; I removed it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some refactoring the tests are green now and everything looks good to go again.

@jklippel jklippel force-pushed the fix/cs-apply-fails-with-no-error branch 5 times, most recently from 3e58172 to 65a4f6d Compare April 25, 2025 14:35
Currently the return of helmChartNamePath is not checked on errors.
This means that the CSO tries to helm template the container which leads to cryptic error messages from helm.

This commit adds a check to the clusterstackrelease_controller.

Signed-off-by: Jan Klippel <[email protected]>
@jklippel jklippel force-pushed the fix/cs-apply-fails-with-no-error branch from 65a4f6d to c12de9b Compare April 25, 2025 14:44
@jklippel jklippel requested a review from janiskemper April 25, 2025 14:50
Copy link
Member

@janiskemper janiskemper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR already is an improvement. However, we removed validation around the cluster-addon-values.yaml file. We could add it back by manually checking that the file exists before this line here:

shouldRequeue, err := r.templateAndApplyClusterAddonHelmChart(ctx, in)

WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cso tries to helm template the root of the container if the computed chart name does not match the charts name
3 participants