@alicefr
Contributor

@alicefr alicefr commented Aug 7, 2025

Design proposal for the booting process

@alicefr alicefr force-pushed the design-boot branch 2 times, most recently from e2118f3 to df3ef39 Compare August 7, 2025 08:52
Member

@travier travier left a comment

This sounds great! Thanks

@travier
Member

travier commented Aug 7, 2025

Does that mean that we will need two IPs/URLs for the trustee servers (KBS1 & 2)? How does the node know to go to the second one after the attestation on first boot?

We will also need to figure out how we can tie a connection to a node ID for the second boot (that's not a new problem, we already had this).

@alicefr
Contributor Author

alicefr commented Aug 7, 2025

> Does that mean that we will need two IPs/URLs for the trustee servers (KBS1 & 2)?

I think we can do it in two ways: either with two IPs and two KBS services, or with a single KBS that interprets the constant path for the first-boot secret and triggers the key-generation phase. The latter isn't implemented by Trustee, however, so we would need to add it on top of the KBS that Trustee offers.
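To make the second option concrete, here is a minimal sketch of a KBS-side handler that treats one constant path as the first-boot trigger. It is not Trustee code: the path `default/firstboot/luks-key`, the function name `handle_resource_request`, and the in-memory `keys` dictionary are all assumptions made for illustration.

```python
import secrets

# Hypothetical constant resource path reserved for the first-boot secret.
FIRSTBOOT_PATH = "default/firstboot/luks-key"


def handle_resource_request(keys: dict, path: str, node_id: str) -> bytes:
    """Return the LUKS key for node_id, generating it on the first-boot path."""
    if path == FIRSTBOOT_PATH:
        # First boot: generate the key once and register it for this node.
        if node_id not in keys:
            keys[node_id] = secrets.token_bytes(32)
        return keys[node_id]
    # Any other path: ordinary lookup of a key registered earlier.
    if node_id not in keys:
        raise KeyError(f"no key registered for node {node_id!r}")
    return keys[node_id]
```

With this shape a single service could cover both phases, at the cost of carrying custom logic on top of the stock KBS.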

> How does the node know to go to the second one after the attestation on first boot?

Since the plan is to implement the clevis pin, we could specify the two URLs as two different parameters for the pin. We will register the second URL, which points to the standard KBS, as the one to contact on second boot.
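As a rough illustration of the two-parameter idea, the sketch below picks the endpoint from a JSON pin configuration depending on the boot phase. The key names `firstboot_url` and `url` and the helper `select_kbs_url` are hypothetical; the real pin may name and structure its parameters differently.

```python
import json


def select_kbs_url(pin_config: str, first_boot: bool) -> str:
    """Pick the KBS endpoint for the current boot phase from a JSON pin config."""
    cfg = json.loads(pin_config)
    # Hypothetical parameter names: one URL per KBS.
    key = "firstboot_url" if first_boot else "url"
    if key not in cfg:
        raise ValueError(f"pin config is missing the {key!r} parameter")
    return cfg[key]


# Example configuration with placeholder hosts.
config = '{"firstboot_url": "http://kbs1.example:8080", "url": "http://kbs2.example:8080"}'
first = select_kbs_url(config, first_boot=True)
second = select_kbs_url(config, first_boot=False)
```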

> We will also need to figure out how we can tie a connection to a node ID for the second boot (that's not a new problem, we already had this).

This needs to be figured out at the end of the first phase, when the LUKS key needs to be registered in KBS1 and matched with a Machine object. The source IP of the HTTP request is always a valid option, as was already discussed in the past. The only issue is that the address in the Machine object is optional and depends on the provider. KubeVirt, Azure and GCP, at least, set the field. For providers that don't support this, the operator will need to look for this information somewhere else.

As for the provider ID, it is a mandatory field in the Machine object, but it isn't necessarily available inside the node. So my guess is that we will need to rely on the IP of the connection and then somehow match it with what is available in the environment.
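A minimal sketch of the IP-based matching, assuming the operator already holds the Machine objects as plain dictionaries shaped like their JSON form (with addresses under `status.addresses`). The function name `find_machine_by_ip` and the rule of returning nothing on zero or ambiguous matches are assumptions, not part of the design.

```python
import ipaddress


def _same_ip(candidate: str, source_ip: str) -> bool:
    """Compare two address strings as IPs; hostnames and empty values never match."""
    try:
        return ipaddress.ip_address(candidate) == ipaddress.ip_address(source_ip)
    except ValueError:
        return False


def find_machine_by_ip(machines: list, source_ip: str):
    """Return the single Machine whose status.addresses contains source_ip, else None."""
    matches = [
        m for m in machines
        if any(
            _same_ip(a.get("address", ""), source_ip)
            # status.addresses is optional, so tolerate missing or empty fields.
            for a in (m.get("status") or {}).get("addresses") or []
        )
    ]
    # Refuse to guess when no Machine, or more than one, claims the address.
    return matches[0] if len(matches) == 1 else None


machines = [
    {"metadata": {"name": "node-a"},
     "status": {"addresses": [{"type": "InternalIP", "address": "10.0.0.5"}]}},
    {"metadata": {"name": "node-b"}, "status": {}},  # provider did not set addresses
]
match = find_machine_by_ip(machines, "10.0.0.5")
```

A `None` result is the point where the operator would have to fall back to another source, such as the provider's own API.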

@alicefr
Contributor Author

alicefr commented Aug 7, 2025

For providers that don't set the IP in the Machine, we would need to use their API. In any case, the fallback plan to the Machine object was to use each provider's API, so this won't be different from the case without the Machine object. It would be nice to check whether each provider can somehow advertise that it exposes the IP address, without having to check the code of each provider. I will ask the Cluster API community if there is such an option.

The list of CAPI providers is quite large, but we are interested only in those that support confidential computing.

@alicefr alicefr force-pushed the design-boot branch 2 times, most recently from c8983d2 to ce15552 Compare August 7, 2025 12:07
@alicefr
Contributor Author

alicefr commented Aug 8, 2025

/hold wait until we clarify the node identifier, this might change the design

@alicefr alicefr marked this pull request as draft August 8, 2025 07:15
@alicefr alicefr force-pushed the design-boot branch 2 times, most recently from f274adc to fd1cc0c Compare August 15, 2025 07:32
@uril

uril commented Aug 26, 2025

This registration service is only needed if we cannot get the node ID from OpenShift.

@travier
Member

travier commented Sep 4, 2025

[Image: Untitled-2025-09-04-1306]

An alternative, if we can figure out how to tie a resource request to the TPM AK from the attestation request.

@alicefr alicefr force-pushed the design-boot branch 2 times, most recently from 9f79757 to bb29fb4 Compare September 10, 2025 13:16
@alicefr alicefr marked this pull request as ready for review September 10, 2025 13:18
@alicefr alicefr requested review from travier and uril September 10, 2025 13:18
@alicefr
Contributor Author

alicefr commented Sep 10, 2025

/cc @Jakob-Naucke

@alicefr alicefr force-pushed the design-boot branch 4 times, most recently from e9bba47 to 1880fa7 Compare October 2, 2025 15:48
@alicefr
Contributor Author

alicefr commented Oct 6, 2025

@travier @Jakob-Naucke I'd like to merge this since we have already more or less implemented it.

Jakob-Naucke previously approved these changes Oct 6, 2025
@Jakob-Naucke Jakob-Naucke dismissed their stale review October 6, 2025 13:23

initdata format incompatibilities

@alicefr alicefr requested a review from Jakob-Naucke October 10, 2025 06:24
Member

@travier travier left a comment

Some nits but LGTM.

@alicefr
Contributor Author

alicefr commented Oct 21, 2025

@Jakob-Naucke @travier PTAL

@travier
Member

travier commented Oct 21, 2025

I pushed a few nits. LGTM.

@travier travier merged commit a625da1 into trusted-execution-clusters:main Oct 21, 2025
7 checks passed
4 participants