TEE Boot Attestation proposal #108

Open. tylerfanelli wants to merge 1 commit into master.

@tylerfanelli tylerfanelli commented Jun 1, 2024

Implements #107

@tylerfanelli tylerfanelli changed the title Add proposal for TEE Boot Attestation TEE Boot Attestation proposal Jun 1, 2024
TEE attestation. Information is dependent on the specific TEE architecture
the guest is running on top of (ARM CCA, Intel TDX, AMD SEV-SNP, etc.).
* SVSM sends all attestation information to TEE handler.
* TEE handler uses information to verify the boot environment.
Contributor

What kind of data can we expect here? Is this going to be different per TEE architecture? Is this still under design?

Author

A portion of this data will be architecture-dependent and a portion will not be.

The portion that will be arch-dependent is the attestation evidence itself. This is because each TEE's attestation report from its secure processor is formatted differently and contains different information. Different types of X.509 certificates or related items may also be included based on the architecture.

The portion that will not be arch-dependent is the public key to encrypt the secret with. Since a client like SVSM runs in guest VM firmware, a network stack is not yet available to it, so we must use the host for networking. However, for privacy purposes we cannot allow the host to read the attestation secret. SVSM will therefore create a private key in guest (encrypted) memory, give the host the public components of the key, and expect keylime to encrypt the secret upon a successful attestation. This allows the guest VM to gain access to the resource without the host being able to decrypt it.

eventually initializing and integrity monitoring continuing as normal.
* SVSM will extend system's TPM state to account for TEE boot attestation.
*This is intentionally vague at the moment, as the exact design is still under
ongoing discussion*.
Contributor

I am curious, which PCRs do you think will be affected? Or how early/late in the boot process is this happening?

Author

This is still under design. Much of the "how to attest" portion is completed, but "what does the verifier return, and how is the vTPM extended from what is returned" is still under ongoing discussion. The ultimate goal is "including" TEE boot attestation within the vTPM measurements.

However, SVSM will complete all of this attestation before Linux is able to boot, as the "secret" in most cases will be a LUKS passphrase to an encrypted disk image for our kernel/OS image. For example, we will begin from SVSM with the SNP/TDX launch measurement, which covers the CVM platform, CVM config, CPU state, and firmware code, where firmware is the combination of SVSM and OVMF.

By validating all of this, we prove integrity of everything that runs prior to executing code from the guest root disk. Assuming that the OVMF binary is built with SecureBoot force enabled, and with suitable certificates, then we know that OVMF will have validated shim, and shim will have validated the UKI kernel. So by attesting the SNP/TDX launch measurement, we've established trust all the way to the start of guest userspace.

Since we've encrypted the disk image, we can trust that it's what is running after attestation. We can trust the guest userspace, as that's coming out of an encrypted disk image that we can only unlock if we've passed attestation such that some keys are released.


The TEE handler will contain a very sensitive secret for unlocking some state
on the CVM. This secret must be properly protected and should only be made
available to a guest that has successfully attested.
Contributor

Would proper ownership and mode bits on the DB be sufficient, or do we need encrypted columns now (if available)?

Author

I don't think encrypted columns are necessary, as long as the right access permissions (that is, only successfully attested guests can read a secret) are in place.


Upon a `POST` request to the TEE handler, the handler will deserialize the
request body from JSON to fetch the parameters listed above. The handler will
read the `tee` parameter and deserialize the `evidence` parameter accordingly.
Contributor

Presumably the tenant tool will be used to set how to recognize 'whatever is expected to be good as evidence'.

Author

Can you elaborate a bit more? There would probably need to be a "standard", agreed upon format that each TEE architecture's JSON evidence will need to follow.
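
To make the idea of a common evidence envelope concrete, here is a minimal Python sketch of a handler that reads the `tee` parameter and deserializes `evidence` accordingly. The architecture names (`snp`, `tdx`), the fields inside `evidence`, and the `parse_*` helpers are assumptions for illustration only; the proposal does not fix any of them.

```python
import json
from typing import Callable, Dict


# Hypothetical per-architecture parsers. Real ones would decode the
# architecture-specific attestation report and any certificates.
def parse_snp(evidence: dict) -> dict:
    return {"arch": "snp", "report": evidence["report"], "certs": evidence.get("certs", [])}


def parse_tdx(evidence: dict) -> dict:
    return {"arch": "tdx", "quote": evidence["quote"]}


# Dispatch table keyed by the value of the `tee` parameter.
PARSERS: Dict[str, Callable[[dict], dict]] = {"snp": parse_snp, "tdx": parse_tdx}


def handle_post(body: str) -> dict:
    """Deserialize a JSON request body and dispatch on its `tee` parameter."""
    request = json.loads(body)
    tee = request["tee"]
    parser = PARSERS.get(tee)
    if parser is None:
        raise ValueError(f"unsupported TEE architecture: {tee!r}")
    return parser(request["evidence"])
```

With a table like this, supporting another architecture would mean adding one parser and one entry, while the outer envelope (`tee` plus `evidence`) stays the same for every TEE.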

Contributor

A user will probably need to be able to put 'the very sensitive secret' (or something related to 'it') into the DB using the keylime tenant tool so that the verifier can then either compare the 'very sensitive secret' produced by the TEE or do some computation (with something related to 'it') to be able to compare it.

guest memory, certain attacks can take place *before* a guest OS is booted and a
keylime agent has been initialized. Timing of boot attestation is critical,
and thus a TEE boot attestation handler cannot assume that it is communicating
with a keylime agent. Furthermore, the TEE handler will release some secret
Contributor

This would significantly change the keylime design and mode of operation. The verifier is never expected to block operations on nodes (e.g., especially one as crucial as the initial boot). Keylime has a "scale-out" architecture where verifiers can be replicated and are considered mostly stateless (for instance, this is how Keylime is deployed in OpenShift/Kubernetes).

provided by keylime *in addition to* TEE technology. That is, it is reasonable
to expect that keylime agents will be deployed on CVMs. With that, it is
reasonable to include TEE attestation mechanisms for environments in which
keylime will already be running for system integrity monitoring purposes.
Contributor

I certainly agree that Keylime should support CVMs, and actually an initial crude prototype was done by @ruocco (never released, but tested internally at IBM), where the registrar was modified to process the "attestation report" for the (then, just SEV) guest, in order to decide if it would accept the EK from that particular agent.

But it is important to avoid conflating the two issues: a) Keylime should incorporate mechanisms to take into account "boot artifacts" from "TEE technology" in order to decide if a node (i.e., CVM) is "attested" and b) how to implement this clearly necessary/welcome/desired functionality. At this point allow me to reiterate the general Keylime philosophy: the attestation is performed after the fact, and it does not step in to prevent a node from getting into a "non-attested" state.

@edwards-n

The use case here seems to assume and require SVSM to have a network stack and is also assuming the requirement for the boot device to be encrypted with the key being made available after attestation of the TEE. A simpler use case would be to do attestation of the TEE after boot, and then to pass keys to decrypt data (e.g. held on a separate device). This does not require SVSM to have network stack. It just requires the Keylime agent to ship the TEE attestation along with other evidence. Can we support this use case too?

@tylerfanelli
Author

tylerfanelli commented Jul 17, 2024

> The use case here seems to assume and require SVSM to have a network stack and is also assuming the requirement for the boot device to be encrypted with the key being made available after attestation of the TEE.

We don't expect SVSM to have a network stack. Rather, we entrust the host to provide a proxy to perform the network communication with the attestation server. We take great care that the proxy is not able to gain any secrets from this.

> A simpler use case would be to do attestation of the TEE after boot, and then to pass keys to decrypt data (e.g. held on a separate device). This does not require SVSM to have network stack. It just requires the Keylime agent to ship the TEE attestation along with other evidence. Can we support this use case too?

I'd say it depends on the level of "paranoia" or lack of trust in your host. Latest TEE scenarios assume that a host hypervisor is acting maliciously to tamper with or spy on your CVM, and you really can't trust anything running on your CVM until you've attested it. This is why we've opted to attest within firmware (SVSM) and establish a root-of-trust as early as possible in guest execution.

Assuming a malicious host, there's just no way to attest whether anything was tampered with before your agent was brought up. That is, you cannot guarantee that nothing from guest firmware-->boot-->userspace was tampered with, and this leaves room for attacks to provide dummy attestation values.

When attesting with SVSM, you can start with the SNP/TDX launch measurement, which covers the CVM platform, config, CPU state, and firmware code, where firmware is the combination of SVSM and OVMF. By validating that, we prove integrity of everything that runs prior to executing code from the guest root disk. Assuming that the OVMF binary is built with SecureBoot enabled, and with correct certificates, then we know that OVMF will have validated shim, and shim will have validated the UKI kernel. So by attesting the SNP/TDX launch measurement, we've established trust all the way to the start of guest userspace. Guest userspace is originating from the encrypted image that we built, so we can trust that as well. Also, since we begin with established vTPM state (from attestation), the verifier can validate TEE attestation with its policy.

We're currently working on a new proposal that integrates TEE attestation into the existing registrar process much more tightly. Will CC you when it's pushed.

@edwards-n

> Assuming a malicious host, there's just no way to attest whether anything was tampered with before your agent was brought up. That is, you cannot guarantee that nothing from guest firmware-->boot-->userspace was tampered with, and this leaves room for attacks to provide dummy attestation values.

I don't see it this way. The whole point of attestation of TEEs and SVSM is to enable a secure chain of trust to be built. As I understand it, coconut-svsm links the TPM EK running in SVSM to the TEE attestation. The EK is signed by the processor's key. At least that is the way it seems to work in SEV-SNP.

@tylerfanelli
Author

You previously mentioned that the result of attestation after boot would be "keys to decrypt data (e.g. held on a separate device)". Can you give an example of such data? In this scenario, we also get keys, but they're the keys to unlock the CVM image itself. Why delay the attestation further and get keys for something else?

@tylerfanelli
Author

CC @berrange see @edwards-n's comment above.

@edwards-n

> You previously mentioned that the result of attestation after boot would be "keys to decrypt data (e.g. held on a separate device)". Can you give an example of such data? In this scenario, we also get keys, but they're the keys to unlock the CVM image itself. Why delay the attestation further and get keys for something else?

I want to measure and verify the stack including the OS. Successfully decrypting something doesn't guarantee its integrity, unless somehow the entire encrypted disk is measured and verified (attested). It would be great if you could also verify the integrity of the OS disk. Do you do that? Therefore I was thinking it is necessary to boot and attest/measure the OS in the normal Keylime way before a key is delivered enabling it to decrypt sensitive data.

I'm not saying the use-case you have is invalid. I would like to get to encrypted OS disks too. I'm just proposing another one. In addition I'm trying to get us to an architectural solution that is simple and modular. Hence my point about having the Keylime agent transport the TEE attestation evidence.

@tylerfanelli tylerfanelli force-pushed the tee-boot-attestation branch from e3968d9 to 51bb7a4 Compare July 24, 2024 04:08
@tylerfanelli
Author

Hi all. After working with some of the community and adding suggested changes, I've updated this proposal to better fit the existing keylime process. Please take a look when you have the chance.

not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->
* There are other TEE attestation servers released now, such as the [trustee]
Author

@fitzthum I mentioned a few concerns with using Trustee here. Would like to continue the Trustee discussion, PTAL at the process I outlined above and share your thoughts/where you see Trustee fitting.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two potential ways that Trustee and Keylime can be combined.

First, since Trustee handles confidential attestation but not runtime attestation and Keylime is the opposite, you can have Trustee do the initial confidential attestation and provision state for a vTPM which will be registered with Keylime. There is some coordination required here, but it is relatively simple. I think this is the approach you are referring to below. To me this is the best way to do confidential attestation. You mentioned that it would require two servers to run, but I don't see that as a problem. In fact to some people having two simpler services handling distinct things is preferable to combining everything. I don't think the number of processes running is what we should be optimizing for. Either way, this approach is almost certainly simpler and easier to implement than adding confidential attestation support to Keylime itself.

If you really want to tack on a blocking confidential attestation protocol to Keylime, then you should at least think about using the verifier crate provided by Trustee. This allows you to verify 8 different types of confidential evidence and avoid rolling your own security-sensitive code.

Author

> There are two potential ways that Trustee and Keylime can be combined.
>
> First, since Trustee handles confidential attestation but not runtime attestation and Keylime is the opposite, you can have Trustee do the initial confidential attestation and provision state for a vTPM which will be registered with Keylime. There is some coordination required here, but it is relatively simple. I think this is the approach you are referring to below. To me this is the best way to do confidential attestation. You mentioned that it would require two servers to run, but I don't see that as a problem. In fact to some people having two simpler services handling distinct things is preferable to combining everything.

This is true if they were handling two distinct things, but we merge some of these pieces. For instance, trustee wouldn't have to simply provide initial vTPM state, but also extend a PCR with some value provided by the keylime verifier to indicate a successful attestation when the keylime agent registers itself post-boot. This requires much more coordination between the two servers and can get messy very fast.

> I don't think the number of processes running is what we should be optimizing for. Either way, this approach is almost certainly simpler and easier to implement than adding confidential attestation support to Keylime itself.

I disagree. Outside of providing TEE verifiers, this is really only making some slight changes to the registrar to encrypt/decrypt secrets and write extra information to the registrar DB.

> If you really want to tack on a blocking confidential attestation protocol to Keylime, then you should at least think about using the verifier crate provided by Trustee. This allows you to verify 8 different types of confidential evidence and avoid rolling your own security-sensitive code.

I'm much more open to this approach as by using the verifier crate in trustee, we can simply provide a web hook to the verifier to attest evidence. As you also mention, we don't have to roll our own TEE verifier code.

For example with SEV-SNP, a webhook could be exposed to the registrar that would take the attestation report and certificates, verifying and giving a pass/fail.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> This is true if they were handling two distinct things, but we merge some of these pieces. For instance, trustee wouldn't have to simply provide initial vTPM state, but also extend a PCR with some value provided by the keylime verifier to indicate a successful attestation when the keylime agent registers itself post-boot. This requires much more coordination between the two servers and can get messy very fast.

I think confidential attestation and runtime attestation are pretty distinct. I don't think Trustee needs to worry about the nonce during registration time. It can simply deliver a vTPM with a known identity and then the keylime agent can extend the PCRs as needed (i.e., with a nonce). The presence of particular root keys in the vTPM will imply that a hardware attestation has taken place without Keylime having to know anything about the details of the confidential TCB. Keylime can use the vTPM like it would any other, which seems convenient.
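
For readers less familiar with what "extending a PCR" means, the sketch below shows the standard TPM extend operation for a SHA-256 bank: the new PCR value is the hash of the old value concatenated with the digest of the new measurement. The function name and the example inputs are illustrative; this only models the arithmetic, not a real vTPM or the Keylime agent.

```python
import hashlib

PCR_SIZE = 32  # bytes in a SHA-256 PCR bank


def pcr_extend(pcr: bytes, data: bytes) -> bytes:
    """Return the PCR value after extending it with a measurement of `data`."""
    if len(pcr) != PCR_SIZE:
        raise ValueError("PCR value must be 32 bytes for a SHA-256 bank")
    # The measurement is hashed first, then folded into the running value.
    digest = hashlib.sha256(data).digest()
    return hashlib.sha256(pcr + digest).digest()


# A PCR starts at all zeroes after reset; extending is order-sensitive
# and cannot be undone, which is what makes the final value meaningful.
initial = bytes(PCR_SIZE)
after_nonce = pcr_extend(initial, b"example-nonce")
```

Because each value depends on every earlier extension, a verifier that knows the expected sequence of measurements can recompute the final PCR value and compare it with the one quoted by the vTPM.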

@tylerfanelli tylerfanelli force-pushed the tee-boot-attestation branch from 51bb7a4 to 5140a65 Compare July 24, 2024 04:56
@edwards-n

As we discussed at the community meeting today. I think this needs a nonce or equivalent to guarantee freshness of the attestation report posted to the registrar.

@THS-on
Member

THS-on commented Jul 25, 2024

Notes from yesterday's meeting:

  • The process itself should be agnostic to the CVM solution (i.e. should work with SEV-SNP, TDX and CCA)
  • TEE attestation needs to include a nonce from the Keylime side to reduce replay attacks (as mentioned by @edwards-n)
  • The generated identity should be unique to the platform and VM start
    • Is there some platform unique information that we can use?
  • The same UUID should not be able to register twice
  • Because the vTPM state is created offline, it's the same for each started VM with the same rootfs + state
    • This is bad as then the generated attestation key for runtime attestation might be the same
    • Maybe we can find a way to have one predictable part for key derivation and another that is unique to the VM
  • Adding clear differentiation why this protocol instead of a KBS or other secret management solution
    • It's simpler
    • What use cases does this cover and when it is not enough
  • Define what data is exposed to the host (also concern of side channels for static encrypted data)
    • before VM startup: encrypted rootfs + vTPM state, something else?
    • during startup: everything that goes through the proxy. Can we encrypt that traffic?
  • Architectural considerations
    • TEE attestation should be likely part of the verifier, so that we can re-use it. (Also plays into Exposing Measured Boot and IMA parsing and validation as REST API #105)
    • The decryption and re-encryption part can be a standalone component from the verifier
      • The decryption key should be kept in a HSM by default (use PKCS #11 for this)
    • During agent startup the registrar needs to have a way to check if the pre-boot attestation was successful

@tylerfanelli
Author

tylerfanelli commented Jul 25, 2024

> As we discussed at the community meeting today. I think this needs a nonce or equivalent to guarantee freshness of the attestation report posted to the registrar.

@ansasaki and I have spoken about this before, and wonder if this is actually necessary.

The REPORT_DATA hash is comprised of agent UUID, CVM Integrity Public Key, and CHIP_ID. Since the agent UUID and CVM Integrity Public Key are generated within SVSM at runtime and hashed into REPORT_DATA, we guarantee that the attestation report was generated within SVSM (and is thus fresh). With this, why is a nonce required?

@edwards-n

> > As we discussed at the community meeting today. I think this needs a nonce or equivalent to guarantee freshness of the attestation report posted to the registrar.
>
> @ansasaki and I have spoken about this before, and wonder if this is actually necessary.
>
> The REPORT_DATA hash is comprised of agent UUID, CVM Integrity Public Key, and CHIP_ID. Since the agent UUID and CVM Integrity Public Key are generated within SVSM at runtime and hashed into REPORT_DATA, we guarantee that the attestation report was generated within SVSM (and is thus fresh). With this, why is a nonce required?

I thought we went through this yesterday. This could be replayed by an attacker. You have to have a challenge from the verifying party.

@berrange

> > The REPORT_DATA hash is comprised of agent UUID, CVM Integrity Public Key, and CHIP_ID. Since the agent UUID and CVM Integrity Public Key are generated within SVSM at runtime and hashed into REPORT_DATA, we guarantee that the attestation report was generated within SVSM (and is thus fresh). With this, why is a nonce required?
>
> I thought we went through this yesterday. This could be replayed by an attacker. You have to have a challenge from the verifying party.

The "CVM Integrity Public Key" is de facto a nonce, since it is randomly generated by SVSM on every boot. If the response to the attestation is encrypted with this public key, then the response will only be decryptable by the original SVSM that created the random "CVM Integrity Public Key". IOW, if a malicious party attempts to replay an attestation report, while it will validate fine, the attacker wouldn't be able to do anything with the response they get back, as they won't have the private key required to decrypt it.

@edwards-n

> The "CVM Integrity Public Key" is de facto a nonce, since it is randomly generated by SVSM on every boot. If the response to the attestation is encrypted with this public key, then the response will only be decryptable by the original SVSM that created the random "CVM Integrity Public Key". IOW, if a malicious party attempts to replay an attestation report, while it will validate fine, the attacker wouldn't be able to do anything with the response they get back, as they won't have the private key required to decrypt it.

This is weaker than it needs to be. It is relying on the CVM key being kept secret forever, and there may be other attacks. The right way to do it is with a nonce from the verifier. That prevents replay attacks.
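
As a point of comparison for this part of the discussion, a verifier-issued nonce could work roughly as in the Python sketch below: the verifier hands out a single-use random value, the attester hashes it into the data bound to the report, and the verifier accepts each nonce at most once. The class and function names are hypothetical, SHA-512 is an assumption, and the hardware signature over the report is not modeled at all.

```python
import hashlib
import hmac
import secrets

NONCE_SIZE = 32  # fixed length keeps nonce || claims unambiguous


class Verifier:
    """Toy verifier that issues and consumes single-use nonces."""

    def __init__(self) -> None:
        self.outstanding = set()

    def challenge(self) -> bytes:
        nonce = secrets.token_bytes(NONCE_SIZE)
        self.outstanding.add(nonce)
        return nonce

    def check(self, nonce: bytes, claims: bytes, report_data: bytes) -> bool:
        # Reject nonces that were never issued or were already used.
        if nonce not in self.outstanding:
            return False
        self.outstanding.discard(nonce)
        expected = hashlib.sha512(nonce + claims).digest()
        return hmac.compare_digest(expected, report_data)


def attester_report_data(nonce: bytes, claims: bytes) -> bytes:
    """What the guest would place in the report's user-data field."""
    return hashlib.sha512(nonce + claims).digest()
```

In this model a captured report fails on resubmission for two independent reasons: its nonce has already been consumed, and it does not match any freshly issued nonce.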

@tylerfanelli
Author

tylerfanelli commented Jul 25, 2024

> > The "CVM Integrity Public Key" is de facto a nonce, since it is randomly generated by SVSM on every boot. If the response to the attestation is encrypted with this public key, then the response will only be decryptable by the original SVSM that created the random "CVM Integrity Public Key". IOW, if a malicious party attempts to replay an attestation report, while it will validate fine, the attacker wouldn't be able to do anything with the response they get back, as they won't have the private key required to decrypt it.
>
> This is weaker than it needs to be. It is relying on the CVM key being kept secret forever, and there may be other attacks. The right way to do it is with a nonce from the verifier. That prevents replay attacks.

Here are a few examples of possible replay attacks. For all of these, we will assume that SVSM has created the UUID and CVM Integrity Key as expected:

1. Host captures all attestation info from guest, tries to use it again.

Once attestation has completed, the UUID is written to the registrar DB. If the same UUID is found in the DB, then the registrar knows that this guest has already attested and cannot attest again, so registration fails. On top of this, the host cannot access the CVM Integrity Private Key, so it can't read anything from the registrar anyway.

2. Host captures all attestation info from guest, changes UUID

The original UUID was hashed in REPORT_DATA. Hash calculation using spoofed UUID fails.

3. Host captures all attestation info from guest, changes UUID and REPORT_DATA hash

The original REPORT_DATA hash was measured in the signature of the chip endorsement key. Signature validation fails.

4. Host captures all attestation info from guest, changes CVM Integrity Key

The original CVM Integrity Key was hashed in REPORT_DATA. Hash calculation using spoofed CVM Integrity Key fails.

Please provide an example of a replay attack that is not covered by the current implementation and that would be covered with the introduction of a nonce.
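
The hash and UUID checks in scenarios 1, 2, and 4 can be sketched in Python as follows. This is a simplified model under stated assumptions: SHA-512 is chosen because REPORT_DATA is a 64-byte field in SEV-SNP reports, while the field order, the length-prefixed encoding, and the `Registrar` class are hypothetical. Scenario 3 (signature validation against the chip endorsement key) is outside the sketch.

```python
import hashlib
import hmac


def report_data(uuid: str, cvm_pubkey: bytes, chip_id: bytes) -> bytes:
    """Hash the agent UUID, CVM Integrity Public Key, and CHIP_ID to 64 bytes."""
    h = hashlib.sha512()
    for field in (uuid.encode(), cvm_pubkey, chip_id):
        # Length-prefix each field so field boundaries are unambiguous.
        h.update(len(field).to_bytes(4, "big"))
        h.update(field)
    return h.digest()


class Registrar:
    """Toy registrar that remembers which UUIDs have already attested."""

    def __init__(self) -> None:
        self.seen_uuids = set()

    def register(self, uuid: str, cvm_pubkey: bytes, chip_id: bytes,
                 signed_report_data: bytes) -> bool:
        # Scenario 1: a UUID that already attested cannot attest again.
        if uuid in self.seen_uuids:
            return False
        # Scenarios 2 and 4: a changed UUID or key no longer matches the
        # value that was hashed into the (signed) REPORT_DATA.
        expected = report_data(uuid, cvm_pubkey, chip_id)
        if not hmac.compare_digest(expected, signed_report_data):
            return False
        self.seen_uuids.add(uuid)
        return True
```

Note that the scenario 1 defence in this model depends on the registrar keeping the set of seen UUIDs, so it is only as durable as that state.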

@berrange

> This is weaker than it needs to be. It is relying on the CVM key being kept secret forever, and there may be other attacks. The right way to do it is with a nonce from the verifier. That prevents replay attacks.

The CVM key created by SVSM should not live forever, as it ought to be erasable from memory after attestation response is handled by SVSM. In addition, if sensitive data can leak from SVSM, then that's a serious flaw in the CVM architecture, at which point we've a whole pile of serious problems to deal with besides attestation replay.

@tylerfanelli
Author

@berrange I've addressed your comments, please re-review when you have the chance.

@tylerfanelli tylerfanelli force-pushed the tee-boot-attestation branch from 01319fa to e4b4833 Compare July 31, 2024 17:53
@stringlytyped
Contributor

stringlytyped commented Jul 31, 2024

I agree that the justification given by @klauskiwi provides a lot of helpful context which was missing from the discussion before. I remain concerned that this proposal weakens Keylime security in fundamental ways outside of just the missing nonce and other protocol-level deficiencies:

  • While the KL verifier and registrar are deemed "trusted", they are typically only trusted in so far as to ensure a correct verification result. This proposal makes it so that a registrar compromise leads to key compromise[1].
  • As @THS-on brought up in the meeting last week, cloning a confidential VM (CVM) would result in the cloned CVM's vTPM receiving the same initial TPM state as the original CVM which means that any post-boot attestations of the confidential VM cannot be trusted. This is because an attacker can cause attestations produced by one "good-state" CVM to be accepted by the verifier in place of attestations produced by a "bad-state" CVM which the attacker has compromised. This breaks Keylime's fundamental security property, which I have previously described in an informal adversarial model.

It may be possible to address the second point by including additional identifying information in the pre-boot attestation (beyond just the chip ID and the "CVM public key" as it is called in the proposal), but this remains uncertain.

There is another—perhaps slightly philosophical—argument against accepting this proposal in its current form: that it violates well-established Keylime architectural principles:

  • Keylime is "reporting only": the position has long been that core Keylime should not contain mechanisms that prevent systems from running[2]. This is because we do not want a misconfigured policy or unavailability of a Keylime server to trigger a denial-of-service against a user's wider infrastructure. (This is a point that was brought up by @maugustosilva in the meeting and is part of the reason why KL itself should not perform key management tasks.)
  • Strict separation of concerns should be maintained between the Keylime registrar and verifier, with the registrar performing only identity management tasks and the verifier performing verification of attestations. I know the point was made in the meeting that the pre-boot attestation is done for purposes of verifying the identity of the vTPM and not for verifying system state. But my understanding is that this cannot be entirely true as you also need to verify the privilege level of the CVM that the vTPM is running in, amongst other things. (The vTPM should be running at privilege level 0, otherwise one cannot trust that its memory is protected from other CVMs.)

Finally, it is unclear to me whether it will be possible to upstream your proxy, which is required for this proposal to work, into Coconut SVSM. They are currently working on their own solution to the problem of persisting vTPM state.

My position is that while this is a valid use case for Keylime and attestation/verification generally and, in fact, an excellent display of the power unlocked by attestation technologies, this is not something that should be supported and maintained by the Keylime project directly. Instead, I think we should focus on evolving KL to provide a number of well-defined extension points by which something like this can be easily added into a Keylime deployment. Then, this idea of pre-boot attestation combined with a lightweight key broker can be implemented as a community-supported extension to Keylime.

I have put together a draft of a REST-based extension model to show what this may look like. This could be added to Keylime in advance of a full plug-in architecture and I believe it would be fairly simple to do so, although I myself have my hands full with the push model at the moment. The linked document is not fully done (there are some incomplete sections, as you will see) but should give you an idea of what I am picturing.

I am also happy to provide a sketch of how pre-boot attestation could be implemented using this extension framework, if that would help illustrate the concept.

Footnotes

  1. Historically KL has been used to provision keys/credentials, but this is a pattern we are explicitly and intentionally moving away from with the introduction of the push model.

  2. Again, this has not always historically been the case, but the community has learned from the now-deprecated revocations feature.

@tylerfanelli
Author

tylerfanelli commented Aug 1, 2024

Hi @stringlytyped, thanks for the comments. One thing I'd like to clear up before answering some of your other points.

Finally, it is unclear to me whether it will be possible to upstream your proxy, which is required for this proposal to work, into Coconut SVSM. They are currently working on their own solution to the problem of persisting vTPM state.

The document you linked is the proxy we've developed; our proxy and the one you linked are the same thing.

@stefano-garzarella

Finally, it is unclear to me whether it will be possible to upstream your proxy, which is required for this proposal to work, into Coconut SVSM. They are currently working on their own solution to the problem of persisting vTPM state.

The document you linked is the proxy we've developed, our proxy and the one you linked are the same thing.

Hi @stringlytyped, I'm one of the authors of that document and I confirm that in Coconut SVSM, among other attestation servers, we also want to support Keylime with this proposal. We are in contact with Tyler to stay in sync.


@stefano-garzarella stefano-garzarella left a comment


I left just a minor comment, the rest LGTM!
Compared with the previous version, the distinction between vTPM state and vTPM state key now seems much clearer to me.

I still need to understand better if we can derive the UUID from the TPM Endorsement Key, but we can discuss this later.


- SVSM will decrypt its vTPM State Key with its private CVM Integrity Key.
- SVSM will decrypt vTPM state with vTPM State Key.
- SVSM will derive key from vTPM state and use to unlock rootfs.

@stefano-garzarella stefano-garzarella Aug 1, 2024


I'd remove this line. The rootfs will be unlocked from the guest OS. Kernel and initrd are not encrypted (but measured and signed if using UKI), so systemd is already able to unlock the rootfs using a key sealed with the TPM.

@stringlytyped
Contributor

The document you linked is the proxy we've developed, our proxy and the one you linked are the same thing.

@tylerfanelli Ah, my bad! That makes sense. Is there a chance you can link that document from the proposal? Ideally, the proposal should itself have a description of the proxy, how it works, and its security properties.

Hi @stringlytyped, I'm one of the authors of that document and I confirm that in Coconut SVSM, among other attestation servers, we also want to support Keylime with this proposal. We are in contact with Tyler to stay in sync.

@stefano-garzarella I'll withdraw that point then; glad you are collaborating closely! Can you help me understand what the Coconut SVSM community's view is on the proxy and its security implications? Allowing outside network connections from Coconut SVSM itself seems a bit antithetical to the security goals of an SVSM to me, but maybe I am missing something.

@stringlytyped
Contributor

Also, could you provide a more complete description of the "CVM Integrity Public Key"? I assume you are using this term to be processor agnostic, but I would like to understand whether this corresponds to an SEV-SNP "guest key" or something else.

It would also be good to see a concrete example of what measurements might be included in the pre-boot attestation (the "attestation report" as you call it).

Thanks!

@berrange

berrange commented Aug 1, 2024

I agree, that justification given by @klauskiwi provides a lot of helpful context which was missing from the discussion before. I remain concerned that this proposal weakens Keylime security in fundamental ways outside of just the missing nonce and other protocol-level deficiencies:

* While the KL verifier and registrar are deemed "trusted", they are typically only trusted in so far as to ensure a correct verification result. This proposal makes it so that a registrar compromise leads to key compromise[1](#user-content-fn-note1-54ae2bc503450c28071a0d17f128f629).

I share this concern. It does make good sense to minimize the scope of each component to a single job. Amongst
other things, it makes it easier to reason about the correctness of each component. Separating key holding from
verification looks like good practice.

A split architecture would imply that SVSM wouldn't talk directly to Keylime. Instead, SVSM would need to talk to a key broker, and the key broker would in turn talk to Keylime to verify the evidence it had received from SVSM.

* As @THS-on brought up in the meeting last week, cloning a confidential VM (CVM) would result in the cloned CVM's vTPM receiving the same initial TPM state as the original CVM which means that any post-boot attestations of the confidential VM cannot be trusted. This is because an attacker can cause attestations produced by one "good-state" CVM to be accepted by the verifier in place of attestations produced by a "bad-state" CVM which the attacker has compromised. This breaks Keylime's fundamental security property, which I have previously described in an informal [adversarial model](https://github.com/stringlytyped/keylime-push-proposal/blob/main/supporting-materials/adversarial-model.md).

Right, I don't think we can prevent concurrent launch of 2 instances of the same CVM given the current design proposal.

Essentially it requires some form of authorization mechanism with whatever component is releasing the key, to make key release "single shot".

IOW, when a user wants to boot a CVM instance, they need to be able to first talk to the key broker and say "I authorize releasing the key once to a VM identified ". Once the key is released, it must not be released again, even if the attestation report verifies, until the user explicitly grants permission to release it again.
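A minimal sketch of the "single shot" release idea described above, assuming a hypothetical in-memory broker (the class name `SingleShotKeyBroker` and its methods are illustrative only and not part of Keylime or SVSM). The owner grants one release per boot, and a successful release consumes that grant:

```python
import secrets
from typing import Optional


class SingleShotKeyBroker:
    """Hypothetical broker that releases a key at most once per authorization."""

    def __init__(self) -> None:
        self._keys = {}           # vm_id -> secret key bytes
        self._authorized = set()  # vm_ids with one pending, unused release

    def register(self, vm_id: str) -> None:
        # Create the secret that will later be released to the VM.
        self._keys[vm_id] = secrets.token_bytes(32)

    def authorize_once(self, vm_id: str) -> None:
        # Called by the VM owner before each intended boot.
        if vm_id not in self._keys:
            raise KeyError(vm_id)
        self._authorized.add(vm_id)

    def release(self, vm_id: str, evidence_ok: bool) -> Optional[bytes]:
        # Refuse unless the evidence verified AND a grant is pending.
        if not evidence_ok or vm_id not in self._authorized:
            return None
        # Consume the grant so a cloned VM cannot obtain the key again.
        self._authorized.discard(vm_id)
        return self._keys[vm_id]
```

In this sketch a second launch of the same image, even with a valid attestation report, gets nothing until the owner calls `authorize_once` again. A real broker would also need persistent state and a way to authenticate the owner, neither of which is shown here.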

It may be possible to address the second point by including additional identifying information in the pre-boot attestation (beyond just the chip ID and the "CVM public key" as it is called in the proposal), but this remains uncertain.

The problem is that the verifier has no knowledge of whether CVMs shut down. So it can't distinguish a CVM that is created, then shut down, then created again from the situation where a CVM is created and a copy is then created a second time.

NB, this attack does not allow the CVM to be compromised by the malicious person who starts the cloned copy, as that person can't access anything inside the VM without access credentials (i.e., an SSH key). It is more of a denial-of-service attack from creating multiple instances.

There is another—perhaps slightly philosophical—argument against accepting this proposal in its current form: that it violates well-established Keylime architectural principles:

* Keylime is "reporting only": the position has long been that core Keylime should not contain mechanisms that prevent systems from running[2](#user-content-fn-note2-54ae2bc503450c28071a0d17f128f629). This is because we do not want a misconfigured policy or unavailability of a Keylime server to trigger a denial-of-service against a user's wider infrastructure. (This is a point that was brought up by @maugustosilva in the meeting and is part of the reason why KL itself should not perform key management tasks.)

In a CVM environment doing early-boot attestation, whatever service we use for attestation will inevitably be a failure point in the infrastructure, because it is intentionally used as a gating check and must block all further use of the CVM on failure. This need not be a concern for Keylime itself, though. Rather, it is a choice in the way Keylime would be consumed in the overall system architecture.

* Strict separation of concerns should be maintained between the Keylime registrar and verifier, with the registrar performing only identity management tasks and the verifier performing verification of attestations. I know the point was made in the meeting that the pre-boot attestation is done for purposes of verifying the identity of the vTPM and not for verifying system state. But my understanding is that this cannot be entirely true as you also need to verify the privilege level of the CVM that the vTPM is running in, amongst other things. (The vTPM should be running at privilege level 0, otherwise one cannot trust that its memory is protected from other CVMs.)

We don't directly verify the vTPM at all. We verify the SVSM binary against hashes of known-good SVSM binaries. This proves SVSM is running the code we expect, and we know that trusted SVSM code will create the vTPM securely.

Nonetheless, I do tend to agree that releasing keys conceptually falls outside the scope of both the registrar & verifier, and is really a distinct third component that would use the registrar & verifier.

@berrange

berrange commented Aug 1, 2024

Also, could you provide a more complete description of the "CVM Integrity Public Key"? I assume you are using this term to be processor agnostic, but I would like to understand whether this corresponds to an SEV-SNP "guest key" or something else.

This would be a single use keypair that is freshly generated by SVSM when it starts, used in the attestation operation it initiates, then erased.

@stefano-garzarella

Hi @stringlytyped, I'm one of the authors of that document and I confirm that in Coconut SVSM, among other attestation servers, we also want to support Keylime with this proposal. We are in contact with Tyler to stay in sync.

@stefano-garzarella I'll withdraw that point then; glad you are collaborating closely! Can you help me understand what the Coconut SVSM community's view is on the proxy and its security implications? Allowing outside network connections from Coconut SVSM itself seems a bit antithetical to the security goals of an SVSM to me, but maybe I am missing something.

Perhaps the term proxy is a bit confusing. Precisely because we don't want to complicate SVSM and open up possible attacks, we don't have any implementation of TCP/IP stacks, HTTP, etc. in SVSM, but just a simple communication channel (for now a serial port) over which to make requests to the host. (The last section of the document you shared describes this scenario better.)

It's more of a service that the host offers to the guest to request a remote attestation. Once the attestation is done, the channel can also be closed by SVSM.

Obviously, SVSM and the remote attestation server know very well that this service is clearly a man-in-the-middle and therefore the channel is not absolutely secure. This is where the proposal tries to protect both SVSM and the remote attestation server. In detail, the server can use the attestation report to make sure it is talking to a genuine SVSM through its measurement. The attestation report also measures the "CVM Integrity Public Key" that SVSM generates at each boot to receive secrets from the server. By attestation report, I am referring to the one generated by the CPU and signed by the HW vendor keys, which contains the launch measurement (SVSM, edk2, etc.) plus additional data that SVSM can produce, to be signed by the HW.

At this point, the server knows that the key is generated by a genuine SVSM and uses it to encrypt the secrets.
Of course, the host can impersonate the remote attestation server, but the only attack it can mount is a DoS. For example, it could replace the TPM state, encrypt it with its own key, and release that key to SVSM. The CVM would still be launched, but at that point the rootfs would not be accessible to the CVM, because the rootfs key is sealed with the TPM that only the right attestation server is able to unlock.
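The key binding described above can be sketched as follows. This assumes, purely for illustration, that the guest places `SHA-512(nonce || public_key)` in the report's 64-byte user-data field (REPORT_DATA on SEV-SNP) and that the nonce has a fixed length. The actual layout is up to the proposal, and verification of the hardware signature over the report is not shown:

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 32  # assumed fixed nonce length, so nonce || key is unambiguous


def bind_report_data(nonce: bytes, public_key: bytes) -> bytes:
    """Guest side: digest to place in the report's user-data field (assumed layout)."""
    if len(nonce) != NONCE_LEN:
        raise ValueError("unexpected nonce length")
    return hashlib.sha512(nonce + public_key).digest()  # 64 bytes


def key_is_bound(report_data: bytes, nonce: bytes, public_key: bytes) -> bool:
    """Server side: check that the key relayed by the host is the one the report covers."""
    expected = bind_report_data(nonce, public_key)
    return hmac.compare_digest(report_data, expected)


# Usage: the host relays the report and key, but cannot swap in its own key
# without changing the digest that the hardware signed over.
nonce = secrets.token_bytes(NONCE_LEN)
guest_key = b"guest-public-key-bytes"  # placeholder for an encoded public key
report_data = bind_report_data(nonce, guest_key)
bound = key_is_bound(report_data, nonce, guest_key)
swapped = key_is_bound(report_data, nonce, b"host-substituted-key")
```

Here `bound` is true and `swapped` is false: a secret encrypted to the verified key can only be decrypted inside the guest, which is why the host is limited to denial of service.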

@THS-on
Member

THS-on commented Aug 1, 2024

Adding to the architecture discussion, I agree with the points raised by @stringlytyped and @berrange.
Here is the general architecture idea:
[Image: Architecture idea]

The new component would be the TEE broker, which can implement the simpler protocol rather than a full KBS. Once the proposal @stringlytyped is working on is complete, the only major modification to Keylime is to provide an API to attest the TEE evidence, which should be doable.

If someone then wants to hook up a full-blown KBS, they just need to implement a way for the Registrar to validate the EK and UUID.

@tylerfanelli
Author

tylerfanelli commented Aug 1, 2024

@THS-on While I very much like this, I wonder why TEE evidence would be sent to Keylime Verifier. Is this because others would like to take advantage of post-boot attestation with the verifier, so we could just re-use that code?

i.e. why can't the TEE broker do TEE attestation evidence as well?

Nonetheless, I'm going to try and put together a small PoC of the TEE broker using this diagram. Thanks for providing it.

@berrange

berrange commented Aug 1, 2024

Finally, it is unclear to me whether it will be possible to upstream your proxy, which is required for this proposal to work, into Coconut SVSM. They are currently working on their own solution to the problem of persisting vTPM state.

The document you linked is the proxy we've developed, our proxy and the one you linked are the same thing.

Hi @stringlytyped, I'm one of the authors of that document and I confirm that in Coconut SVSM, among other attestation servers, we also want to support Keylime with this proposal. We are in contact with Tyler to stay in sync.

And for historical context the entire early boot attestation concept being discussed here is something I roughly outlined the idea for a long while ago (https://lore.kernel.org/linux-coco/[email protected]/), and have since been assisting both @stefano-garzarella & @tylerfanelli as they work through the details of a possible implementation.

@THS-on
Member

THS-on commented Aug 1, 2024

@tylerfanelli

@THS-on While I very much like this, I wonder why TEE evidence would be sent to Keylime Verifier. Is this because others would like to take advantage of post-boot attestation with the verifier, so we could just re-use that code?

Yes, exactly. This would allow us also to support full post start attestation and then do the identity management with something like SPIFFE/SPIRE.

i.e. why can't the TEE broker do TEE attestation evidence as well?

Technically it can, but it's about separation of concerns.

Nonetheless, I'm going to try and put together a small PoC of the TEE broker using this diagram. Thanks for providing it.

Awesome!

Another thing that we need to think about is how policies and reference values (though Keylime does not differentiate between those) are provisioned. Runtime attestation is currently done by the tenant tool. The question is how we handle it for this type of attestation.

@tylerfanelli
Author

tylerfanelli commented Aug 2, 2024

@THS-on

There are two core components to TEE attestation. I'll discuss both and show how we can provision reference values.

  1. Signature verification of TEE attestation reports

TEE attestation reports are signed by the secure processor. For instance, in SEV-SNP the chip endorsement key signs the attestation report. There are three separate certificates (VCEK, ASK, and ARK) that trace back to AMD's root of trust. All of these can be provided to the guest from the host hypervisor, so SVSM can deliver them to the TEE broker along with the attestation report.

The TEE broker could verify the root key (ARK) is actually from AMD's root of trust, and can then trace the signing chain down to the attestation report. No reference values or policy required.

  2. Launch measurement validation

TEE attestation reports contain a launch measurement. For a normal SVSM scenario, this measurement would contain a hash of the SVSM/OVMF binaries that are running inside the confidential enclave. The only thing we need to validate is that the measurement provided is a known trusted SVSM/OVMF hash.

Essentially, we'll need to provide a list of known trusted SVSM/OVMF hashes that we'll allow, and ensure that the actual measurement in the report matches one of these "reference" measurements.

Validating both of these verifies two things:

  1. Your report came from a valid TEE secure processor, and thus your system is running in a confidential enclave.

  2. The software you intended to load (in our case the SVSM and OVMF firmware) is what is actually running on the system.

Besides the reference SVSM/OVMF hashes that you'll allow, there really aren't that many reference values to keep track of, and I don't see why the tenant tool can't provision these. You can offer much more complex policies to examine attestation reports, but to me, this is enough to establish the root of trust and boot the VM, and is simple. If you don't agree that it's enough, please let it be known.
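As a rough illustration of the launch measurement check, here is a minimal sketch that compares a reported measurement against a list of reference values. The function name and the way the reference values are produced are assumptions made for this example; real reference values would come from the TEE's launch-measurement procedure over the SVSM/OVMF images:

```python
import hashlib
import hmac
from typing import Iterable


def measurement_is_trusted(measurement: bytes, reference_values: Iterable[bytes]) -> bool:
    """Return True if the reported launch measurement matches any reference value."""
    matched = False
    for ref in reference_values:
        # compare_digest avoids leaking how many leading bytes matched.
        if hmac.compare_digest(measurement, ref):
            matched = True
    return matched


# Placeholder reference values. Hashing made-up bytes here only stands in for
# the real measurement computation, which is TEE-specific.
trusted = [
    hashlib.sha384(b"svsm+ovmf build A").digest(),
    hashlib.sha384(b"svsm+ovmf build B").digest(),
]
```

A report whose measurement equals either entry in `trusted` passes; anything else, including any measurement checked against an empty list, is rejected. This check only makes sense after the report's signature chain has been verified.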

@edwards-n

@tylerfanelli

@THS-on While I very much like this, I wonder why TEE evidence would be sent to Keylime Verifier. Is this because others would like to take advantage of post-boot attestation with the verifier, so we could just re-use that code?

Yes, exactly. This would allow us also to support full post start attestation and then do the identity management with something like SPIFFE/SPIRE.

Sending the TEE evidence to the Keylime Verifier will support the use case I have in mind. I think it will definitely help re-use.

@THS-on
Member

THS-on commented Aug 2, 2024

The TEE broker could verify the root key (ARK) is actually from AMD's root of trust, and can then trace the signing chain down to the attestation report. No reference values or policy required.

There is still a reference value (or, in RATS terminology, an endorsement), which is the cert chain. Besides checking the authenticity of the report, you likely also want to check other things, such as the TCB level and whether this is a CPU that I know/own. Just establishing that the report is signed by an ARK is likely not enough.
Also, as mentioned in the meeting, let's try not to build something SEV-SNP specific and keep other CVMs in mind.
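To make the additional checks concrete, here is a small architecture-neutral sketch. The `Claims` and `Policy` types and their fields are hypothetical, and the TCB level is simplified to a single integer, whereas real TEEs report a structured version:

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Claims:
    """Values extracted from an already signature-verified report (hypothetical)."""
    chip_id: str
    tcb_version: int  # simplified: real TCB versions are structured


@dataclass(frozen=True)
class Policy:
    allowed_chip_ids: FrozenSet[str]
    min_tcb_version: int


def evaluate(claims: Claims, policy: Policy) -> List[str]:
    """Return the list of policy failures; an empty list means the claims pass."""
    failures = []
    if claims.chip_id not in policy.allowed_chip_ids:
        failures.append("chip is not one we know/own")
    if claims.tcb_version < policy.min_tcb_version:
        failures.append("TCB level below required minimum")
    return failures


policy = Policy(allowed_chip_ids=frozenset({"chip-a", "chip-b"}), min_tcb_version=7)
ok = evaluate(Claims(chip_id="chip-a", tcb_version=8), policy)
bad = evaluate(Claims(chip_id="chip-z", tcb_version=3), policy)
```

Here `ok` is empty and `bad` lists both failures. Keeping the policy in terms of generic claims rather than SEV-SNP report fields is one way to leave room for other CVM types.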

Essentially, we'll need to provide a list of known trusted SVSM/OVMF hashes that we'll allow, and ensure that the actual measurement in the report matches one of these "reference" measurements.

In the simplest case yes.

Besides the reference SVSM/OVMF hashes that you'll allow, there really aren't that many reference values to keep track of, and I don't see why the tenant tool can't provision these. You can offer much more complex policies to examine attestation reports, but to me, this is enough to establish the root of trust and boot the VM, and is simple. If you don't agree that it's enough, please let it be known.

The question is not whether it can, but what the user story looks like, as this is not simple. We are essentially linking pre-boot, boot and runtime attestation together. On what level do we want to do that (instance, image, global)? For boot and runtime attestation we currently do it per instance (with the option to share policies between instances).

@tylerfanelli
Author

Hi all, I've submitted a new version of the proposal. I discussed some of the ideas outlined in the proposal during the last upstream Keylime meeting. If there are any questions, please feel free to reach out.

I hope to show a demo of all of these changes with SVSM during an upcoming upstream meeting.

@edwards-n

Thank you for the updated proposal and for including the notion of a nonce requested from the Keylime TEE Broker. I think this is a significant improvement on the earlier version.

@IT302

IT302 commented Sep 3, 2024

The proposal still seems to be closely tied to AMD SNP, where an SVSM runs in VMPL0. How would this scheme be used with Intel TDX, or is that not a goal? As far as I know, TDX does not have VMPLs.

@berrange

berrange commented Sep 3, 2024

The proposal still seems to be closely tied to AMD SNP, where an SVSM runs in VMPL0. How would this scheme be used with Intel TDX, or is that not a goal? As far as I know, TDX does not have VMPLs.

The current SVSM paravisor only supports AMD SNP via VMPLs today. There are some folk from Intel who are working on TDX support in SVSM that uses the TDX Partitioning feature to separate the paravisor from the end user VM. So we hope this proposed keylime enhancement will eventually work with TDX too.

@tylerfanelli
Author

The proposal still seems to be closely tied to AMD SNP, where an SVSM runs in VMPL0. How would this scheme be used with Intel TDX, or is that not a goal? As far as I know, TDX does not have VMPLs.

The current SVSM paravisor only supports AMD SNP via VMPLs today. There are some folk from Intel who are working on TDX support in SVSM that uses the TDX Partitioning feature to separate the paravisor from the end user VM. So we hope this proposed keylime enhancement will eventually work with TDX too.

Yes, we're also hoping to support TDX (and CCA, for that matter).
