Skip to content

feat(gpu): add support for GPU-accelerated expand on the HL Api #2276

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

pdroalves
Copy link
Contributor

@pdroalves pdroalves commented Apr 16, 2025

closes: https://github.com/zama-ai/tfhe-rs-internal/issues/986

PR content/description

Check-list:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • Relevant issues are marked as resolved/closed, related issues are linked in the description
  • Check for breaking changes (including serialization changes) and add them to commit message following the conventional commit specification

This change is Reviewable

@cla-bot cla-bot bot added the cla-signed label Apr 16, 2025
@pdroalves pdroalves force-pushed the pa/feat/expand_hlapi branch 3 times, most recently from 130b2b4 to 8f76233 Compare April 17, 2025 12:49
@pdroalves pdroalves marked this pull request as ready for review April 17, 2025 12:49
@pdroalves pdroalves force-pushed the pa/feat/expand_hlapi branch 6 times, most recently from 86412ec to 3764ea3 Compare April 17, 2025 14:17
@pdroalves pdroalves force-pushed the pa/feat/expand_hlapi branch 5 times, most recently from 5a83bdc to 161093c Compare April 17, 2025 15:45
Copy link
Contributor

@agnesLeroy agnesLeroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @pdroalves! Amazing to get expand on GPU in the HL API! 🎉 I have some questions about the PR, and also I'm thinking it would be good to have some documentation similar to tfhe/docs/fhe-computation/advanced-features/zk-pok.md showing how to run expand on GPU, in a new file maybe tfhe/docs/configuration/gpu_acceleration/zk-pok.md, where we would explain the ZK proof and verif can only be executed on CPU, linking to the existing zk-pok.md file, and adding a part about expand on GPU, wdyt?

Reviewed 8 of 15 files at r1, all commit messages.
Reviewable status: 8 of 15 files reviewed, 8 unresolved discussions (waiting on @IceTDrinker)


tfhe/src/integer/gpu/key_switching_key/mod.rs line 11 at r1 (raw file):

#[derive(Clone)]
#[allow(dead_code, clippy::struct_field_names)]

Do we really need allow(dead_code)?


tfhe/src/integer/gpu/key_switching_key/mod.rs line 17 at r1 (raw file):

}

#[allow(dead_code, clippy::struct_field_names)]

Do we really need allow(dead_code)?


tfhe/src/high_level_api/compact_list.rs line 4 at r1 (raw file):

use crate::backward_compatibility::compact_list::CompactCiphertextListVersions;
#[cfg(feature = "zk-pok")]

Is it normal this import is removed?


tfhe/src/high_level_api/compact_list.rs line 28 at r1 (raw file):

use crate::{CompactPublicKey, Tag};

#[cfg(feature = "strings")]

Same question here


tfhe/src/high_level_api/compact_list.rs line 235 at r1 (raw file):

#[cfg(feature = "zk-pok")]
pub mod zk {

Why does the module become pub?


tfhe/src/high_level_api/compact_list.rs line 252 at r1 (raw file):

    }

    #[derive(Serialize, Deserialize, Versionize)]

Here it doesn't derive Clone anymore?


tfhe/src/high_level_api/compact_list.rs line 255 at r1 (raw file):

    #[versionize(ProvenCompactCiphertextListVersions)]
    pub struct ProvenCompactCiphertextList {
        pub(in crate::high_level_api) inner: InnerProvenCompactCiphertextList,

Why this pub(in crate::high_level_api)?


tfhe/src/high_level_api/keys/inner.rs line 315 at r1 (raw file):

#[cfg(feature = "gpu")]
#[allow(dead_code)]

Do we really need to allow dead_code? Isn't there something missing in the prelude instead? 🤔

@IceTDrinker
Copy link
Member

note that clippy::struct_field_names is not necessary anymore, the lint was a potential problem for backward compatibility (as the lint could randomly start complaining and a developer could change the name, breaking the compatibility on some structs

Copy link
Member

@IceTDrinker IceTDrinker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't read everything but it seems to me there is something that's not right in the way part of the code is currently designed, an expander is not expected to be serialized/versioned, it's the object that comes after an expansion and allows to retrieve data from it, the compact list it is derived from can be serialized, the resulting FheUint etc. from the expander as well, but the expander is only there to make the conversion

Reviewed 8 of 15 files at r1, all commit messages.
Reviewable status: 10 of 15 files reviewed, 12 unresolved discussions (waiting on @agnesLeroy and @pdroalves)


tfhe/src/high_level_api/compact_list.rs line 4 at r1 (raw file):

Previously, agnesLeroy (Agnès Leroy) wrote…

Is it normal this import is removed?

No, that does not look reasonable


tfhe/src/high_level_api/compact_list.rs line 28 at r1 (raw file):

Previously, agnesLeroy (Agnès Leroy) wrote…

Same question here

this one was moved a few lines above it seems


tfhe/src/high_level_api/compact_list.rs line 185 at r1 (raw file):

    }

    pub fn expand(&self) -> crate::Result<CompactCiphertextListExpander> {

The way the function is written means expansion is always done on CPU in that case ?


tfhe/src/integer/backward_compatibility/ciphertext/mod.rs line 120 at r1 (raw file):

#[derive(VersionsDispatch)]
pub enum CompactCiphertextListExpanderVersions {

I'm confused, the expander is never serialized right ?


tfhe/src/integer/ciphertext/compact_list.rs line 309 at r1 (raw file):

#[derive(Versionize)]
#[versionize(CompactCiphertextListExpanderVersions)]

the expander is not supposed to be serialized right ? so then why does it need versioning ? 🤔


tfhe/src/core_crypto/gpu/entities/lwe_keyswitch_key.rs line 13 at r1 (raw file):

};

#[allow(dead_code)]

as Agnès is mentioning dead code stuff, given this one is public, is it necessary ?

@agnesLeroy
Copy link
Contributor

For the clippy::struct_field_names remark you'll need to rebase on main

Copy link
Contributor Author

@pdroalves pdroalves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding struct_field_names, I just rebased and it is gone.

@agnesLeroy Yes, I think it's a good idea. The code to be shown is probably not needed, but the explanation about verify on CPU and expand on GPU is important. I will write that.

@IceTDrinker Yeah, actually the expander was not being serialized, those were only artifacts from a confusing moment during development. I just removed it.

Reviewable status: 10 of 15 files reviewed, 12 unresolved discussions (waiting on @agnesLeroy and @IceTDrinker)


tfhe/src/core_crypto/gpu/entities/lwe_keyswitch_key.rs line 13 at r1 (raw file):

Previously, IceTDrinker (Arthur Meyre) wrote…

as Agnès is mentioning dead code stuff, given this one is public, is it necessary ?

I've added this to fix a warning about ciphertext_modulus:

warning: field `ciphertext_modulus` is never read
  --> tfhe/src/core_crypto/gpu/entities/lwe_keyswitch_key.rs:20:5
   |
14 | pub struct CudaLweKeyswitchKey<T: UnsignedInteger> {
   |            ------------------- field in this struct
...
20 |     ciphertext_modulus: CiphertextModulus<T>,
   |     ^^^^^^^^^^^^^^^^^^
   |
   = note: `CudaLweKeyswitchKey` has a derived impl for the trait `Clone`, but this is intentionally ignored during dead code analysis
   = note: `#[warn(dead_code)]` on by default

Do you have any suggestions for handling this differently?


tfhe/src/high_level_api/compact_list.rs line 4 at r1 (raw file):

Previously, IceTDrinker (Arthur Meyre) wrote…

No, that does not look reasonable

They were not removed, only moved a few lines below. ProvenCompactCiphertextListVersions in particular is not inside mod zk.


tfhe/src/high_level_api/compact_list.rs line 28 at r1 (raw file):

Previously, IceTDrinker (Arthur Meyre) wrote…

this one was moved a few lines above it seems

Yes, it was also moved.


tfhe/src/high_level_api/compact_list.rs line 185 at r1 (raw file):

Previously, IceTDrinker (Arthur Meyre) wrote…

The way the function is written means expansion is always done on CPU in that case ?

Yes, but it doesn't need to. The integer API doesn't have support to expand CompactCiphertextList, only ProvenCompactCiphertextList and so the HLAPI. I guess I could just generalize that and have the same method for the former. Wdyt?


tfhe/src/high_level_api/compact_list.rs line 235 at r1 (raw file):

Previously, agnesLeroy (Agnès Leroy) wrote…

Why does the module become pub?

Shouldn't. I probably modified it temporarily for something during development and forgot to remove. Removed!


tfhe/src/high_level_api/compact_list.rs line 252 at r1 (raw file):

Previously, agnesLeroy (Agnès Leroy) wrote…

Here it doesn't derive Clone anymore?

I had to remove it when I refactored ProvenCompactCiphertextList and introduced InnerProvenCompactCiphertextList. I just added again and implemented Clone for InnerProvenCompactCiphertextList.


tfhe/src/high_level_api/compact_list.rs line 255 at r1 (raw file):

Previously, agnesLeroy (Agnès Leroy) wrote…

Why this pub(in crate::high_level_api)?

I probably copied this from somewhere else. Not needed. I changed it back to pub(crate).


tfhe/src/high_level_api/keys/inner.rs line 315 at r1 (raw file):

Previously, agnesLeroy (Agnès Leroy) wrote…

Do we really need to allow dead_code? Isn't there something missing in the prelude instead? 🤔

If I remove this Cargo will complain about cpk_key_switching_key_material not being used. I guess it's because it is only used when the gpu + zk-pok features are activated.

error: field `cpk_key_switching_key_material` is never read
   --> tfhe/src/high_level_api/keys/inner.rs:317:16
    |
315 | pub struct IntegerCudaServerKey {
    |            -------------------- field in this struct
316 |     pub(crate) key: crate::integer::gpu::CudaServerKey,
317 |     pub(crate) cpk_key_switching_key_material:
    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


tfhe/src/integer/backward_compatibility/ciphertext/mod.rs line 120 at r1 (raw file):

Previously, IceTDrinker (Arthur Meyre) wrote…

I'm confused, the expander is never serialized right ?

Correct. At some point I was mixing the expander with the proven compact list inside my mind and making a confusion. I removed the versioning for somewhere else in the code but forgot to remove this one.


tfhe/src/integer/ciphertext/compact_list.rs line 309 at r1 (raw file):

Previously, IceTDrinker (Arthur Meyre) wrote…

the expander is not supposed to be serialized right ? so then why does it need versioning ? 🤔

Never. Removed.


tfhe/src/integer/gpu/key_switching_key/mod.rs line 11 at r1 (raw file):

Previously, agnesLeroy (Agnès Leroy) wrote…

Do we really need allow(dead_code)?

I guess so. Without it I get

error: fields `key_switching_key` and `destination_key` are never read
  --> tfhe/src/integer/gpu/key_switching_key/mod.rs:13:16
   |
12 | pub struct CudaKeySwitchingKeyMaterial {
   |            --------------------------- fields in this struct
13 |     pub(crate) key_switching_key: CudaLweKeyswitchKey<u64>,
   |                ^^^^^^^^^^^^^^^^^
14 |     pub(crate) destination_key: EncryptionKeyChoice,
   |                ^^^^^^^^^^^^^^^
   |
   = note: `CudaKeySwitchingKeyMaterial` has a derived impl for the trait `Clone`, but this is intentionally ignored during dead code analysis
   = note: `-D dead-code` implied by `-D warnings`
   = help: to override `-D warnings` add `#[allow(dead_code)]`

error: fields `key_switching_key_material` and `dest_server_key` are never read
  --> tfhe/src/integer/gpu/key_switching_key/mod.rs:19:16
   |
18 | pub struct CudaKeySwitchingKey<'keys> {
   |            ------------------- fields in this struct
19 |     pub(crate) key_switching_key_material: CudaKeySwitchingKeyMaterial,
   |                ^^^^^^^^^^^^^^^^^^^^^^^^^^
20 |     pub(crate) dest_server_key: &'keys CudaServerKey,

in pcc_gpu. Again, I suppose this happens because CudaKeySwitchingKey is only used in the HLAPI behind gpu + zkpok. Is there any other way to handle this?


tfhe/src/integer/gpu/key_switching_key/mod.rs line 17 at r1 (raw file):

Previously, agnesLeroy (Agnès Leroy) wrote…

Do we really need allow(dead_code)?

Answered above.

@pdroalves pdroalves force-pushed the pa/feat/expand_hlapi branch from 161093c to 7d827d5 Compare April 22, 2025 12:52
Copy link
Contributor Author

@pdroalves pdroalves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agnesLeroy I just pushed a first text version. Wdyt?

Reviewable status: 10 of 15 files reviewed, 12 unresolved discussions (waiting on @agnesLeroy and @IceTDrinker)

Copy link
Contributor

@agnesLeroy agnesLeroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 16 files reviewed, 15 unresolved discussions (waiting on @IceTDrinker and @pdroalves)


tfhe/src/core_crypto/gpu/entities/lwe_keyswitch_key.rs line 13 at r1 (raw file):

Previously, pdroalves (Pedro Alves) wrote…

I've added this to fix a warning about ciphertext_modulus:

warning: field `ciphertext_modulus` is never read
  --> tfhe/src/core_crypto/gpu/entities/lwe_keyswitch_key.rs:20:5
   |
14 | pub struct CudaLweKeyswitchKey<T: UnsignedInteger> {
   |            ------------------- field in this struct
...
20 |     ciphertext_modulus: CiphertextModulus<T>,
   |     ^^^^^^^^^^^^^^^^^^
   |
   = note: `CudaLweKeyswitchKey` has a derived impl for the trait `Clone`, but this is intentionally ignored during dead code analysis
   = note: `#[warn(dead_code)]` on by default

Do you have any suggestions for handling this differently?

Maybe something is missing in the prelude?


tfhe/src/high_level_api/keys/inner.rs line 315 at r1 (raw file):

Previously, pdroalves (Pedro Alves) wrote…

If I remove this Cargo will complain about cpk_key_switching_key_material not being used. I guess it's because it is only used when the gpu + zk-pok features are activated.

error: field `cpk_key_switching_key_material` is never read
   --> tfhe/src/high_level_api/keys/inner.rs:317:16
    |
315 | pub struct IntegerCudaServerKey {
    |            -------------------- field in this struct
316 |     pub(crate) key: crate::integer::gpu::CudaServerKey,
317 |     pub(crate) cpk_key_switching_key_material:
    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@IceTDrinker how should we deal with this, should we gate the cpk_key_switching_key_material with the zk-pok feature?


tfhe/docs/configuration/gpu_acceleration/zk-pok.md line 3 at r2 (raw file):

# Zero-knowledge proofs

This document explains how to speed up the verification of zero-knowledge proofs using the GPU, similar to how it's done on the [CPU](../../fhe-computation/advanced-features/zk-pok.md).

This is a bit confusing, as it leads the reader to think verification can be executed on the CPU. How about this instead?

Zero-knowledge (ZK) proof and verification can only be executed on CPU. However, the preprocessing needed to go from ciphertexts formatted for ZK to ciphertexts in the right format for computation purposes can be accelerated on GPU. ZK verification can be executed on the CPU concurrently with this preprocessing, also referred to as expansion, making the overall workflow faster.


tfhe/docs/configuration/gpu_acceleration/zk-pok.md line 14 at r3 (raw file):

{% endhint %}

Moreover, the GPU backend better performs when Multi-bit PBS parameters are used, as 

I would remove this sentence, as it is not specific to this page.


tfhe/docs/configuration/gpu_acceleration/zk-pok.md line 19 at r3 (raw file):

## Example

The following example shows how a client can encrypt and prove a ciphertext, and how a server can verify and compute the ciphertext on the GPU:

"how a server can verify and compute the ciphertext on the GPU:" -> "how a server can verify the proof, preprocess the ciphertext and run a computation on it on GPU:"

@pdroalves pdroalves force-pushed the pa/feat/expand_hlapi branch from f82e3c0 to 55edcfd Compare April 23, 2025 18:08
@zama-bot zama-bot removed the approved label May 13, 2025
Copy link
Contributor Author

@pdroalves pdroalves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 19 of 20 files reviewed, 2 unresolved discussions (waiting on @agnesLeroy, @IceTDrinker, and @tmontaigu)


tfhe/src/high_level_api/keys/server.rs line 295 at r12 (raw file):

Previously, tmontaigu (tmontaigu) wrote…

The type should not be needed,

Indeed! Thanks!

@pdroalves pdroalves requested review from tmontaigu and agnesLeroy May 13, 2025 11:28
@pdroalves
Copy link
Contributor Author

@agnesLeroy I just rebased this branch. Hope we can merge it soon.

@pdroalves pdroalves force-pushed the pa/feat/expand_hlapi branch from 4c15de8 to 15eb90a Compare May 19, 2025 08:32
@zama-bot zama-bot removed the approved label May 19, 2025
@IceTDrinker
Copy link
Member

looks like there are some broken imports

@pdroalves pdroalves force-pushed the pa/feat/expand_hlapi branch from 15eb90a to cf38e44 Compare May 19, 2025 11:40
@pdroalves pdroalves force-pushed the pa/feat/expand_hlapi branch from cf38e44 to 0688a57 Compare May 19, 2025 15:39
@zama-bot zama-bot removed the approved label May 19, 2025
@pdroalves
Copy link
Contributor Author

Looks like I forgot to fetch before rebasing this morning. Now it is fixed.

Copy link
Contributor Author

@pdroalves pdroalves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 15 of 20 files reviewed, all discussions resolved (waiting on @IceTDrinker and @tmontaigu)

Copy link
Member

@IceTDrinker IceTDrinker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any more reserves on this, to you @tmontaigu

Reviewed 3 of 6 files at r12, 2 of 4 files at r13, 5 of 5 files at r15, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @tmontaigu)


tfhe/src/high_level_api/compact_list.rs line 345 at r9 (raw file):

Previously, pdroalves (Pedro Alves) wrote…

Great! Thanks a lot. I just replaced it with your code.

Will leave @tmontaigu to check he is satisfied


tfhe/src/integer/gpu/zk/mod.rs line 462 at r9 (raw file):

Previously, pdroalves (Pedro Alves) wrote…

@IceTDrinker We discussed your suggestion this morning.

I think a decompress_from_cpu might be a lot of work. The only way I've found to avoid that ksk copy inside verify_and_expand was by adding a ksk material to CudaServerKey and then to use it to initialize the final ksk with it when needed, which does not imply on a copy. I'm afraid a single entry point like that, which receives a CPU's ksk and returns the final cuda ksk, may end up on Cargo complaining that I'm trying to return a pointer to a local variable (which would be the material ksk created inside of that method) or to force me clone the ksks once again.

I would also prefer a single entry point for that, but probably that's too much work for this (already too big) PR. We may work on that on a following PR if you wish.

no problem

@IceTDrinker
Copy link
Member

And thanks a lot Pedro, big PR lots of code!

- includes documentation about GPU's accelerated expand on the HL API
- rework CudaKeySwitchingKey
- Cloning the key is no longer necessary on the HL API
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants