-
Notifications
You must be signed in to change notification settings - Fork 178
feat(gpu): add support for GPU-accelerated expand on the HL Api #2276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
130b2b4
to
8f76233
Compare
86412ec
to
3764ea3
Compare
5a83bdc
to
161093c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @pdroalves! Amazing to get expand on GPU in the HL API! 🎉 I have some questions about the PR, and also I'm thinking it would be good to have some documentation similar to tfhe/docs/fhe-computation/advanced-features/zk-pok.md showing how to run expand on GPU, in a new file maybe tfhe/docs/configuration/gpu_acceleration/zk-pok.md, where we would explain the ZK proof and verif can only be executed on CPU, linking to the existing zk-pok.md file, and adding a part about expand on GPU, wdyt?
Reviewed 8 of 15 files at r1, all commit messages.
Reviewable status: 8 of 15 files reviewed, 8 unresolved discussions (waiting on @IceTDrinker)
tfhe/src/integer/gpu/key_switching_key/mod.rs
line 11 at r1 (raw file):
#[derive(Clone)] #[allow(dead_code, clippy::struct_field_names)]
Do we really need allow(dead_code)?
tfhe/src/integer/gpu/key_switching_key/mod.rs
line 17 at r1 (raw file):
} #[allow(dead_code, clippy::struct_field_names)]
Do we really need allow(dead_code)?
tfhe/src/high_level_api/compact_list.rs
line 4 at r1 (raw file):
use crate::backward_compatibility::compact_list::CompactCiphertextListVersions; #[cfg(feature = "zk-pok")]
Is it normal this import is removed?
tfhe/src/high_level_api/compact_list.rs
line 28 at r1 (raw file):
use crate::{CompactPublicKey, Tag}; #[cfg(feature = "strings")]
Same question here
tfhe/src/high_level_api/compact_list.rs
line 235 at r1 (raw file):
#[cfg(feature = "zk-pok")] pub mod zk {
Why does the module become pub?
tfhe/src/high_level_api/compact_list.rs
line 252 at r1 (raw file):
} #[derive(Serialize, Deserialize, Versionize)]
Here it doesn't derive Clone anymore?
tfhe/src/high_level_api/compact_list.rs
line 255 at r1 (raw file):
#[versionize(ProvenCompactCiphertextListVersions)] pub struct ProvenCompactCiphertextList { pub(in crate::high_level_api) inner: InnerProvenCompactCiphertextList,
Why this pub(in crate::high_level_api)?
tfhe/src/high_level_api/keys/inner.rs
line 315 at r1 (raw file):
#[cfg(feature = "gpu")] #[allow(dead_code)]
Do we really need to allow dead_code? Isn't there something missing in the prelude instead? 🤔
note that clippy::struct_field_names is not necessary anymore, the lint was a potential problem for backward compatibility (as the lint could randomly start complaining and a developer could change the name, breaking the compatibility on some structs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't read everything but it seems to me there is something that's not right in the way part of the code is currently designed, an expander is not expected to be serialized/versioned, it's the object that comes after an expansion and allows to retrieve data from it, the compact list it is derived from can be serialized, the resulting FheUint etc. from the expander as well, but the expander is only there to make the conversion
Reviewed 8 of 15 files at r1, all commit messages.
Reviewable status: 10 of 15 files reviewed, 12 unresolved discussions (waiting on @agnesLeroy and @pdroalves)
tfhe/src/high_level_api/compact_list.rs
line 4 at r1 (raw file):
Previously, agnesLeroy (Agnès Leroy) wrote…
Is it normal this import is removed?
No, that does not look reasonable
tfhe/src/high_level_api/compact_list.rs
line 28 at r1 (raw file):
Previously, agnesLeroy (Agnès Leroy) wrote…
Same question here
this one was moved a few lines above it seems
tfhe/src/high_level_api/compact_list.rs
line 185 at r1 (raw file):
} pub fn expand(&self) -> crate::Result<CompactCiphertextListExpander> {
The way the function is written means expansion is always done on CPU in that case ?
tfhe/src/integer/backward_compatibility/ciphertext/mod.rs
line 120 at r1 (raw file):
#[derive(VersionsDispatch)] pub enum CompactCiphertextListExpanderVersions {
I'm confused, the expander is never serialized right ?
tfhe/src/integer/ciphertext/compact_list.rs
line 309 at r1 (raw file):
#[derive(Versionize)] #[versionize(CompactCiphertextListExpanderVersions)]
the expander is not supposed to be serialized right ? so then why does it need versioning ? 🤔
tfhe/src/core_crypto/gpu/entities/lwe_keyswitch_key.rs
line 13 at r1 (raw file):
}; #[allow(dead_code)]
as Agnès is mentioning dead code stuff, given this one is public, is it necessary ?
For the clippy::struct_field_names remark you'll need to rebase on main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding struct_field_names, I just rebased and it is gone.
@agnesLeroy Yes, I think it's a good idea. The code to be shown is probably not needed, but the explanation about verify on CPU and expand on GPU is important. I will write that.
@IceTDrinker Yeah, actually the expander was not being serialized, those were only artifacts from a confusing moment during development. I just removed it.
Reviewable status: 10 of 15 files reviewed, 12 unresolved discussions (waiting on @agnesLeroy and @IceTDrinker)
tfhe/src/core_crypto/gpu/entities/lwe_keyswitch_key.rs
line 13 at r1 (raw file):
Previously, IceTDrinker (Arthur Meyre) wrote…
as Agnès is mentioning dead code stuff, given this one is public, is it necessary ?
I've added this to fix a warning about ciphertext_modulus
:
warning: field `ciphertext_modulus` is never read
--> tfhe/src/core_crypto/gpu/entities/lwe_keyswitch_key.rs:20:5
|
14 | pub struct CudaLweKeyswitchKey<T: UnsignedInteger> {
| ------------------- field in this struct
...
20 | ciphertext_modulus: CiphertextModulus<T>,
| ^^^^^^^^^^^^^^^^^^
|
= note: `CudaLweKeyswitchKey` has a derived impl for the trait `Clone`, but this is intentionally ignored during dead code analysis
= note: `#[warn(dead_code)]` on by default
Do you have any suggestions for handling this differently?
tfhe/src/high_level_api/compact_list.rs
line 4 at r1 (raw file):
Previously, IceTDrinker (Arthur Meyre) wrote…
No, that does not look reasonable
They were not removed, only moved a few lines below. ProvenCompactCiphertextListVersions
in particular is not inside mod zk
.
tfhe/src/high_level_api/compact_list.rs
line 28 at r1 (raw file):
Previously, IceTDrinker (Arthur Meyre) wrote…
this one was moved a few lines above it seems
Yes, it was also moved.
tfhe/src/high_level_api/compact_list.rs
line 185 at r1 (raw file):
Previously, IceTDrinker (Arthur Meyre) wrote…
The way the function is written means expansion is always done on CPU in that case ?
Yes, but it doesn't need to. The integer API doesn't have support to expand CompactCiphertextList
, only ProvenCompactCiphertextList
and so the HLAPI. I guess I could just generalize that and have the same method for the former. Wdyt?
tfhe/src/high_level_api/compact_list.rs
line 235 at r1 (raw file):
Previously, agnesLeroy (Agnès Leroy) wrote…
Why does the module become pub?
Shouldn't. I probably modified it temporarily for something during development and forgot to remove. Removed!
tfhe/src/high_level_api/compact_list.rs
line 252 at r1 (raw file):
Previously, agnesLeroy (Agnès Leroy) wrote…
Here it doesn't derive Clone anymore?
I had to remove it when I refactored ProvenCompactCiphertextList
and introduced InnerProvenCompactCiphertextList
. I just added again and implemented Clone for InnerProvenCompactCiphertextList
.
tfhe/src/high_level_api/compact_list.rs
line 255 at r1 (raw file):
Previously, agnesLeroy (Agnès Leroy) wrote…
Why this pub(in crate::high_level_api)?
I probably copied this from somewhere else. Not needed. I changed it back to pub(crate)
.
tfhe/src/high_level_api/keys/inner.rs
line 315 at r1 (raw file):
Previously, agnesLeroy (Agnès Leroy) wrote…
Do we really need to allow dead_code? Isn't there something missing in the prelude instead? 🤔
If I remove this Cargo will complain about cpk_key_switching_key_material
not being used. I guess it's because it is only used when the gpu + zk-pok features are activated.
error: field `cpk_key_switching_key_material` is never read
--> tfhe/src/high_level_api/keys/inner.rs:317:16
|
315 | pub struct IntegerCudaServerKey {
| -------------------- field in this struct
316 | pub(crate) key: crate::integer::gpu::CudaServerKey,
317 | pub(crate) cpk_key_switching_key_material:
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tfhe/src/integer/backward_compatibility/ciphertext/mod.rs
line 120 at r1 (raw file):
Previously, IceTDrinker (Arthur Meyre) wrote…
I'm confused, the expander is never serialized right ?
Correct. At some point I was mixing the expander with the proven compact list inside my mind and making a confusion. I removed the versioning for somewhere else in the code but forgot to remove this one.
tfhe/src/integer/ciphertext/compact_list.rs
line 309 at r1 (raw file):
Previously, IceTDrinker (Arthur Meyre) wrote…
the expander is not supposed to be serialized right ? so then why does it need versioning ? 🤔
Never. Removed.
tfhe/src/integer/gpu/key_switching_key/mod.rs
line 11 at r1 (raw file):
Previously, agnesLeroy (Agnès Leroy) wrote…
Do we really need allow(dead_code)?
I guess so. Without it I get
error: fields `key_switching_key` and `destination_key` are never read
--> tfhe/src/integer/gpu/key_switching_key/mod.rs:13:16
|
12 | pub struct CudaKeySwitchingKeyMaterial {
| --------------------------- fields in this struct
13 | pub(crate) key_switching_key: CudaLweKeyswitchKey<u64>,
| ^^^^^^^^^^^^^^^^^
14 | pub(crate) destination_key: EncryptionKeyChoice,
| ^^^^^^^^^^^^^^^
|
= note: `CudaKeySwitchingKeyMaterial` has a derived impl for the trait `Clone`, but this is intentionally ignored during dead code analysis
= note: `-D dead-code` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(dead_code)]`
error: fields `key_switching_key_material` and `dest_server_key` are never read
--> tfhe/src/integer/gpu/key_switching_key/mod.rs:19:16
|
18 | pub struct CudaKeySwitchingKey<'keys> {
| ------------------- fields in this struct
19 | pub(crate) key_switching_key_material: CudaKeySwitchingKeyMaterial,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
20 | pub(crate) dest_server_key: &'keys CudaServerKey,
in pcc_gpu. Again, I suppose this happens because CudaKeySwitchingKey is only used in the HLAPI behind gpu + zkpok. Is there any other way to handle this?
tfhe/src/integer/gpu/key_switching_key/mod.rs
line 17 at r1 (raw file):
Previously, agnesLeroy (Agnès Leroy) wrote…
Do we really need allow(dead_code)?
Answered above.
161093c
to
7d827d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agnesLeroy I just pushed a first text version. Wdyt?
Reviewable status: 10 of 15 files reviewed, 12 unresolved discussions (waiting on @agnesLeroy and @IceTDrinker)
b1283fb
to
f82e3c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 16 files reviewed, 15 unresolved discussions (waiting on @IceTDrinker and @pdroalves)
tfhe/src/core_crypto/gpu/entities/lwe_keyswitch_key.rs
line 13 at r1 (raw file):
Previously, pdroalves (Pedro Alves) wrote…
I've added this to fix a warning about
ciphertext_modulus
:warning: field `ciphertext_modulus` is never read --> tfhe/src/core_crypto/gpu/entities/lwe_keyswitch_key.rs:20:5 | 14 | pub struct CudaLweKeyswitchKey<T: UnsignedInteger> { | ------------------- field in this struct ... 20 | ciphertext_modulus: CiphertextModulus<T>, | ^^^^^^^^^^^^^^^^^^ | = note: `CudaLweKeyswitchKey` has a derived impl for the trait `Clone`, but this is intentionally ignored during dead code analysis = note: `#[warn(dead_code)]` on by default
Do you have any suggestions for handling this differently?
Maybe something is missing in the prelude?
tfhe/src/high_level_api/keys/inner.rs
line 315 at r1 (raw file):
Previously, pdroalves (Pedro Alves) wrote…
If I remove this Cargo will complain about
cpk_key_switching_key_material
not being used. I guess it's because it is only used when the gpu + zk-pok features are activated.error: field `cpk_key_switching_key_material` is never read --> tfhe/src/high_level_api/keys/inner.rs:317:16 | 315 | pub struct IntegerCudaServerKey { | -------------------- field in this struct 316 | pub(crate) key: crate::integer::gpu::CudaServerKey, 317 | pub(crate) cpk_key_switching_key_material: | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@IceTDrinker how should we deal with this, should we gate the cpk_key_switching_key_material with the zk-pok feature?
tfhe/docs/configuration/gpu_acceleration/zk-pok.md
line 3 at r2 (raw file):
# Zero-knowledge proofs This document explains how to speed up the verification of zero-knowledge proofs using the GPU, similar to how it's done on the [CPU](../../fhe-computation/advanced-features/zk-pok.md).
This is a bit confusing, as it leads the reader to think verification can be executed on the CPU. How about this instead?
Zero-knowledge (ZK) proof and verification can only be executed on CPU. However, the preprocessing needed to go from ciphertexts formatted for ZK to ciphertexts in the right format for computation purposes can be accelerated on GPU. ZK verification can be executed on the CPU concurrently with this preprocessing, also referred to as expansion, making the overall workflow faster.
tfhe/docs/configuration/gpu_acceleration/zk-pok.md
line 14 at r3 (raw file):
{% endhint %} Moreover, the GPU backend better performs when Multi-bit PBS parameters are used, as
I would remove this sentence, as it is not specific to this page.
tfhe/docs/configuration/gpu_acceleration/zk-pok.md
line 19 at r3 (raw file):
## Example The following example shows how a client can encrypt and prove a ciphertext, and how a server can verify and compute the ciphertext on the GPU:
"how a server can verify and compute the ciphertext on the GPU:" -> "how a server can verify the proof, preprocess the ciphertext and run a computation on it on GPU:"
f82e3c0
to
55edcfd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 19 of 20 files reviewed, 2 unresolved discussions (waiting on @agnesLeroy, @IceTDrinker, and @tmontaigu)
tfhe/src/high_level_api/keys/server.rs
line 295 at r12 (raw file):
Previously, tmontaigu (tmontaigu) wrote…
The type should not be needed,
Indeed! Thanks!
1e8a0a6
to
cc640e9
Compare
cc640e9
to
4c15de8
Compare
@agnesLeroy I just rebased this branch. Hope we can merge it soon. |
4c15de8
to
15eb90a
Compare
looks like there are some broken imports |
15eb90a
to
cf38e44
Compare
cf38e44
to
0688a57
Compare
Looks like I forgot to fetch before rebasing this morning. Now it is fixed. |
0688a57
to
dfa8a2a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 15 of 20 files reviewed, all discussions resolved (waiting on @IceTDrinker and @tmontaigu)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any more reserves on this, to you @tmontaigu
Reviewed 3 of 6 files at r12, 2 of 4 files at r13, 5 of 5 files at r15, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @tmontaigu)
tfhe/src/high_level_api/compact_list.rs
line 345 at r9 (raw file):
Previously, pdroalves (Pedro Alves) wrote…
Great! Thanks a lot. I just replaced it with your code.
Will leave @tmontaigu to check he is satisfied
tfhe/src/integer/gpu/zk/mod.rs
line 462 at r9 (raw file):
Previously, pdroalves (Pedro Alves) wrote…
@IceTDrinker We discussed your suggestion this morning.
I think a
decompress_from_cpu
might be a lot of work. The only way I've found to avoid that ksk copy insideverify_and_expand
was by adding a ksk material to CudaServerKey and then to use it to initialize the final ksk with it when needed, which does not imply on a copy. I'm afraid a single entry point like that, which receives a CPU's ksk and returns the final cuda ksk, may end up on Cargo complaining that I'm trying to return a pointer to a local variable (which would be the material ksk created inside of that method) or to force me clone the ksks once again.I would also prefer a single entry point for that, but probably that's too much work for this (already too big) PR. We may work on that on a following PR if you wish.
no problem
And thanks a lot Pedro, big PR lots of code! |
- includes documentation about GPU's accelerated expand on the HL API - rework CudaKeySwitchingKey - Cloning the key is no longer necessary on the HL API
dfa8a2a
to
7dd45a4
Compare
closes: https://github.com/zama-ai/tfhe-rs-internal/issues/986
PR content/description
Check-list:
This change is