Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aead: add AeadCore::TAG_POSITION and move AeadInPlace methods to AeadInOut #1798

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

newpavlov
Copy link
Member

@newpavlov newpavlov commented Mar 17, 2025

Closes #1776

@newpavlov newpavlov requested a review from tarcieri March 17, 2025 14:51
@newpavlov newpavlov changed the title aead: mergeall AEAD traits into one Aead trait aead: merge all AEAD traits into one Aead trait Mar 17, 2025
@newpavlov newpavlov mentioned this pull request Mar 17, 2025
@tarcieri
Copy link
Member

tarcieri commented Mar 18, 2025

This PR would be easier to review if it did only what the title suggests, combining all of the traits into a single trait. However it also adds a bunch of additional methods, which can be done separately, potentially to the AeadInOut trait instead of the Aead trait.

I don't like this approach because it limits the flexibility of the Aead trait, which can currently be used to wrap APIs which perform heap allocations. In general I think we have drifted from having traits which are simple and straightforward mappings of the underlying cryptographic functions to a trait-based interface to ones which are encumbered with a lot of low-level implementation details which limit their flexibility, something I feel runs afoul of the crypto coding rules about mixing abstraction details in the same API layer. IMO other languages which provide interfaces/traits tend to provide simple, straightforward mappings.

While there's something to be said for "one trait to worry about", this PR replaces the "which trait do I use?" problem with "which method do I use?". And I don't think you'll achieve any greater flexibility without multiple traits anyway. If you do try to add other traits (to support heap-allocated use, and/or async use), they can't build off Aead, so you won't have access to any of the constants it defines.

I think you could probably get most of the same benefits by combining just the AeadInPlace and AeadInOut traits and adding an associated constant for the tag position to AeadCore, which wouldn't limit the flexibility of the Aead trait (which would effectively undo #1714, there and back again). Then you would have two traits: a high-level one and a low-level one, which would avoid confusing users of the high-level API with the many options you want the low-level API to provide.

@baloo
Copy link
Member

baloo commented Mar 18, 2025

Made an attempt at porting the consumers over in RustCrypto/AEADs#674.

I'm not sure how to deal with the SIVs crates as they do not really have a tag serialization?

@tarcieri
Copy link
Member

tarcieri commented Mar 18, 2025

@baloo aes-siv is prefix, aes-gcm-siv is postfix (really aes-siv is the only prefix crate and the rest are postfix)

@newpavlov
Copy link
Member Author

newpavlov commented Mar 18, 2025

However it also adds a bunch of additional methods, which can be done separately, potentially to the AeadInOut trait instead of the Aead trait.

Ok, I removed the helper methods for now.

I don't like this approach because it limits the flexibility of the Aead trait, which can currently be used to wrap APIs which perform heap allocations.

Could you provide concrete examples of such wrapping? I think we should prioritize pure Rust software implementations foremost. Compatibility with other use cases such as hardware-based accelerators on embedded devices or external IO-based tokens could be a nice bonus, but we should carefully consider concrete examples without hypotheticals.

As I mentioned in the OP, such use cases may be better covered by a separate higher-level dyn-friendly trait, i.e. something similar to the old Aead trait, but which would not require fixing tag and nonce sizes in dyn contexts.

I also think that it should be possible to emulate (albeit somewhat inefficiently) the detached inout APIs even on top of such heap allocating APIs. But I may reconsider after analyzing your examples.

I think we have drifted from having traits which are simple and straightforward mappings of the underlying cryptographic functions to a trait-based interface to ones which are encumbered with a lot of low-level implementation details which limit their flexibility

I disagree. I think the suggested approach clearly follows the idiomatic handling of "extension" methods commonly used in Rust. We have "fundamental" low-level detached methods and associated constants which represent "simple and straightforward mappings of the underlying cryptographic functions". On top of them we build convenient extension methods with an ability to overwrite blanket implementations if it's necessary for some reason. For example, compare it against the Iterator or io::Read/Write traits.

Now, there is an alternative approach of "extension traits" (e.g. RngCore vs. Rng), but it's usually used either when traits are defined in separate crates, or when we do not want users to be able to overwrite the extended implementation. But as you insisted in the previous discussions, this ability may be important for efficient support of prefix modes with hypothetical offsetting encryption.

I think you could probably get most of the same benefits by combining just the AeadInPlace and AeadInOut traits and adding an associated constant for the tag position to AeadCore, which wouldn't limit the flexibility of the Aead trait

It's very close to what I propose. The main difference is that I think that instead of the current Aead trait we should provide a new DynAead trait which would work with nonce: &[u8]. In other words, I don't think this high-level trait should be a super trait of AeadCore, considering that in its methods it does not expose any information about tag position or its size.

If you really don't like the suggested names, instead of Aead/DynAead we could use AeadCore/Aead or something else.

I did not include this new trait as part of this PR because I assume you would like to see it added in a separate PR.

While there's something to be said for "one trait to worry about", this PR replaces the "which trait do I use?" problem with "which method do I use?".

From my teaching experience, one of the common stumbling blocks for students is understanding hierarchy of blanket trait impls (and it sometimes applies even to mildly experienced Rust programmers). It's very easy to look at crate docs, see that a type implements trait Foo, click on it, see all Foo methods with a clear distinctions between required and provided methods, and chose a method suitable for a problem at hand. It's much harder to discover that "there is a blanket impl of Bar for T: Foo, my type implements Foo, then it also implements Bar through the blanket impl, and I need to use Bar::bar for my problem".

I understand that I may not be the best person to argue this point considering the APIs in our other crates, but I try to be conscious about it.

@tarcieri
Copy link
Member

tarcieri commented Mar 18, 2025

It's very close to what I propose. The main difference is that I think that instead of the current Aead trait we should provide a new DynAead trait which would work with nonce: &[u8].

That means if you want to wrap anything which implements an AEAD with heap allocations, you lose type safety around the nonce size. Forcing the inout API on everything seems needlessly inflexible.

It's also a departure from the design of crates like crypto-common where sizes are provided on their own traits, or digest where Digest provides a high-level API and is primarily impl'd using blanket impls.

From my teaching experience, one of the common stumbling blocks for students is understanding hierarchy of blanket trait impls (and it sometimes applies even to mildly experienced Rust programmers). It's very easy to look at crate docs, see that a type implements trait Foo, click on it, see all Foo methods with a clear distinctions between required and provided methods, and chose a method suitable for a problem at hand. It's much harder to discover that "there is a blanket impl of Bar for T: Foo, my type implements Foo, then it also implements Bar through the blanket impl, and I need to use Bar::bar for my problem".

I think there's been a lot of headway on that, some of which we aren't leveraging but probably should, namely diagnostic::on_unimplemented (#1563).

rustdoc also enumerates the blanket impls for a given type which makes them much more discoverable.

@newpavlov
Copy link
Member Author

newpavlov commented Mar 19, 2025

That means if you want to wrap anything which implements an AEAD with heap allocations, you lose type safety around the nonce size.

And I don't think that we should have it in a dyn-friendly trait, similarly to how we do not have type safety around the output size in DynDigest. With the current API you are not be able to simultaneously generalize over ChaChaPoly1305 and AesGcm with 96 bit nonces.

Forcing the inout API on everything seems needlessly inflexible.

How it's different from the cipher crate? If anything, I think it makes our APIs more flexible for users, since they now can rely on explicit buffer-to-buffer operations where suitable. In cases where a proper implementation of inoutAPIs is not possible for some reason, we always can fall back to copying data from input to output buffer in the case of disjoint buffers. It may be even worth to add helper methods for that to inout (see RustCrypto/utils#1169).

It's also a departure from the design of crates like crypto-common where sizes are provided on their own traits, or digest where Digest provides a high-level API and is primarily impl'd using blanket impls.

I had in mind potential migration to the OutputSizeUser and IvSizeUser traits (maybe after renaming the latter to NonceSizeUser), but I left it for later PRs according to your desire to minimize changes in each PR.

As for Digest, your forget about DynDigest, the example of which I try to follow with DynAead.

I think there's been a lot of headway on that, some of which we aren't leveraging but probably should, namely diagnostic::on_unimplemented

I don't think it helps that much. on_unimplemented helps when you encounter a compilation error, while I was talking about API discovery while browsing crate docs. The "Blanket Implementations" section is certainly a great improvement, but it's still significantly less visible.

The merged trait would also work much better with hypothetical inherent trait implementations (which I really hope we will get one day).

@tarcieri
Copy link
Member

tarcieri commented Mar 19, 2025

If anything, I think it makes our APIs more flexible for users, since they now can rely on explicit buffer-to-buffer operations where suitable.

They could still do that if you impl'd those same methods on AeadInOut.

In cases where a proper implementation of inoutAPIs is not possible for some reason, we always can fall back to copying data from input to output buffer in the case of disjoint buffers.

Which is a lot less suboptimal than simply being able to return the Vec you already have.

As for Digest, your forget about DynDigest, the example of which I try to follow with DynAead.

No, that's not what I was talking about. I was talking about how Digest is primarily (exclusively?) impl'd via this blanket impl:

https://docs.rs/digest/latest/digest/trait.Digest.html#impl-Digest-for-D

Are you going to similarly get rid of the FixedOutput, Update, and HashMarker traits and just have Digest, and if not, why not?

I think before we try refactoring aead further we should address #1666, which IMO is a much higher priority, and also means any such refactorings won't paint ourselves into a corner making that difficult to implement/use (it may actually work out with a Dyn* API since the goal is to hide the nonce from the user).

I can take a look at that soon.

@newpavlov
Copy link
Member Author

newpavlov commented Mar 19, 2025

Are you going to similarly get rid of the FixedOutput, Update, and HashMarker traits and just have Digest, and if not, why not?

I would love to do that and long time ago I used roughly this approach.

But we have 3 different algorithm classes: fixed output, extendable output, and variable output. And we also have Hash/MAC, customizable/non-customizable, and resetable/non-resetable distinctions on top of that. We could introduce several different traits (Hash, Mac, VariableHash, VariableMac, ExtendableHash,etc.) instead of "atomic" traits like Update, but then we would lose ability to write generic code over atomic functionality and we either lose some flexibility around corner case or get a combinatorial explosion of various trait variants.

In the aead case we simply do not have anything like that. The only hypothetical problem is with hardware tokens and you did not provide a concrete example of it yet.

If you have a concrete proposal for digest in this vein, I am open to discussing it in a separate issue.

Which is a lot less suboptimal than simply being able to return the Vec you already have.

If user already has Vec, then he would use the higher level Buffer-based methods, which can be implemented "optimally" by overwriting the default impls. The existence of inout methods simply does not matter in such cases.

I think before we try refactoring aead further we should address #1666

This looks orthogonal to me. I have some ideas which I plan to prototype on top of this refactor, which would look like additional provided methods on the merged Aead and the new DynAead traits.

They could still do that if you impl'd those same methods on AeadInOut.

You haven't addressed my point about DynAead (or the current Aead) not needing type safety around nonce and tag sizes. If Aead is no longer a super-trait of AeadCore, then there is zero reason to have split between AeadCore and AeadInOut.

@tarcieri
Copy link
Member

tarcieri commented Mar 19, 2025

In the aead case we simply do not have anything like that. The only hypothetical problem is with hardware tokens and you did not provide a concrete example of it yet.

No, I gave the example of anything which implements an AEAD with heap allocations, i.e. anything where the AEAD implementation owns the return buffer and not the user. For example, that's what the sodiumoxide API looks like (used to look like anyway, the crate is defunct). Anything which actually looks like the current Aead trait.

Your solution for these is to introduce a DynAead, which means to wrap those implementations you have to give up type safety around the nonce size.

You haven't addressed my point about DynAead (or the current Aead) not needing type safety around nonce and tag sizes.

That's nice if a DynAead is what you actually want. I haven't seen anyone request a DynAead. But you also seem to be using it as a solution to fill some gaps left over form this change which it doesn't optimally fill, as I just noted in regard to not being able to wrap AEAD implementations which return their results allocated on the heap, where IMO losing type safety around nonce sizes is an antifeature.

This PR is a one-sized-fits-some solution, which IMO is a regression from the current state of the crate with dubious benefits.

The existing implementation is more flexible, you can still add a DynAead if you so desire, and you can still add additional methods to AeadInOut.

This looks orthogonal to me. I have some ideas which I plan to prototype on top of this refactor, which would look like additional provided methods on the merged Aead and the new DynAead traits.

Those methods are going to work on differently structured messages, and you're going to mix them in with the rest of the Aead methods. I think it would make more sense for those methods to logically be grouped together on their own trait.

Perhaps this is the main place we disagree: I want methods organized and grouped by related functionality, and you seem to think it is easier and less confusing for users for them to all be placed into a single trait.

IMO having lots of methods on a single trait makes it much harder for users to answer the "what method do I use?" question.

@newpavlov
Copy link
Member Author

newpavlov commented Mar 19, 2025

For example, that's what the sodiumoxide API looks like (used to look like anyway, the crate is defunct).

Heh, this is an amusingly bad example. After digging a bit deeper, you can see that the underlying library uses... buffer-to-buffer APIs! (I am not sure whether the library allows for m and c pointers to point to the same memory, but probably it does) The fact that the defunct wrapper uses heap allocations for convenience and simplicity sake should not mean anything for designing our APIs.

This PR is a one-sized-fits-some solution, which IMO is a regression from the current state of the crate with dubious benefits.

Show me a concrete example of an implementation which implements the current Aead trait, but can not fundamentally implement the detached API.

I think it would make more sense for those methods to logically be grouped together on their own trait.

Do you think the same about the Iterator trait? You may not like it, but it's the de facto idiomatic approach which is used widely in the ecosystem.

Putting the methods into different traits tied with blanket impls not only makes discovery a bit harder (it's objectively easier to read and search docs for one trait, than for several), but also would prevent us from overwriting default impls (assuming we do not have specialization), which may be useful for optimization purposes.

@tarcieri
Copy link
Member

tarcieri commented Mar 19, 2025

Heh, this is an amusingly bad example. After digging a bit deeper, you can see that the underlying library uses... buffer-to-buffer APIs!

It didn't used to. This used to be all there was:
https://docs.rs/sodiumoxide/0.0.9/sodiumoxide/crypto/secretbox/xsalsa20poly1305/fn.seal.html

Yes newer versions added the buffer-to-buffer APIs.

Do you think the same about the Iterator trait? You may not like it, but it's the de facto idiomatic approach which is used widely in the ecosystem.

Iterator is both an extremely complex core API of the language and it's also decomposed into many, many traits which are grouped by logical functionality (I guess you could also call it an "amusingly bad example")

@newpavlov
Copy link
Member Author

newpavlov commented Mar 19, 2025

It didn't used to. This used to be all there was:

Please, look under the hood... By clicking "source" in the linked page you can see this:
https://docs.rs/sodiumoxide/0.0.9/src/sodiumoxide/.cargo/registry/src/github.com-1ecc6299db9ec823/sodiumoxide-0.0.9/src/crypto/secretbox/xsalsa20poly1305.rs.html#62-72

ffi::crypto_secretbox_xsalsa20poly1305(dst, src, len, n, k)

Which is again a buffer-to-buffer API. I understand why the wrapper authors rely on allocations here, but it's simply not relevant for us.

Iterator is both an extremely complex core API of the language and it's also decomposed into many, many traits which are grouped by logical functionality

These traits are either about conversions (FromIterator/IntoIterator) or extension of the Iterator functionality, e.g. an implementor of the base Iterator trait simply can not have a generic implementation of DoubleEndedIterator::next_back. For example, in the aead case an analog of DoubleEndedIterator would be an additional extension trait for key committing AEADs.

What you argue is for 70+ methods from the Iterator trait to be moved into separate traits "grouped by related functionality". So, no, it's a perfectly good example until you provide a proper example of:

an implementation which implements the current Aead trait, but can not fundamentally implement the detached API

@tarcieri
Copy link
Member

tarcieri commented Mar 19, 2025

Please, look under the hood.

I'm talking about the public API. That's how it was implemented even if there were buffer-to-buffer ops available internally which were eventually exposed.

Anyway, that was just an example from the Rust ecosystem I could think of offhand.

For example, in the aead case an analog of DoubleEndedIterator would be an additional extension trait for key committing AEADs.

Or how about methods for operating on nonce-prefixed AEAD messages, which you just want to lump into the Aead trait?

What you argue is for 70+ methods from the Iterator trait to be moved into separate traits "grouped by related functionality"

No, that's not what I'm arguing, I was pointing out how Iterator is already structured. You however, seem to be arguing "Iterator has a lot of methods, therefore all traits should have a lot of methods"

@newpavlov
Copy link
Member Author

newpavlov commented Mar 19, 2025

I'm talking about the public API. That's how it was implemented even if there were buffer-to-buffer ops available internally which were eventually exposed.

Those public APIs were developed without relying on aead, so obviously the authors can take different tradeoffs and approaches to designing crate APIs. Fundamentally, there are no issues in wrapping libsodium in the aead APIs proposed in this PR. Yes, we may not be able to use the sodiumoxide crate for that, but I don't see a problem in that.

Or how about methods for operating on nonce-prefixed AEAD messages, which you just want to lump into the Aead trait?

All AEADs can implement nonce-prefixed serialization of ciphertexts based on the detached methods, similarly to how all Iterators can implement try_fold or map. So, no, it's not the same.

No, that's not what I'm arguing, I was pointing out how Iterator is already structured. You however, seem to be arguing "Iterator has a lot of methods, therefore all traits should have a lot of methods"

You wrote:

Those methods are going to work on differently structured messages, and you're going to mix them in with the rest of the Aead methods. I think it would make more sense for those methods to logically be grouped together on their own trait.

By this logic Iterator::copied should be in a different trait because it works with Copy items. Or min/max, min/max_by, min/max_by_key, etc. should be in a different trait because they work with "orderable items".

My point is that there should be a tangible reason for a trait split. It can be either that we define an extension trait in a separate crate (itertools::Itertools), or that there are types which implement only subset of the functionality (types which implement Iterator, but not DoubleEndedIterator). Thus using traits to just "group methods" is a non-idiomatic approach as demonstrated by a number of std traits. Another example is Read::read_to_end and Read::read_to_string methods, by your logic they should be defined in separate traits (ReadVec/ReadString?) because they work with heap allocation and UTF-8 encoded strings ("differently structured messages") instead of low-level byte slices, right?

@tarcieri
Copy link
Member

I'm talking about an extension trait that works for all AEADs. Using an extension trait makes it clear which methods operate on nonce-prefixed AEAD messages as opposed to unprefixed messages.

I think solving that is a higher priority over what is effectively a bikeshedding debate about the API where either option has tradeoffs, and those requirements should also inform the design.

Also, personally I don't think these "analogies" with std are helpful, and the comparisons you are making are simply not apt. Iterator is not precedent for simply dumping everything into a single trait (as I've established, that's not how Iterator even works), and there is nothing like a Copy bound at play. We're not designing Iterator here. Iterator is vastly more complex.

@newpavlov
Copy link
Member Author

newpavlov commented Mar 19, 2025

as I've established, that's not how Iterator even works

No, you absolutely did not do that. The linked std::iter traits fall under the "existence of types which implement only subset of the functionality" rule. Point me to a trait which is implemented for all Iterators and defined as part of std. And I pointed you to io::Read/Write traits which follow the same principle. AFAIK none of the std traits are introduced to "group methods" like you want to do with the in-crate extension traits.

The same principle more or less applies to the wider ecosystem as well, the mentioned above itertools crate "dumps" 130 methods into the Itertools trait instead of splitting them into a dozen of separate traits.

I think solving that is a higher priority over what is effectively a bikeshedding debate about the API where either option has tradeoffs, and those requirements should also inform the design.

So let's move forward with this PR and start working on those higher priority tasks. If in future you will show me a compelling reason why the split matters (e.g. during development of the nonce including methods), I will revert it myself in no time.

@tarcieri
Copy link
Member

tarcieri commented Mar 19, 2025

So let's move forward with this PR and start working on those higher priority tasks.

I'd rather move forward with the higher priority work first, then see what it looks like when you try to shoehorn everything into the Aead trait

@newpavlov
Copy link
Member Author

Sigh..... Ok, I will return the AeadCore/AeadInOut/Aead split to unblock this. But I certainly plan to return to this.

@newpavlov newpavlov changed the title aead: merge all AEAD traits into one Aead trait aead: add AeadCore::TAG_POSITION and move AeadInPlace methods to Aead Mar 19, 2025
@newpavlov newpavlov changed the title aead: add AeadCore::TAG_POSITION and move AeadInPlace methods to Aead aead: add AeadCore::TAG_POSITION and move AeadInPlace methods to AeadInOut Mar 20, 2025
@@ -155,6 +160,94 @@ pub trait AeadCore {
}
}

/// In-place AEAD trait which handles the authentication tag as a return value/separate parameter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inout operations aren't necessarily in place. This probably needs an all-new description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

aead: use IS_POSTFIX associated constant instead of the PostfixTagged marker trait?
3 participants