Skip to content

Stable Rust changes #209

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 33 commits into
base: master
Choose a base branch
from

Conversation

sellout
Copy link
Collaborator

@sellout sellout commented Apr 15, 2025

This contains a number of changes to the Rust implementation that maintain stepwise identical behavior with the C++ implementation.

It should be reviewed commit-by-commit.

@sellout
Copy link
Collaborator Author

sellout commented Apr 17, 2025

This PR can potentially have some parts rejected, so if any of the commits (or smaller parts) seem unnecessary, we can discuss. I don’t always do a good job of including justification in the commit messages.

E.g., the change to have PushValues carry their data simplifies representation of the script as Vec<Opcode>, which is useful in cases like error reporting (as we can include the script as opposed to opaque bytes) and PCZTs (which can accept a Vec<Opcode>, which is then serialized into the tx).

Copy link
Collaborator

@nuttycom nuttycom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flushing comments; I still need to spend some more time with 05ba17b to make sure I've checked everything.

*
* This function is consensus-critical since BIP66.
*/
fn is_valid_signature_encoding(sig: &[u8]) -> bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have verified that ecdsa::Signature::from_der performs these checks, with the exception that I could not find the minimum and maximum size checks. @sellout where is the overall length check?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The min/max check is redundant – it’s a consequence of the other tags & encoded lengths. In particular, it’s a consequence of the r and s lengths each being in the range [1, 32].

I think this removed code never checked that the the lengths are ≤32, but rust-secp256k1 does. So, there was actually a shortcoming here (AFAICT) where, for example, there could have been an r_len of 48 and an s_len of 16, and everything would have passed.

};

// A signature is of type 0x30 (compound).
if sig[0] != 0x30 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines -307 to -310
if sig[0] != 0x30 {
return false;
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines -312 to -315
if usize::from(sig[1]) != sig.len() - 3 {
return false;
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum PushValue {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs rustdoc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any linter that tells you what declarations aren’t documented?

}

#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum Operation {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs top-level rustdoc; basically, all the public functions and types do.

src/scriptnum.rs Outdated
const DEFAULT_MAX_NUM_SIZE: usize = 4;

pub fn new(
vch: &Vec<u8>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it could take a &[u8] instead?

src/scriptnum.rs Outdated
Self(
self.0
.checked_add(other.0)
.expect("caller should avoid overflow"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the caller has to check, why not make the Add and Sub impls return type Output = Result<Self, ...>?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These operations went away with #208.


use operation::{Control, Normal};
use push_value::{LargeValue, SmallValue};

/** Script opcodes */
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Deserialize, Serialize)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I wonder whether it doesn't make more sense to write a custom Deserialize and Serialize that uses the canonical encoding, instead of implicitly using a new (derived) serde encoding.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that definitely makes sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm, actually, LargeValue puts a wrinkle in this, as it’s not representable by u8.

We could have Operation (including Control, Normal, and Unknown) and SmallValue map to u8, but LargeValue (and consequently PushValue and Opcode) would need something richer ([u8]?). And then the larger script types definitely to [u8].

@mpguerra mpguerra added this to Zebra May 8, 2025
@mpguerra mpguerra moved this to Review/QA in Zebra May 8, 2025
@sellout sellout force-pushed the stable-rust-changes branch 3 times, most recently from 6d6ad7f to 1a4c5c2 Compare May 15, 2025 06:24
@sellout sellout force-pushed the stable-rust-changes branch 4 times, most recently from 3064d79 to fd74615 Compare May 19, 2025 20:32
@conradoplg conradoplg mentioned this pull request May 19, 2025
@sellout sellout force-pushed the stable-rust-changes branch 3 times, most recently from 50c70be to 44109e5 Compare May 20, 2025 22:19
@sellout
Copy link
Collaborator Author

sellout commented May 20, 2025

Sorry, there has been a lot of churn here after addressing @nuttycom’s comments. Lesson learned – this PR is just too big. I ended up removing the no_std changes, because they weren’t as trivial as I initially thought.

Everything is passing now, though, so should be good.

@sellout sellout marked this pull request as draft May 22, 2025 21:00
@sellout
Copy link
Collaborator Author

sellout commented May 22, 2025

I am splitting this PR into multiple smaller ones. So far I’ve split out four preliminary PRs, reducing this PR from 36 commits to 19.

graph LR;
  PR-221 --> PR-223;
  PR-222 --> PR-223;
  PR-223 --> PR-224;
  PR-224 --> this;
Loading

@sellout sellout force-pushed the stable-rust-changes branch from 44109e5 to 020e72a Compare May 22, 2025 22:49
@mpguerra mpguerra removed the status in Zebra Jun 16, 2025
@mpguerra mpguerra removed this from Zebra Jun 16, 2025
@nuttycom nuttycom added this to the Zcashd wallet replacement milestone Jun 23, 2025
sellout added 2 commits July 11, 2025 14:03
The C++ implementation had a lot of manual checking of bytes. In Rust,
we have the secp256k1 crate, which takes care of most of this.
sellout added 11 commits July 15, 2025 11:58
On the C++ side, this is the value of the error output parameter when
script validation is successful. It never occurs on the Rust side
(because we have `Result`), and so we can remove it from the enumeration
without an consequences.
Previously, the `TestVector`s held normalized results, and we would
normalize the _actual_ result before comparison.

This changes the `TestVector`s to hold richer Rust results, and then to
normalize the _expected_ result only for the C++ case.
This splits `Operation` into three separate enums – `Control` (for
if/else, which get executed regardless of `vexec` and disabled
operations, which always fail, regardless of `vexec`), `Normal`, for
operations that respect `vexec`, and `Unknown` for undefined opcodes
which only fail if they’re on an active branch. This is done so that the
evaluator can be split on the same lines.

__NB__: I strongly recommend ignoring whitespace changes when reviewing
        this commit.

(cherry picked from commit f97b92c)
(cherry picked from commit 27a5037)
This parallels the existing `op` module, and is used in cases where we
want to guarantee that only push values are used.

`op` itself has been updated to reference `pv`, rather than using the
constructors directly.
This introduces one edge case: If a disabled opcode is the 202nd
operation in a script, the C++ impl would return `Error::OpCount`, while
the Rust impl would return `Error::DisabledOpcode`.
Having `Control` and `Normal` grouped under `Operation` only
eliminated one conditional (checking whether we hit the `op_count`
limit, and that is now better abstracted anyway), and it introduced a
lot of code (see the 55 lines removed here _plus_ the many nested calls
removed, as in op.rs).

So, `Normal` is now called `Operation`. The confusing label of “control”
for some `Operation` (née `Normal`) opcodes has been removed.

(cherry picked from commit 1c98bb3)
sellout added 15 commits July 16, 2025 14:22
Well, my _tiny_ edge case of “only if the 202nd operation is a disabled opcode” didn’t slip past the
fuzzer. It caught that pretty quickly.

So, this does a better job of normalizing errors for comparisons.

First, it normalizes both the C++ and Rust side, which allows the Rust error cases to not be a
superset of the C++ error cases. Then, it also normalizes errors in the stepwise comparator (which I
think was done in ZcashFoundation#210, but it’s reasonable to do along with these other changes).

Also, `ScriptError::UnknownError` has been replaced with `ScriptError::ExternalError`, which takes
a string description. This is used for two purposes:

1. “Ambiguous” cases. One was already done – `UNKNOWN_ERROR` on the C++
   side with `ScriptError::ScriptNumError` or `ScriptError::HighS` on
   the Rust side, but now it’s handled differently. The other is the
   edge case I introduced earlier in this PR – Rust will fail with a
   `DisabledOpcode` and C++ will fail with a `OpCount`, so those two
   cases have been unified.
2. Errors that occur inside a stepper. Previously, this just melded in
   by returning `UnknownError`, but that was just the “least bad”
   option. Now we can distinguish these from other `ScriptError`.
This makes it easier to write chunks of Zcash Script in code, with some
higher-level patterns of opcodes to use. The serialization is also
necessary for wallets to produce scripts.

It also removes some of the duplication in the tests, by moving
constants to `crate::testing`.
It now calls `PushValue::from_slice`, which is more correct and user-
visible.
This required changing the `result` type of `TestVector`, because now
there are some values that can’t be constructed directly, so we store
those in normalized form.
This should move things around without making any changes to the implementation (other than updating
names). Here is what changed:

- `script::Opcode` is now `::Opcode`
- `Control` `Disabled`, `Operation`, and `PushValue` have moved from `script` to `opcode`
- `LargeValue` and `SmallValue` have moved from `script` to `opcode::push_value`
- `LOCKTIME_THRESHOLD` maved from `script` to `interpreter`
- `script::Script` is now `script::Code` (this is in anticipation of typed
  scripts, which introduces a `struct Script<T> { sig: script::Sig<T>, pub_key:
  script::PubKey }`)
- `script::parse_num` is now `num::parse`
- `script_error::ScriptNumError` is now `num::Error`
- `script::MAX_SCRIPT_ELEMENT_SIZE` is now `opcode::PushValue::MAX_SIZE`
- `script::serialize_num` is now `num::serialize`
- `script::MAX_SCRIPT_SIZE` is now `script::Code::MAX_SIZE`
- `script::Script::get_op` is now `::Opcode::parse`
- `script::Script::get_lv` is now `opcode::push_value::LargeValue::parse`
This is needed by the PCZT crate
With stronger types come fewer errors. This prepares the types for cases
when different public API calls have different possible errors. Notably,
this separates lexing errors from interpreter ones).
This is one more step toward type-safe scripts.
Store them as their byte representations, rather than serde’s defaults.
`SignatureChecker` shouldn’t have had default implementations – those
really belong to the `BaseSignatureChecker`.
And require that future ones are documented as well.
The `Stack` operations defined here already return `InvalidStackOperation` when they fail, so many
of the explicit `stack.is_empty()` checks are redundant.

However, removing them is delicate. Basically, they can only be removed if there is no other change
to the state that can occur before the `Stack` operation would  hit the same case. E.g., in
`OP_2DROP`, it can’t be removed because if there is one element on the stack, it’ll be removed
before the error is triggered, leaving the stack in different states between C++ and Rust.

Similarly, `OP_WITHIN` _can_ have the check removed, but the `pop`s can’t replace the `rget`s
because we might fail after the first `pop`, which would again leave us in a different state than
C++ which reads all the values before popping any of them.

Keeping the behavior identical to the C++ isn’t important per se, but it makes validating and
debugging changes easier, as we expect them to have the same state in every case.
@sellout sellout force-pushed the stable-rust-changes branch from 020e72a to 5f9b7d1 Compare July 21, 2025 17:30
It holds a full script – both sig & pubkey, in its un-parsed
representation.

This (along with `script::Code`) is also now used to represent scripts
in the public interface. No more `&[u8]`.
@sellout sellout force-pushed the stable-rust-changes branch from 5f9b7d1 to bcfbdab Compare July 21, 2025 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants