Skip to content

Conversation

@Kixunil
Copy link
Collaborator

@Kixunil Kixunil commented Apr 4, 2025

This includes the relevant stuff from #172 along with the changes requested in the review.

Kixunil added 2 commits April 4, 2025 08:47
Despite being maintained by rust-bitcoin developers this library is
general-purpose. There's nothing specific to bitcoin in this library
and, except for reversed display of hashes, there's nothing special
about hex usage in Bitcoin. (And reversing doesn't need special support
from this library.)
Githooks have nothing to do with MSRV so information about them should
not be sub-section of MSRV.
@Kixunil Kixunil added documentation Improvements or additions to documentation enhancement New feature or request 1.0 Issues and PRs required or helping to stabilize the API port-1.0.0-alpha Needs porting to the alpha release branch labels Apr 4, 2025
@Kixunil Kixunil requested a review from Copilot April 4, 2025 16:03
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

@Kixunil Kixunil force-pushed the 1.0-cleanups branch 4 times, most recently from 90cec1e to 579bdff Compare April 4, 2025 18:54
Copy link
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice PR, appreciate the effort you went to. A couple minor docs things only. I did not review the grammar of docs too closely. I think we can do a final parse of docs during the rc cycle.

README.md Outdated
Comment on lines 40 to 41
Note though that the dependencies may have looser policy. This is not considered breaking/wrong
- you would just need to pin them in `Cargo.lock` (not `.toml`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This renders as a list because the - is at the start of the line

Seen here: https://github.com/Kixunil/hex-conservative/blob/1.0-cleanups/README.md

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL! Nice catch

src/error.rs Outdated
Comment on lines 73 to 74
/// This error differs from `DecodeFixedSizedBytesError` in that the number of bytes is only known at run time
/// - e.g. when decoding `Vec<u8>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This renders as a list item because of the - at the start of the line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a link for DecodeFixedSizedBytesErro after 'differs from' since reader is likely to want to see the DecodeFixedSizedBytesError when reading.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes both things

/// This error differs from [`DecodeDynSizedBytesError`] in that the number of bytes is known at
/// compile time - e.g. when decoding to an array of bytes.

@tcharding
Copy link
Member

With the docs fixes:

ACK 579bdff

Kixunil added 8 commits April 5, 2025 08:55
The README claims that we have a conservative MSRV policy but doesn't
specify how. This commit adds clarification how we intend to bump the
MSRV.
We'd like to limit the API in 1.0 to the decoding functions in crate
root and omit the `FromHex` trait to make the API smaller and easier to
maintain. Because the trait won't be in 1.0, the functions should not
call into it.

This moves the implementation of decoding from the trait methods to
free-standing functions and calls those functions inside trait methods.
The functions `decode_vec` and `decode_array` sounded like they are
decoding *from* those objects rather than *to* them. Similar crates that
have such free-standing functions refer to the source in their name but
we refer to destination.

This renames them so that they contain `_to` making it obvious it's the
destination not the source.
The messages produced by `Display` impl will be read by the end users
who may not know anything about programming. This change makes sure to
style them nicely and explain more clearly what is the problem with the
input.
The error types in the form of `Outer(pub(crate) Inner)` where `Inner`
needs to be public and accessible anyway have a few problems:

* Understanding two types is harder than understanding just one type
* They are more annoying for the consumers to match on
* They take up space in documentation
* Naming them is difficult

Hex decoding should be in principle easy and all possible error types
are fixed, so there shouldn't be future changes. Adding a common field
can still be done simply by adding it to each variant. This even has the
advantage that newly added field is not lost during some conversions
(e.g. if downstream crate knows the length is correct). If fast access
is needed we can just arrange all structs to place the common fields at
the same offset (e.g. beginning).

So I only see two cases where modifying them would be needed:

* Adding a `PhantomData` field - this is highly unlikely in this crate,
  there's no context. We could have the char type generic in the future
  (`char`, `u8`, `core::ascii::Char`) but that would be stored inside
  `InvalidCharError`, so no need for `PhantomData`.
* Making the error boxed internally - we can also just box the inner
  errors individually and it'll only cost an extra `usize` for
  discriminant making the entire error type two `usize`s. This is
  already smaller than `Vec`, so returning it from `decode_to_vec` is
  non-issue and callers can add another layer of box if needed. It is
  worse when returning short arrays but those seem to be rare.

I think that we can do a lot of layout fiddling and the compiler can do
some too, so I think it'll work fine. And any application that is so
performance-sensitive that 1 `usize` is going to make a difference
should probably not parse hex to begin with.

Thus I believe it's worth just exposing the errors and this commit
implements it.
We want to eventually return `char` instead so this hides the method
until that is fixed.
When a larger string is parsed that may consist of multiple parts, some
of them being hex strings, the returned errors should carry the position
relative to the start of the entire string. The design of errors so far
required users to create a new type which would either carry the
position of the hex string or lose some information by converting from
our types. This includes a problem that if a different crate in the tree
enables a feature to find all invalid characters the calling crate has
no way to know about it and has to lose the remaining characters.

This can be solved by providing a method to modify all the positions
stored inside the error type. Such method should be added ASAP so that
all libraries can start using it right away. Thus we want to include
this before 1.0 release.

This commit adds the method, taking into account that it will be used
inside `map_err` quite very often.
Using `not(feature = "std")` makes modifications frustrating because the
feature internally behaves as if it wasn't additive leading to complex
conditions. It's better to just set `#[no_std]` unconditionally and then
optionally reference the additional features.
@Kixunil
Copy link
Collaborator Author

Kixunil commented Apr 5, 2025

I fixed the docs as suggested.

@Kixunil Kixunil requested a review from Copilot April 5, 2025 07:15
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (3)

tests/api.rs:89

  • [nitpick] Consider using more descriptive field names in the Errors struct instead of single letters (e.g., 'hex_array_error' instead of 'c') to improve clarity in test output.
struct Errors {

src/error.rs:241

  • [nitpick] The custom Display implementation for InvalidCharError uses nested helper structures to adjust the output and could be simplified. Consider extracting the formatting logic into a separate helper to improve readability.
write!(f, "the {} character, {}, is not a valid hex digit", which, chr)

Cargo.toml:30

  • Ensure that the addition of the 'newer-rust-version' feature and its dependency is well documented and covered by tests to verify that feature detection behaves as expected under various toolchain versions.
newer-rust-version = ["dep:if_rust_version"]

Kixunil added 3 commits April 5, 2025 09:23
We've decided that we want to use the `if_rust_version` crate to avoid
compiling and running the same build script multiple times across many
crates. However, currently the additional features are just about
`core::error::Error` and thus irrelevant for `std` users (majority).
Depending on it unconditionally would just add dependency.Making the
crate optional is a natural solution.

This change imlpements it. The specific crate used is considered an
internal detail and this will even go away once the native solution is
in MSRV. Thus we rename the feature to better convey the meaning. It is
also documented that this feature may or may not add dependency and,
since we have a way of removing it, it's also documented what will
happen once the feature becomes irrelevant.
With upcoming decoding stabilization we need to inform users about
details of the stabilization and make sure our API doc is
understandable. This makes various doc improvements including explaining
how we're going to stabilize things and feature flags.
If a downstream library decodes a hex string into a particular type
using a function in this library the returned error type would "leak"
which type the library used internally. The specific type shouldn't
really concern the callers since it can change anyway. What is relevant
though is whether the length is known at compile time or not.

We might also add more functions in the future decoding to different
types and it'd be better to share error types in such case.

This change renames `HexToBytesError` to `DecodeDynSizedBytesError` and
`HexToArrayError` to `DecodeFixedSizedBytesError` so that types are not
mentioned.
@Kixunil Kixunil requested a review from Copilot April 5, 2025 07:25
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 12 out of 13 changed files in this pull request and generated no comments.

Files not reviewed (1)
  • contrib/test_vars.sh: Language not supported
Comments suppressed due to low confidence (1)

src/error.rs:211

  • [nitpick] The uninitialized variable 'which' used to build the display output for InvalidCharError can be confusing. Consider refactoring this match arm to inline the creation of the formatted output or using a more descriptive variable name to improve clarity.
let which;

Copy link
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK d73c686

@apoelstra
Copy link
Member

Nice! I'm happy with these names. Am in flight today but will review this now so that it can get into the queue before I'm offline.

Disappointing that copilot is still useless.

@apoelstra
Copy link
Member

apoelstra commented Apr 9, 2025

Will ACK and merge this regardless, but I'd like a followup to

  • rename the newer-rust-version feature to rust-1..81 to make it obvious what "newer" means, which is currently meaningless and will only become more meaningless
  • update the README text which I think is overly targeted at developers of this library
  • Fix the error type names which have Sized in them but which have nothing to do with Sized, and have Dyn in them but have nothing to do with dyn.

I can do this. It will be a much smaller PR than this one.

Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK d73c686; successfully ran local tests

@apoelstra apoelstra merged commit 230492c into rust-bitcoin:master Apr 10, 2025
14 checks passed
@Kixunil Kixunil deleted the 1.0-cleanups branch April 12, 2025 13:26
@Kixunil
Copy link
Collaborator Author

Kixunil commented Apr 12, 2025

Disappointing that copilot is still useless.

It did find two issues without me having to do anything other than clicking. I consider it a win. Maybe it's value is limited but still, I didn't have to wait for you to find those mistakes and fixed them quickly.

  • rename the newer-rust-version feature to rust-1..81 to make it obvious what "newer" means, which is currently meaningless and will only become more meaningless

I disagree with putting specific version in the feature name. Currently it's only about core::error::Error but we might want to use some other feature as well and then the version would be just wrong (or we would need multiple features which then begs the question why have the dependency at all). The real end version is the one which will stabilize #[cfg(lang-version = "...")] one day in the future and we currently don't know which it is.

  • Fix the error type names which have Sized in them but which have nothing to do with Sized, and have Dyn in them but have nothing to do with dyn.

As I explained elsewhere, it is very similar to Sized. But the only other name that comes to my mind is VarLen which is also a bit confusing. Originally I put the rename as the last commit so that it could be trivially dropped from this PR.

@apoelstra
Copy link
Member

I strongly disagree that there is any similarity between Sized and the difference between [u8; N] and Vec<u8> (which are both Sized types), or between Sized and whether the length of a bytestring is known at compile time.

I guess if users were hex-encoding the raw byte patterns of Rust objects (without following any internal pointers) the terms would coincide.

In any case, I'll PR to change the names and we can debate there.

@Kixunil
Copy link
Collaborator Author

Kixunil commented Apr 19, 2025

Vec is a Sized container for !Sized [u8]. But I do appreciate that this is a matter of perspective and it might be confusing.

@apoelstra
Copy link
Member

You can argue that Box<[u8]> is a "Sized container for [u8]", but there's no argument that this is true for Vec<u8>. Vec is a totally different type with different behavior and different underlying construction.

@Kixunil
Copy link
Collaborator Author

Kixunil commented Apr 23, 2025

They seem extremely similar to me. 🤷

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

1.0 Issues and PRs required or helping to stabilize the API documentation Improvements or additions to documentation enhancement New feature or request port-1.0.0-alpha Needs porting to the alpha release branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants