Skip to content

gix_date::Time cannot represent any timestamp, preventing Signature to roundtrip #1923

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
pierrechevalier83 opened this issue Apr 4, 2025 · 3 comments
Labels
acknowledged an issue is accepted as shortcoming to be fixed help wanted Extra attention is needed

Comments

@pierrechevalier83
Copy link
Contributor

Current behavior 😯

Some timestamps can't be represented by gix_date::Time, which means that we don't roundtrip when parsing them into a Commit.

For instance, in the wild, we see this commit:

git cat-file -p 5e6ecdad9f69b1ff789a17733b8edc6fd7091bd8
tree 0d1971c86d047f9530aee85a50b8fc1ca12723ba
parent 9471b0ab889a4684f87952cb951f7f4dc3f59dda
author Shrikant Sharat Kandula <shrikantsharat.k@gmail.com> 1313584730 +051800
committer Shrikant Sharat Kandula <shrikantsharat.k@gmail.com> 1313584730 +051800
Typo in documentation
The kwarg is named `headers`, not `header`. Also, its a dict, not a set.

Notice that in the author and commiter fields, the timezone offset: +051800 does not match the expected [<sign><HH><MM>] format.

Expected behavior 🤔

I think the time member of Signature should be a BString and the time member of SignatureRef should be a & BStr. We should only parse the timestamp into a gix_date::Time when needed.

This would allow round-tripping.

Git behavior

No response

Steps to reproduce 🕹

I wrote this failing test, to show failure to roundtrip:

With that commit, cargo test from gix-actor outputs:

thread 'signature::round_trip' panicked at gix-actor/tests/signature/mod.rs:80:9:
assertion `left == right` failed
  left: "Sebastian Thiel <byronimo@gmail.com> 1313584730 +0000"
 right: "Sebastian Thiel <byronimo@gmail.com> 1313584730 +051800"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    signature::round_trip
@pierrechevalier83
Copy link
Contributor Author

I can work on it. Making an issue rather than an immediate PR in case there is disagreement on the proposed solution of backing Signature and SignatureRef by the git bytes.

@Byron
Copy link
Member

Byron commented Apr 5, 2025

Thanks for reporting!

I manipulated a real commit to use +051800 and Git can show it, hence it can definitely parse it, which the current code cannot, an issue to fix.

❯ git show
commit 21241da50c3a0c24b95ab99974674e98ff3120ac (HEAD -> gitbutler/workspace)
Author: GitButler <gitbutler@gitbutler.com>
Date:   Sat Apr 26 00:05:50 2025 +51800
[..]

Then I cherry-picked the commit onto another, compatible one to see what it does when reproducing the commit while changing the committer.

❯ gix cat @
tree fbc1dda023bef009898525ec4070b6dd1b938ec4
parent ce75c17e1b6b2599eabec6a7fd0fab6067ab06d6
author GitButler <gitbutler@gitbutler.com> 1743761150 +051800
committer Sebastian Thiel <sebastian.thiel@icloud.com> 1743822295 +0800
[..]

This shows that Git can round-trip such time.

Hence, I'd really appreciate if you'd attempt a fix.

As this affects commits, the code is far less performance-sensitive than #1917. However, I'd still hope that the benchmarks in gix-object can help to avoid unnecessary or unexpectedly large slowdowns.

While micro-benchmarks won't be decisive here, a far better test will be running ein t hours on the linux kernel.

❯ ein t hours
 11:10:10 traverse commit graph done 1.3M commits in 9.55s (137.1K commits/s)
 11:10:10        estimate-hours Extracted and organized data from 1309152 commits in 9.988584ms (131064832 commits/s)
total hours: 1243723.88
total 8h days: 155465.48
total commits = 1309152
total authors: 34641
total unique authors: 26510 (23.47% duplication)

That test will decode commits as fast as possible on one thread and shift additional processing (including parsing them) to another thread. Overall, this line would be the relevant one for performance:

if let Ok(author) = gix::objs::CommitRefIter::from_bytes(&commit_data)
.author()
.map(|author| mailmap.resolve_cow(author.trim()))
{

My thinking is that the performance implications of this change will be negligible, but it's better to make sure of that.

@Byron Byron added help wanted Extra attention is needed acknowledged an issue is accepted as shortcoming to be fixed labels Apr 5, 2025
@pierrechevalier83
Copy link
Contributor Author

I'd really appreciate if you'd attempt a fix.

I raised PR 1935 to address this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged an issue is accepted as shortcoming to be fixed help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants