41 changes: 41 additions & 0 deletions tests/test_roundtrip_interpolated.py
@@ -0,0 +1,41 @@
from src.codex32.codex32 import Codex32String


# secret share from seed
s = Codex32String.from_seed(bytes.fromhex("68f14219957131d21b615271058437e8"), "ms13k00ls")
assert s.s == "ms13k00lsdrc5yxv4wycayxmp2fcstppharks8z0r84pf3uj"

# derive 'a' via proposed BIP-85
a = Codex32String.from_seed(bytes.fromhex("641be1cb12c97ede1c6bad8edf067760"), "ms13k00la")
assert a.s == "ms13k00lavsd7rjcje9ldu8rt4k8d7pnhvppyrt5gpff9wwl"

# derive 'c' via proposed BIP-85
c = Codex32String.from_seed(bytes.fromhex("61b3c4052f7a31dc2b425c843a13c9b4"), "ms13k00lc")
assert c.s == "ms13k00lcvxeugpf00gcac26ztjzr5y7fkjl7fx7nx7ykhkr"

# derive next share via interpolation
d = Codex32String.interpolate_at([s, a, c], "d")
assert d.s == "ms13k00ldp4v5nw8lph96x47mjxzgwjexe44p32swkq99e0w"

# now round-trip d share ('d' is derived via interpolation, NOT via 'from_seed')
dd = Codex32String.from_seed(d.data, "ms13k00ld")
Owner

@BenWestgate BenWestgate Dec 8, 2025

You can't do this. You can only call .from_seed without passing pad_val for the k initial strings; derived strings MUST be passed their padding to round-trip.

You would need to be able to do this:

dd = Codex32String.from_seed(d.data, "ms13k00ld", d.pad_val)

This version's Codex32String lacks a pad_val property; I'm working on an update that adds one.

No matter what padding style we use, the padding is less than a full 5-bit value, so it is not an element of the field GF(32). It therefore does not interpolate into derived shares, and no linear relationship survives that would allow round-tripping from bytes, GF(256), to GF(32)-interpolated strings without passing the padding.
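
A minimal sketch of why the padding matters, assuming the usual MSB-first regrouping of 8-bit bytes into 5-bit symbols. The helper names are illustrative and not part of the codex32 package. A 16-byte seed fills 25 full symbols plus 3 bits, so the 26th symbol carries 2 padding bits that the bytes alone do not determine:

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 alphabet


def bytes_to_symbols(data: bytes, pad_val: int = 0) -> list[int]:
    """Regroup bytes into 5-bit symbols, filling the last partial symbol with pad_val."""
    total_bits = 8 * len(data)
    pad_bits = (-total_bits) % 5  # 2 for a 16-byte seed
    if not 0 <= pad_val < (1 << pad_bits):
        raise ValueError("pad_val does not fit in the padding bits")
    value = (int.from_bytes(data, "big") << pad_bits) | pad_val
    count = (total_bits + pad_bits) // 5
    return [(value >> (5 * (count - 1 - i))) & 31 for i in range(count)]


def symbols_to_bytes(symbols: list[int]) -> tuple[bytes, int]:
    """Inverse of bytes_to_symbols: return (data, pad_val)."""
    total_bits = 5 * len(symbols)
    pad_bits = total_bits % 8  # assumes fewer than 5 padding bits
    value = 0
    for sym in symbols:
        value = (value << 5) | sym
    pad_val = value & ((1 << pad_bits) - 1)
    return (value >> pad_bits).to_bytes(total_bits // 8, "big"), pad_val


seed = bytes.fromhex("68f14219957131d21b615271058437e8")
for pad_val in range(4):
    payload = "".join(CHARSET[sym] for sym in bytes_to_symbols(seed, pad_val))
    print(pad_val, payload)  # only the final character changes
```

With the seed from this test, only pad_val=3 reproduces the payload characters of the "s" string asserted above (drc5yxv4wycayxmp2fcstpphar); the other three values change the final character. Keeping only the bytes discards exactly that choice.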

The only string whose data you should care about after construction is "s", so the fact that other share index values can return data is more of a curiosity. Maybe .data should raise InvalidShareIndex or return None if share_idx != "s" to prevent this misuse.

What is your exact use case where you really need to store ALL the shares as bytes and recover back to codex32?

Author

I'm able to do this which fixes this test case:
dd = Codex32String.from_seed(d.data, "ms13k00ld", pad_val=1)

but I have no idea how to get to pad_val=1 other than by grinding it against the string, which I already know here (which won't be the case in real life)

I don't know how to enforce that at the library level, any ideas?

not really... besides grinding correct pad_val right after construction of derived share via round-trips (very meh)
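
To make the grinding concrete, here is a hedged sketch that works on the "d" string from this test without the codex32 package. The helpers are hypothetical, and the 9-character header and 13-character checksum lengths are assumptions that fit the strings above. It brute-forces the padding and also reads it directly from the low two bits of the last payload character:

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 alphabet


def payload_symbols(s: str, header_len: int = 9, checksum_len: int = 13) -> list[int]:
    """5-bit values of the payload characters (header and checksum stripped)."""
    return [CHARSET.index(ch) for ch in s[header_len:-checksum_len]]


def pack(data: bytes, pad_val: int) -> list[int]:
    """Regroup bytes into 5-bit symbols, MSB first, with explicit padding bits."""
    pad_bits = (-8 * len(data)) % 5
    value = (int.from_bytes(data, "big") << pad_bits) | pad_val
    count = (8 * len(data) + pad_bits) // 5
    return [(value >> (5 * (count - 1 - i))) & 31 for i in range(count)]


def grind_pad_val(data: bytes, target: list[int]):
    """Try every padding value until the symbols match the known string."""
    pad_bits = (-8 * len(data)) % 5
    for pad_val in range(1 << pad_bits):
        if pack(data, pad_val) == target:
            return pad_val
    return None


d = "ms13k00ldp4v5nw8lph96x47mjxzgwjexe44p32swkq99e0w"
symbols = payload_symbols(d)

bits = 0
for sym in symbols:
    bits = (bits << 5) | sym
d_data = (bits >> 2).to_bytes(16, "big")  # the 128 data bits, padding dropped

print(grind_pad_val(d_data, symbols))  # 1, found by brute force
print(symbols[-1] & 0b11)              # 1, read straight from the last character
```

Grinding only works while the original string is still at hand. If the padding is captured when the derived share is created, no search is needed.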

> What is your exact use case where you really need to store ALL the shares as bytes and recover back to codex32?

My general idea is that I can use individual shares as normal secrets: load them on a hardware wallet (HWW), sign with them, etc. For instance, the user does the Shamir split on one HWW device while having N devices ready, and exports the generated/derived shares as QR codes for instance. The user loads these derived shares on the devices and geo-distributes the devices. These then serve as decoy, fully functional signers. When the S secret is needed, the user just collects K devices and does some QR scanning to recover S on an empty HWW.

For this I thought I could use these from_seed/to_seed round-trips. Secure element storage is limited, so for me byte encoding is preferable to u5.

But now it seems this was never the intended purpose of the non-secret shares, which seem to be just recovery tools, i.e. data with one and only one purpose: to recover share S (which is kind of a pity, tbh). Am I reading this correctly?

Author

I also think that if round-trips with derived shares can be achieved somehow, even if passing padding is necessary, that would be desirable.

Owner

@BenWestgate BenWestgate Dec 8, 2025

> but I have no idea how to get to pad_val=1 other than by grinding it against the string, which I already know here (which won't be the case in real life)

You had to grind it because you discarded the pad_val. Without the padding you don't know the full last data character, so you might recover a different one. interpolate_at operates on 5-bit values, not bytes.
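
A rough illustration of that last point, not the library's implementation: per-symbol Lagrange interpolation over GF(32), assuming the bech32 field polynomial x^5 + x^3 + 1 and using abstract share indices 1, 2, 3 rather than real codex32 index characters. Even when two shares end in a symbol whose two padding bits are zero, the interpolated symbol generally does not:

```python
def gf32_mul(a: int, b: int) -> int:
    """Multiply two GF(32) elements (ints 0..31), reducing by x^5 + x^3 + 1."""
    result = 0
    for _ in range(5):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b100000:
            a ^= 0b101001
    return result


def gf32_inv(a: int) -> int:
    """Inverse via a^30, since a^31 == 1 for every nonzero element."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(32)")
    result = 1
    for _ in range(30):
        result = gf32_mul(result, a)
    return result


def interpolate_symbol(points: dict[int, int], x: int) -> int:
    """Lagrange-interpolate one 5-bit symbol at index x from {index: symbol}."""
    total = 0
    for xi, yi in points.items():
        coeff = 1
        for xj in points:
            if xj != xi:
                # Subtraction in GF(32) is XOR.
                coeff = gf32_mul(coeff, gf32_mul(x ^ xj, gf32_inv(xi ^ xj)))
        total ^= gf32_mul(coeff, yi)
    return total


def find_nonzero_padding():
    """Search for two last symbols with zero pad bits whose derived symbol has nonzero pad bits."""
    for y1 in range(0, 32, 4):      # low 2 bits clear
        for y2 in range(0, 32, 4):  # low 2 bits clear
            y3 = interpolate_symbol({1: y1, 2: y2}, 3)
            if y3 & 0b11:
                return y1, y2, y3
    return None


print(find_nonzero_padding())
```

The symbols with zero padding bits are closed under XOR but not under multiplication by arbitrary field elements, so the Lagrange coefficients mix data bits into the padding positions. That is why the padding of a derived share cannot be assumed and has to be carried along.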

> any ideas?

> not really... besides grinding correct pad_val ... (very meh)

It may be possible to do it if you give up being able to construct "non-encoded" shares from bytes data and instead accept construction of a Codex32ShareSet object with a from_bytes (or from_seeds) factory, and then use an interpolate_at(share_idx) method of that share set object.

> What is your exact use case...?

> generated/derived shares as QR codes for instance.

Make sure to skim this compact CodexQR discussion before speccing a QR design; it's the analog of compact SeedQR. I found a fun way to fit 128-bit codex32 share data into 21x21 QR codes by dropping some of the identifier.

Whatever solution we find for Codex32ShareSet.from_bytes(header, dict) would be very helpful there as well as here.

> These then serve as decoy, fully functional signers.

This seems useful!

> For this I thought I could use these from_seed/to_seed round-trips.

You may be able to round-trip the share set via from_seeds/to_seeds, or via .data of individual shares, but we need to define the correct Codex32ShareSet from_seeds class method to make this possible.

The source of truth in a Codex32ShareSet should be the common header and the byte payloads of "s", "a", "c" for k = 3, or maybe "a", "c", "d". CRC padding, which does not interpolate, is slightly more useful on a share you can actually find and verify it on than on an unknown share you would have to interpolate to just to check whether it validates.
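
One possible shape for such a container, purely as a hypothetical sketch: Codex32ShareSet, from_seeds and to_seeds do not exist in the current package, and the header layout (threshold digit at position 3 of "ms13k00l") is an assumption taken from the strings in this test. It stores only the common header and k byte payloads and leaves interpolate_at out, since how derived shares get their padding is exactly the open question:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codex32ShareSet:
    """Hypothetical share set: common header plus byte payloads of k shares."""

    header: str                 # e.g. "ms13k00l": HRP, separator, threshold, identifier
    payloads: dict[str, bytes]  # share index character -> payload bytes

    @classmethod
    def from_seeds(cls, header: str, payloads: dict[str, bytes]) -> "Codex32ShareSet":
        k = int(header[3])  # assumed position of the threshold digit
        if len(payloads) != k:
            raise ValueError(f"expected {k} payloads, got {len(payloads)}")
        if len({len(p) for p in payloads.values()}) != 1:
            raise ValueError("all payloads must have the same length")
        return cls(header, dict(payloads))

    def to_seeds(self) -> dict[str, bytes]:
        """Return the stored byte payloads unchanged, so bytes round-trip trivially."""
        return dict(self.payloads)


share_set = Codex32ShareSet.from_seeds(
    "ms13k00l",
    {
        "s": bytes.fromhex("68f14219957131d21b615271058437e8"),
        "a": bytes.fromhex("641be1cb12c97ede1c6bad8edf067760"),
        "c": bytes.fromhex("61b3c4052f7a31dc2b425c843a13c9b4"),
    },
)
print(sorted(share_set.to_seeds()))
```

Because the bytes of the k defining shares are the stored state, nothing is lost on the way back out. Any other share would be derived from this set on demand instead of being rebuilt from its own bytes.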

> Secure element storage is limited, so for me byte encoding is preferable to u5.

A 21x21 QR has only 137.2 bits if using base45 alphanumeric encoding, or 138.2 bits if also using kanji, bytes and numeric modes. So it'd be excellent for us to define a compact encoding of share data: the bare minimum needed to always recover the correct secret, with whatever is left over used to prevent user errors.

> But now it seems this was never the intended purpose of the non-secret shares, which seem to be just recovery tools, i.e. data with one and only one purpose: to recover share S (which is kind of a pity, tbh). Am I reading this correctly?

Yes, this is not their intended purpose, but they do contain randomness. I think your idea is a cool and efficient use of that otherwise wasted random data needed for SSS, so it's worth pursuing IF it can be done securely (not revealing any more info about "s" than, at most, its padding bits with k-1 shares).

> I also think that if round-trips with derived shares can be achieved somehow, even if passing padding is necessary, that would be desirable.

I agree. The solution to recover seeds from bytes alone is non-trivial, but it should exist; let's find it. You'll find this bytes vs 130-bits question tripped up Andrew in the QR discussion. It's always surprising how padding behaves as the finite field changes.

# they are NOT equal after round-trip - seem we miss padding at interpolation level
assert dd.s != d.s # FAIL (should equal)

# irrelevant
# e = Codex32String.interpolate_at([s, a, c], "e")
# assert e.s == "ms13k00lezuknydaaygk5u20zs4fm736vj909mdj6xqp8pc2"
#
# f = Codex32String.interpolate_at([s, a, c], "f")
# assert f.s == "ms13k00lf0ehe53zsu6vrxcjjh9v7wzsa83mqfvku3fd8kem"

# recover from shares, use 'd' without round-trip
rec_s = Codex32String.interpolate_at([a, d, c], "s")
# recover from shares, use 'd' after round-trip
rec_ss = Codex32String.interpolate_at([a, dd, c], "s")

print(" s:", s.data.hex())
print(" rec_s:", rec_s.data.hex())
print("rec_ss:", rec_ss.data.hex())
assert s.data == rec_s.data
assert s.data == rec_ss.data # FAIL