update data and decode binary nbt #41

Merged
hayanesuru merged 12 commits into main from protocol
Feb 18, 2026

Conversation

@hayanesuru
Owner

@hayanesuru commented Feb 18, 2026

Summary by CodeRabbit

  • Refactor

    • ASCII-first string handling with zero-copy fast paths.
    • NBT parsing rebuilt for depth-bounded assembly; list representation and map-key storage refactored to support compact/heap-backed names.
    • Codegen/data generation moved to static slice literals and deterministic hashing; serialization helpers trimmed.
  • API

    • Protocol messages expanded with richer component/dialog types and clarified intent encoding.
    • Small-string type exposes MAX and new construction/length helpers.
  • Bug Fixes

    • Added overflow-checked bounds for large array parsing.
  • Chore

    • README example simplified.
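The overflow-checked bounds fix listed above can be sketched with a hypothetical helper — `split_array` is not from the PR; only the checked_mul-before-split pattern is:

```rust
/// Split off `len * elem_size` bytes, failing on overflow or short input.
fn split_array<'a>(buf: &mut &'a [u8], len: usize, elem_size: usize) -> Option<&'a [u8]> {
    // checked_mul turns a wrapping multiply into a hard failure (None)
    let byte_len = len.checked_mul(elem_size)?;
    if byte_len > buf.len() {
        return None;
    }
    let (head, tail) = buf.split_at(byte_len);
    *buf = tail; // advance the cursor past the consumed bytes
    Some(head)
}
```

On 32-bit targets a raw `len * 8` can wrap to a small value, letting an undersized slice pass the length check; the checked multiply closes that hole.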

@coderabbitai

coderabbitai bot commented Feb 18, 2026

📝 Walkthrough


Reworks NBT/list/string public types (List→ListTag, Box<str>→Name), implements staged non‑recursive NBT parsing, consolidates MUTF‑8 ASCII detection into a fallible as_mutf8_ascii, migrates build-time codegen to slice-based generation with new hex parsers, and updates protocol/serialization surfaces and tests.

Changes

Cohort / File(s) Summary
NBT core & public API
haya_nbt/src/lib.rs, haya_nbt/src/list.rs, haya_nbt/src/compound.rs, haya_nbt/src/string.rs, haya_nbt/src/stringify.rs
Renamed List→ListTag; replaced Box<str> keys with Name (Thin/Heap); updated Tag/List/Compound/ListTag/ListRec/CompoundNamed, From/Write/Read conversions, and stringify/encode paths to use these types.
NBT parsing control flow
haya_nbt/src/lib.rs, haya_nbt/src/list.rs
Removed recursive entry reads; added tag_no_rec/list_no_rec, TagRec/ListRec and read_tag assembler to build nested structures with explicit depth tracking.
String & MUTF‑8 handling
haya_mutf8/src/lib.rs, haya_nbt/src/string.rs
Replaced boolean predicates with pub const fn as_mutf8_ascii(&[u8]) -> Option<&str>; added ASCII fast-paths, changed StringTagRaw to hold &str, and adjusted read/write to use the new API.
Compound key plumbing & HayaStr
haya_nbt/src/compound.rs, haya_ident/src/lib.rs, haya_str/src/lib.rs
Compound keys now Name (Thin/Heap) with read/trait impls; Identifier uses HayaStr::copy_from; made HayaStr::MAX pub, added new() and unsafe set_len().
Build-time codegen & hex parsing
haya_data/build.rs, haya_data/src/lib.rs, haya_data/README.md
Switched from mutable name buffers to slice-based generation and inline consts; block_state now returns (Repr, Vec<u32>, Vec<NonZeroUsize>) propagated into fluid_state; added parse_u32/parse_u64, hex_line, and local hash128; README example trimmed and encode/decode macros removed.
Data library internals & tests
haya_data/src/lib.rs, haya_data/README.md
Removed encode_state/decode_state macros; changed name_u8/name_u16 to accept static slices and *const &'static str; simplified block_state.to_fluid(); updated tests to new APIs.
Protocol changes
haya_protocol/src/lib.rs, haya_protocol/src/clientbound/common.rs, haya_protocol/src/clientbound/login.rs
Marked ClientIntent as #[repr(u8)] with explicit discriminants; added ComponentJson<'a> and Dialog; introduced generic Either<L,R> replacing prior untrusted entry enum; updated affected packet fields.
Serialization helpers & cleanup
haya_ser/src/lib.rs, haya_ser/src/write.rs, haya_ser/src/hex.rs
Removed public parse_hex/Integer trait and hash128; removed Write2/Write3 and iterator Write impls; reduced public re-exports and derived Default for Error.
Safety/overflow fixes
haya_nbt/src/int_array.rs, haya_nbt/src/long_array.rs
Replaced raw len * bytes with len.checked_mul(...).ok_or(Error)? prior to slice splitting to avoid overflow.
Manifests / Cargo
haya_nbt/Cargo.toml, haya_data/Cargo.toml
Added workspace dependency haya_str = { workspace = true }; removed mser from haya_data build-dependencies.
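The fallible ASCII fast-path described above for as_mutf8_ascii can be sketched as follows — a simplified stand-in, not the crate's actual implementation (MUTF‑8 encodes NUL as 0xC0 0x80, so a raw 0x00 byte is rejected along with non-ASCII bytes):

```rust
/// Return the input as &str only if every byte is plain ASCII (1..=0x7F).
const fn as_ascii(bytes: &[u8]) -> Option<&str> {
    let mut i = 0;
    while i < bytes.len() {
        // MUTF-8 never emits a raw NUL, and bytes >= 0x80 need real decoding
        if bytes[i] == 0 || bytes[i] >= 0x80 {
            return None;
        }
        i += 1;
    }
    // SAFETY: all bytes are in 1..=0x7F, which is valid single-byte UTF-8.
    Some(unsafe { core::str::from_utf8_unchecked(bytes) })
}
```

Returning `Option<&str>` instead of a boolean lets callers borrow the validated bytes zero-copy on the fast path and fall through to full MUTF‑8 decoding only on `None`.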

Sequence Diagram(s)

sequenceDiagram
    participant Bytes as Raw Bytes
    participant Fast as tag_no_rec / list_no_rec
    participant Stack as TagRec / ListRec
    participant Builder as Assembler
    participant Result as Tag Result

    Bytes->>Fast: read type header
    Fast->>Fast: parse primitive or ASCII fast-path
    alt primitive/simple
        Fast->>Result: return Tag::Primitive
    else list/compound detected
        Fast->>Stack: emit Entry (List/Compound) marker
        loop assemble nested
            Stack->>Fast: request next item
            Fast->>Stack: return primitive or nested Entry
            Stack->>Builder: accumulate names/elements
        end
        Builder->>Result: finalize ListTag / Compound
    end
    Result->>Bytes: completed Tag
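The staged, non-recursive assembly shown in the diagram can be illustrated with a toy parser — the byte markers 0xFF/0xFE and the two-variant Tag enum are invented for this sketch; only the explicit-stack, depth-bounded control flow mirrors the PR:

```rust
#[derive(Debug, PartialEq)]
enum Tag {
    Byte(i8),
    List(Vec<Tag>),
}

/// Parse without recursion: 0xFF opens a list, 0xFE closes it,
/// any other byte is a primitive. Depth is bounded by the stack length.
fn parse(buf: &mut &[u8], max_depth: usize) -> Option<Tag> {
    let mut stack: Vec<Vec<Tag>> = Vec::new(); // partially built lists
    loop {
        let (&b, rest) = buf.split_first()?;
        *buf = rest;
        let tag = match b {
            0xFF => {
                if stack.len() == max_depth {
                    return None; // explicit depth bound instead of stack overflow
                }
                stack.push(Vec::new());
                continue;
            }
            0xFE => Tag::List(stack.pop()?),
            v => Tag::Byte(v as i8),
        };
        match stack.last_mut() {
            Some(top) => top.push(tag),     // accumulate into the open list
            None => return Some(tag),       // nothing open: finished
        }
    }
}
```

Malicious nesting depth becomes an `Option::None` instead of a native stack overflow, which is the point of replacing the recursive entry reads.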

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • move mtf8 #39 — similar mutf‑8 API migration and ASCII fallible conversion used across string and NBT code.
  • update layout #36 — overlapping NBT and stringify refactors touching Compound/List/String/Name and encode/deserialize flows.
  • move ident and remove parse int #37 — related identifier/path handling changes (Identifier::new) and HayaStr interactions.

Poem

🐰 I nibbled through bytes in moonlight bright,
Lists found new names and strings took flight,
No recursion tangled in the parsing glade,
Slices, names, and fast paths dance in trade,
A rabbit hops — the code is tidy tonight!


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
haya_str/src/lib.rs (1)

181-212: 🧹 Nitpick | 🔵 Trivial

Add a debug guard in set_len to catch misuse early.

set_len is unsafe, but a cheap debug assertion helps detect violations during development.

Proposed patch
 pub const unsafe fn set_len(&mut self, new_len: usize) {
-    unsafe {
-        self.len = transmute::<u8, Len>(new_len as u8);
-    }
+    debug_assert!(new_len <= MAX);
+    unsafe {
+        self.len = transmute::<u8, Len>(new_len as u8);
+    }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@haya_str/src/lib.rs` around lines 181 - 212, Add a cheap debug-time guard to
set_len to catch misuse: inside the const unsafe fn set_len(&mut self, new_len:
usize) add a debug_assert that new_len <= MAX (or equivalent check using MAX and
Len) before performing the transmute and assignment, then keep the existing
unsafe transmute::<u8, Len>(new_len as u8) assignment; reference the set_len
function, the Len type and the MAX constant so the assertion validates bounds in
debug builds while leaving the function unsafe for release.
haya_nbt/src/list.rs (1)

258-356: ⚠️ Potential issue | 🟠 Major

Reject TagType::End lists with non-zero length.

list_no_rec currently returns ListTag::None even when the list length is non-zero, which silently accepts invalid NBT and can desync the parser by leaving bytes unread. Guard on length and return Err(Error) when len != 0.

🔧 Proposed fix
-        TagType::End => Ok(Ok(ListTag::None)),
+        TagType::End if len == 0 => Ok(Ok(ListTag::None)),
+        TagType::End => Err(Error),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@haya_nbt/src/list.rs` around lines 258 - 356, list_no_rec currently treats
TagType::End as Ok(Ok(ListTag::None)) unconditionally; change
ListInfo::list_no_rec to reject TagType::End when the stored length (self.1 as
usize, variable len) is non-zero by returning Err(Error) if len != 0, and only
return Ok(Ok(ListTag::None)) when len == 0 so the parser fails on invalid NBT
instead of silently consuming state.
haya_data/build.rs (2)

278-347: ⚠️ Potential issue | 🟠 Major

Guard against mismatched sizes when building FLUID_STATE_INDEX.
read_rl(size, ..).zip(bl_props.iter()) will silently truncate on mismatch, which can emit a shorter index table and lead to OOB reads in block_state::to_fluid. Add an explicit size check.

Suggested fix
 let (_, size, _) = head(iter.next(), "block_to_fluid_state");
+assert_eq!(size, bl_props.len(), "block_to_fluid_state size mismatch");
 
 let mut out = Vec::<u32>::new();
 for (bound, prop) in read_rl(size, &mut iter).zip(bl_props.iter().copied()) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@haya_data/build.rs` around lines 278 - 347, The builder in fluid_state
silently truncates when pairing read_rl(size, &mut iter) with bl_props via zip,
which can produce a shorter FLUID_STATE_INDEX and cause OOB in
block_state::to_fluid; update fluid_state to materialize the read_rl iterator
(e.g., collect into a Vec) and verify its length equals bl_props.len() (or
equals the expected size) before the for-loop, and if mismatched panic or return
an error with a clear message referencing "FLUID_STATE_INDEX" / "fluid_state" /
"block_state::to_fluid" so the mismatch is caught early.

486-1376: ⚠️ Potential issue | 🟠 Major

Add assertion to prevent latent UB when size_kv > 255.

The block::props transmute from &[u8] to &[block_state_property] relies on PROPS being emitted with repr u8. If size_kv exceeds 255, repr_kv becomes u16, causing PROPS to be an array of &[u16] instead of &[u8], creating a size mismatch during transmutation. Currently, size_kv is 120 (within bounds), but there is no guard to prevent future overflow.

Add the assertion immediately after line 525:

let (name_kv, size_kv, repr_kv) = head(iter.next(), "block_state_property");

+assert!(
+    size_kv <= u8::MAX as usize,
+    "block_state_property exceeds u8; update block::props to handle wider repr"
+);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@haya_data/build.rs` around lines 486 - 1376, The emitted PROPS array can end
up with repr_kv != Repr::U8 when size_kv > 255, which makes the later transmute
from &[u8] to &[block_state_property] undefined; after obtaining (name_kv,
size_kv, repr_kv) in block_state (the head call that produces size_kv and
repr_kv), add a runtime assertion that size_kv is <= 255 (or that repr_kv ==
Repr::U8) to fail early and avoid emitting mismatched PROPS; place this assert
immediately after the head(...) that sets name_kv/size_kv/repr_kv so the
generator refuses to produce an unsafe transmute if the table grows too large.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@haya_data/build.rs`:
- Around line 278-347: The builder in fluid_state silently truncates when
pairing read_rl(size, &mut iter) with bl_props via zip, which can produce a
shorter FLUID_STATE_INDEX and cause OOB in block_state::to_fluid; update
fluid_state to materialize the read_rl iterator (e.g., collect into a Vec) and
verify its length equals bl_props.len() (or equals the expected size) before the
for-loop, and if mismatched panic or return an error with a clear message
referencing "FLUID_STATE_INDEX" / "fluid_state" / "block_state::to_fluid" so the
mismatch is caught early.
- Around line 486-1376: The emitted PROPS array can end up with repr_kv !=
Repr::U8 when size_kv > 255, which makes the later transmute from &[u8] to
&[block_state_property] undefined; after obtaining (name_kv, size_kv, repr_kv)
in block_state (the head call that produces size_kv and repr_kv), add a runtime
assertion that size_kv is <= 255 (or that repr_kv == Repr::U8) to fail early and
avoid emitting mismatched PROPS; place this assert immediately after the
head(...) that sets name_kv/size_kv/repr_kv so the generator refuses to produce
an unsafe transmute if the table grows too large.

In `@haya_nbt/src/list.rs`:
- Around line 258-356: list_no_rec currently treats TagType::End as
Ok(Ok(ListTag::None)) unconditionally; change ListInfo::list_no_rec to reject
TagType::End when the stored length (self.1 as usize, variable len) is non-zero
by returning Err(Error) if len != 0, and only return Ok(Ok(ListTag::None)) when
len == 0 so the parser fails on invalid NBT instead of silently consuming state.

In `@haya_str/src/lib.rs`:
- Around line 181-212: Add a cheap debug-time guard to set_len to catch misuse:
inside the const unsafe fn set_len(&mut self, new_len: usize) add a debug_assert
that new_len <= MAX (or equivalent check using MAX and Len) before performing
the transmute and assignment, then keep the existing unsafe transmute::<u8,
Len>(new_len as u8) assignment; reference the set_len function, the Len type and
the MAX constant so the assertion validates bounds in debug builds while leaving
the function unsafe for release.


@coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
haya_data/src/lib.rs (1)

10-27: ⚠️ Potential issue | 🟡 Minor

Add a debug bounds guard before indexing names.

v from VALS is used to index names via raw pointer. If generated tables ever drift (data corruption or hash regen bug), this becomes immediate UB. A debug-only guard keeps release perf unchanged while failing fast in dev.

🛡️ Proposed fix (debug-only guard)
@@
-    let k = unsafe { *names.add(v as usize) };
+    debug_assert!((v as usize) < N);
+    let k = unsafe { *names.add(v as usize) };
@@
-    let k = unsafe { *names.add(v as usize) };
+    debug_assert!((v as usize) < N);
+    let k = unsafe { *names.add(v as usize) };

Also applies to: 30-48

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@haya_data/src/lib.rs` around lines 10 - 27, The code dangerously dereferences
names via raw pointer using v (in name_u8 and the similar functions at lines
30-48) without any bounds check; add a debug-only bounds guard by creating a
slice from the names pointer using the known compile-time length (use the
existing const generic length that corresponds to the names table), e.g. inside
a cfg!(debug_assertions) block call unsafe { std::slice::from_raw_parts(names,
/*names_len const*/ ) } and debug_assert!( (v as usize) < names_slice.len() ),
then in debug use names_slice[v as usize] and in release keep the current unsafe
*names.add(v as usize) to preserve performance; apply the same pattern to
name_u8 and the other similar functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@haya_data/build.rs`:
- Around line 1435-1442: The code that emits the Rust const array in impl_name
writes raw registry names into the generated string (writing "\"" + val + "\""),
which breaks if a name contains " or \; change the emission to produce a valid
Rust string literal by escaping names (e.g., replace the manual quote + val +
quote with something that writes format!("{:?}", val) or uses
val.escape_default() and wraps with quotes) when iterating over names so each
element is properly escaped in the const N array.

In `@haya_data/src/lib.rs`:
- Around line 585-602: There are duplicate hash128 implementations (const fn
hash128) in both the runtime library and build script; extract the
implementation to a single shared source file (e.g., create a standalone
shared_hash128.rs containing the const fn hash128) and have both places consume
that file instead of duplicating logic: in the library replace the inline
function with a mod/include referencing the shared file, and in build.rs use
include!(concat!(env!("CARGO_MANIFEST_DIR"), "/shared_hash128.rs")) (or move the
function into a tiny helper crate that both build.rs and the library depend on)
so both build-time and runtime use the exact same hash128 implementation.

In `@haya_ser/src/lib.rs`:
- Line 10: The public re-export removal of parse_hex broke callers (e.g.,
mser::parse_hex::<u32>(hex) in haya_protocol); either restore parse_hex in this
crate's public API or update callers to the new path. Fix option A: re-export
parse_hex alongside hex_to_u8/u8_to_hex in lib.rs (add pub use
self::hex::parse_hex) so existing call sites compile. Fix option B: change the
call site (e.g., in haya_protocol::chat.rs) to use the new module path where
parse_hex now lives. After choosing and applying one of these fixes, add a
CHANGELOG entry documenting the breaking change and bump the crate version from
2.2.0 to 3.0.0 in Cargo.toml.

In `@haya_str/src/lib.rs`:
- Around line 188-198: The safety comment for HayaStr::set_len is incomplete:
update the doc for the unsafe fn set_len(&mut self, new_len: usize) to require
not only new_len <= MAX but also that bytes in self.data[0..new_len] are
initialized and contain valid UTF‑8 (since HayaStr implements AsRef<str>/Deref
and uses from_utf8_unchecked), so callers cannot create UB by writing arbitrary
bytes then calling set_len; reference HayaStr, set_len, MAX, self.data,
self.len, and the use of from_utf8_unchecked in the text.

---

Outside diff comments:
In `@haya_data/src/lib.rs`:
- Around line 10-27: The code dangerously dereferences names via raw pointer
using v (in name_u8 and the similar functions at lines 30-48) without any bounds
check; add a debug-only bounds guard by creating a slice from the names pointer
using the known compile-time length (use the existing const generic length that
corresponds to the names table), e.g. inside a cfg!(debug_assertions) block call
unsafe { std::slice::from_raw_parts(names, /*names_len const*/ ) } and
debug_assert!( (v as usize) < names_slice.len() ), then in debug use
names_slice[v as usize] and in release keep the current unsafe *names.add(v as
usize) to preserve performance; apply the same pattern to name_u8 and the other
similar functions.


@coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
haya_nbt/src/list.rs (1)

356-369: ⚠️ Potential issue | 🔴 Critical

Same UB issue: assume_init_mut on uninitialized memory.

Apply the same fix as long_list.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@haya_nbt/src/list.rs` around lines 356 - 369, The int_list function uses
v.spare_capacity_mut().assume_init_mut() which invokes UB by treating
uninitialized memory as initialized; mirror the safe pattern used in long_list:
allocate a Vec<i32> with capacity len, obtain a MaybeUninit<i32> slice (or a raw
pointer to the spare capacity), write each i32 into that uninitialized memory
using ptr::write (or MaybeUninit::write) from the big-endian bytes, then
set_len(len) only after all elements have been initialized; update the int_list
function to replace assume_init_mut usage with this
initialized-write-then-set_len approach.
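The initialize-then-set_len pattern the review asks for might look like this — a sketch with an assumed signature, not the crate's actual code:

```rust
use core::mem::MaybeUninit;

/// Decode `len` big-endian i32s; every slot is written before set_len runs.
fn int_list(len: usize, bytes: &[u8]) -> Vec<i32> {
    assert!(bytes.len() >= len * 4, "input slice too short");
    let mut v: Vec<i32> = Vec::with_capacity(len);
    let spare: &mut [MaybeUninit<i32>] = v.spare_capacity_mut();
    for i in 0..len {
        let chunk: [u8; 4] = bytes[i * 4..i * 4 + 4].try_into().unwrap();
        // MaybeUninit::write initializes the slot without ever reading it
        spare[i].write(i32::from_be_bytes(chunk));
    }
    // SAFETY: the loop above initialized elements 0..len.
    unsafe { v.set_len(len) };
    v
}
```

The key difference from the flagged code: `assume_init_mut` claims the spare capacity is already initialized (UB), while `MaybeUninit::write` only writes, deferring the initialized claim to `set_len` after the loop completes.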
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@haya_data/build.rs`:
- Around line 1840-1857: The unsafe aligned read in the const fn hash128 (at the
loop using u64::from_le_bytes(unsafe { *(n.as_ptr().add(i) as *const [u8; 8])
})) can UB on unaligned pointers; replace that deref with a const-safe unaligned
read via core::ptr::read_unaligned and preserve endianness, e.g. call
core::ptr::read_unaligned(n.as_ptr().add(i) as *const u64) wrapped with
u64::from_le(...). Update the expression inside the first loop in hash128 to use
core::ptr::read_unaligned (fully-qualified or import) and keep the wrapping
.wrapping_mul(M) as before.

In `@haya_nbt/src/compound.rs`:
- Around line 285-294: The Compound::read implementation swallows the actual tag
returned by TagType::Compound.tag(buf) when it is not a Compound; update
Compound::read to return a more specific error that includes the unexpected tag
information rather than a generic Err(Error). Locate the Read::read impl for
Compound and change the branch handling Ok(_) from returning Err(Error) to
constructing and returning an error variant (or error value) that captures the
unexpected Tag or TagType (for example an Error::UnexpectedTag or include the
Tag value/string) so callers can see which tag was encountered; keep the Err(e)
pass-through for parse errors from tag(buf).

In `@haya_nbt/src/list.rs`:
- Around line 371-384: The function short_list uses
v.spare_capacity_mut().assume_init_mut() which reads uninitialized memory and
causes UB; update short_list to initialize elements properly the same way
long_list was fixed: obtain the spare_capacity_mut() buffer, for each index
construct the i16 from the two bytes and write it into the buffer using safe
initialization (e.g., use ptr::write or push into the Vec) and then
set_len(len); keep the same i16::from_be_bytes logic and the function name
short_list to locate the change and mirror the initialization pattern used in
long_list.
- Around line 386-399: The f32_list function is invoking undefined behavior by
calling v.spare_capacity_mut().assume_init_mut() on uninitialized memory; mirror
the long_list fix: allocate uninitialized space safely (e.g., use a temporary
slice of MaybeUninit<f32> or write directly into v's capacity via raw pointers),
copy each 4-byte chunk into an f32 and write it with ptr::write (or assign into
MaybeUninit) rather than assuming init, then set_len(len) once all elements are
initialized; update f32_list to follow the same pattern and helper names used in
long_list to avoid assume_init_mut on uninitialized memory.
- Around line 401-414: The f64_list function currently triggers UB by calling
assume_init_mut on uninitialized Vec memory; mirror the long_list fix by writing
initialized values directly into the Vec's uninitialized buffer instead of using
assume_init_mut. Concretely, allocate with Vec::with_capacity(len), get a raw
pointer via v.as_mut_ptr(), for each index compute the f64 value with
f64::from_be_bytes(..) using slice.as_ptr().add(index*8).cast::<[u8;8]>() and
use ptr.add(index).write(value) to initialize each slot, then call
v.set_len(len); keep function name f64_list and follow the same pointer/write
pattern used in long_list.
- Around line 341-354: The current long_list (and similarly int_list,
short_list, f32_list, f64_list) uses spare_capacity_mut().assume_init_mut() on
uninitialized memory which is UB; instead allocate an uninitialized buffer with
Vec<MaybeUninit<T>> (e.g., MaybeUninit<i64> for long_list), write each element
into that buffer using ptr::write (or the MaybeUninit API) while indexing from
the input slice, then call set_len(len) on the Vec<MaybeUninit<T>> and finally
transmute/collect it into a Vec<T> of initialized values; ensure you replace
calls to assume_init_mut and direct writes into uninitialized memory in
functions long_list, int_list, short_list, f32_list, and f64_list with this safe
MaybeUninit-based initialization pattern.

---

Outside diff comments:
In `@haya_nbt/src/list.rs`:
- Around line 356-369: The int_list function uses
v.spare_capacity_mut().assume_init_mut() which invokes UB by treating
uninitialized memory as initialized; mirror the safe pattern used in long_list:
allocate a Vec<i32> with capacity len, obtain a MaybeUninit<i32> slice (or a raw
pointer to the spare capacity), write each i32 into that uninitialized memory
using ptr::write (or MaybeUninit::write) from the big-endian bytes, then
set_len(len) only after all elements have been initialized; update the int_list
function to replace assume_init_mut usage with this
initialized-write-then-set_len approach.

---

Duplicate comments:
In `@haya_data/build.rs`:
- Around line 1435-1443: The generated const N array writes registry names raw
into Rust string literals, which breaks when a name contains " or \; update the
emission to escape names before appending to *w: for each entry from names
produce a Rust-safe literal (escape backslashes and double quotes and other
control chars, e.g. via name.escape_default() or by replacing "\" with "\\\\"
and "\"" with "\\\"") and then append the escaped string into the "\"...\",\n"
slice emission used for const N so the generated Rust code remains syntactically
valid.
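The escaping fix described in the duplicate comment above can be sketched with `char::escape_default` — `emit_name` is a hypothetical helper name:

```rust
/// Append one registry name to the generated source as a valid Rust
/// string literal, escaping ", \, and control characters.
fn emit_name(out: &mut String, name: &str) {
    out.push('"');
    for c in name.chars() {
        // escape_default yields the chars of a Rust-literal-safe escape
        out.extend(c.escape_default());
    }
    out.push_str("\",\n");
}
```

Writing the raw name between quotes would break the generated `const N` array the moment a name contains `"` or `\`; escaping per-char keeps the emitted code syntactically valid for any input.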


@coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
haya_nbt/src/list.rs (1)

240-338: ⚠️ Potential issue | 🟠 Major

Guard against len * N overflow before splitting/decoding.
On 32‑bit targets, len * {2,4,8} can wrap, letting undersized slices pass and feeding unsafe decoders with OOB reads. Use checked_mul (and error on overflow) for all size computations in this function.

✅ Suggested pattern (apply to all byte-size calculations here)
-            TagType::Short => match n.split_at_checked(len * 2) {
+            TagType::Short => {
+                let byte_len = len.checked_mul(2).ok_or(Error)?;
+                match n.split_at_checked(byte_len) {
                 Some((slice, y)) => unsafe {
                     *n = y;
                     Ok(Ok(ListTag::Short(short_list(len, slice))))
                 },
                 None => Err(Error),
-            },
+                }
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@haya_nbt/src/list.rs` around lines 240 - 338, The function
ListInfo::list_no_rec multiplies len by byte sizes (2,4,8) without overflow
checks, which can wrap on 32-bit targets; update every size computation to use
checked_mul (e.g. let bytes = len.checked_mul(4).ok_or(Error)?), return
Err(Error) on overflow, then use bytes for split_at_checked or for the bounds
checks (compare bytes <= n.len()). Apply this pattern for all cases using len *
2/4/8 (TagType::Short, ::Int, ::Long, ::Float, ::Double) and for the manual
loops/length checks (TagType::ByteArray, ::String, ::IntArray, ::LongArray) so
that you compute a checked byte_count before slicing or comparing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@haya_nbt/src/list.rs`:
- Around line 240-338: The function ListInfo::list_no_rec multiplies len by byte
sizes (2,4,8) without overflow checks, which can wrap on 32-bit targets; update
every size computation to use checked_mul (e.g. let bytes =
len.checked_mul(4).ok_or(Error)?), return Err(Error) on overflow, then use bytes
for split_at_checked or for the bounds checks (compare bytes <= n.len()). Apply
this pattern for all cases using len * 2/4/8 (TagType::Short, ::Int, ::Long,
::Float, ::Double) and for the manual loops/length checks (TagType::ByteArray,
::String, ::IntArray, ::LongArray) so that you compute a checked byte_count
before slicing or comparing.

---

Duplicate comments:
In `@haya_str/src/lib.rs`:
- Around line 188-199: Update the safety doc for the const unsafe fn set_len to
avoid the circular/self-slice wording: state that new_len must be <= crate::MAX
and that the first new_len bytes of the underlying backing buffer (not self[..])
must already be initialized and contain valid UTF-8; reference MAX and the
function name set_len in the text so readers can locate it and replace
"self[0..new_len] must be initialized and is valid UTF-8" with a phrase like
"the first new_len bytes of the underlying buffer are initialized and form a
valid UTF‑8 string."


@coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
haya_nbt/src/list.rs (2)

146-163: ⚠️ Potential issue | 🟡 Minor

Avoid silent usize→u32 truncation in list_info() length fields.

items.len() as _ will truncate for very large vectors, producing an invalid header vs the written payload. Even if “practically impossible” most of the time, this is a sharp edge in a public API.

Consider guarding with debug_assert!(items.len() <= u32::MAX as usize) or using u32::try_from(items.len()).expect(...) (panic is preferable to silently emitting corrupt NBT).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@haya_nbt/src/list.rs` around lines 146 - 163, The ListTag::list_info()
implementation silently casts items.len() with `as _`, risking usize→u32
truncation; replace the direct cast by explicitly converting the length to u32
with a checked conversion (e.g., use u32::try_from(items.len()).expect("ListTag
length exceeds u32::MAX") or at minimum add debug_assert!(items.len() <=
u32::MAX as usize)) so ListInfo receives a safe u32 length; update all match
arms in ListTag::list_info() accordingly to use the checked conversion and keep
the error message descriptive.
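The checked conversion suggested above, as a small helper (`list_len_u32` is a hypothetical name; a panic here is preferable to silently emitting a corrupt NBT header):

```rust
/// Convert a list length to the u32 header field, panicking on overflow.
fn list_len_u32(len: usize) -> u32 {
    u32::try_from(len).expect("ListTag length exceeds u32::MAX")
}
```

On 64-bit targets `items.len() as u32` silently keeps only the low 32 bits, so the written header would disagree with the payload that follows it.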

341-409: 🧹 Nitpick | 🔵 Trivial

Consider consolidating numeric slice→Vec decoding to reduce duplicated unsafe logic.

long_list, int_list, short_list, f32_list, and f64_list all implement the same pattern: allocate a Vec, iterate through byte chunks, decode big-endian bytes, write to the unsafe pointer, then call set_len. A single generic helper function could consolidate this, similar to the copy_swap pattern used in IntArray (i32) and LongArray (i64). Generalizing such a helper—or creating a wrapper over copy_swap that supports all five element types (i16, i32, i64, f32, f64)—would shrink the unsafe footprint and reduce maintenance burden for these primitives.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@haya_nbt/src/list.rs` around lines 341 - 409, Consolidate the duplicated
unsafe decoding in long_list, int_list, short_list, f32_list, and f64_list by
implementing a single generic helper (e.g., decode_be_chunks<T>) or a wrapper
around the existing copy_swap used by IntArray/LongArray that takes element size
and a conversion-from-bytes function, then replace each specialized function to
call that helper; ensure the helper handles allocation with
Vec::<T>::with_capacity, writes via as_mut_ptr(), calls set_len(len), and uses
the appropriate from_be_bytes conversion for i16/i32/i64/f32/f64 to preserve the
current big-endian decoding behavior.
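The consolidation idea can be sketched with a small trait over the five element types — `FromBe` and `decode_be_list` are invented names; only the pattern comes from the review:

```rust
/// Minimal trait over elements decodable from big-endian bytes.
trait FromBe {
    const SIZE: usize;
    fn from_be(bytes: &[u8]) -> Self;
}

macro_rules! impl_from_be {
    ($($t:ty),*) => {$(
        impl FromBe for $t {
            const SIZE: usize = core::mem::size_of::<$t>();
            fn from_be(bytes: &[u8]) -> Self {
                <$t>::from_be_bytes(bytes.try_into().unwrap())
            }
        }
    )*};
}
impl_from_be!(i16, i32, i64, f32, f64);

/// One decoder in place of short_list/int_list/long_list/f32_list/f64_list.
fn decode_be_list<T: FromBe>(len: usize, bytes: &[u8]) -> Vec<T> {
    assert!(bytes.len() >= len * T::SIZE, "input slice too short");
    (0..len)
        .map(|i| T::from_be(&bytes[i * T::SIZE..(i + 1) * T::SIZE]))
        .collect()
}
```

This version sidesteps the uninitialized-memory question entirely by collecting into the Vec; if the per-element copy proves hot, the same trait can front a pointer-writing fast path so the unsafe code lives in exactly one place.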
haya_nbt/src/lib.rs (1)

116-153: 🧹 Nitpick | 🔵 Trivial

Make parse limits explicit (avoid magic 512) and consider adding allocation limits for untrusted input.

  • read_tag(n, t, 512) is a magic constant; a const MAX_DEPTH: usize = 512; (or a configurable parameter) would make this clearer and easier to tune.
  • Depth is bounded, but lengths (u32) still drive potentially large Vec growth (lists, arrays, strings). If this is fed with untrusted network input, consider adding a “max total bytes / max list len” guard at the API boundary (e.g., tag_with_limits(max_depth, max_bytes)), even if the caller usually constrains packet sizes.
Proposed small refactor (constant + defensive depth check)
 impl TagType {
     pub fn tag(self, n: &mut &[u8]) -> Result<Tag, Error> {
+        const MAX_DEPTH: usize = 512;
         let t = match self.tag_no_rec(n)? {
             Ok(x) => return Ok(x),
             Err(TagRec::List) => {
                 let l = ListInfo::read(n)?;
                 match l.list_no_rec(n)? {
                     Ok(x) => return Ok(Tag::List(x)),
                     Err(e) => match e {
                         ListRec::Compound => Entry::ListCompound(Vec::new(), l.1),
                         ListRec::List => Entry::ListList(Vec::new(), l.1),
                     },
                 }
             }
             Err(TagRec::Compound) => Entry::Compound(Compound::new()),
         };
-        read_tag(n, t, 512)
+        read_tag(n, t, MAX_DEPTH)
     }
 }

 fn read_tag(buf: &mut &[u8], mut next: Entry, max_depth: usize) -> Result<Tag, Error> {
     let mut blocks = Vec::<Entry>::new();
     let mut names = Vec::<Name>::new();
     loop {
-        if max_depth == blocks.len() {
+        if blocks.len() >= max_depth {
             return Err(Error);
         }

Also applies to: 155-167, 161-167, 282-286
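A minimal sketch of the length guard suggested for the length-prefixed readers; `checked_len`, `MAX_BYTES`, and `Error` are assumed names, not the crate's actual API:

```rust
#[derive(Debug, PartialEq)]
struct Error;

/// Example cap on a single payload; a real API might make this configurable.
const MAX_BYTES: usize = 1 << 20;

/// Validate a declared element count against the cap and the bytes actually
/// remaining in the input, before any allocation happens.
fn checked_len(declared: u32, elem_size: usize, remaining: usize) -> Result<usize, Error> {
    let len = declared as usize;
    let bytes = len.checked_mul(elem_size).ok_or(Error)?;
    if bytes > MAX_BYTES || bytes > remaining {
        return Err(Error);
    }
    Ok(len)
}

fn main() {
    // 10 i64 elements need 80 bytes; with 80 bytes remaining this is accepted.
    assert_eq!(checked_len(10, 8, 80), Ok(10));
    // A declared length far beyond the buffer is rejected before allocating.
    assert_eq!(checked_len(u32::MAX, 8, 80), Err(Error));
}
```

Checking against the bytes actually remaining is the cheapest guard, since a well-formed list can never declare more payload than the packet contains.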

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@haya_nbt/src/lib.rs` around lines 116 - 153, Replace the magic depth constant
in TagType::tag/read_tag with a named constant (e.g., const MAX_DEPTH: usize =
512) and use that constant instead of the literal 512 in the call to read_tag;
also add a new public API variant (e.g., TagType::tag_with_limits or
read_tag_with_limits) that accepts configurable limits like max_depth and
max_bytes so callers can override the default. In addition, add defensive
allocation checks when consuming length-prefixed payloads (ByteArray::read,
StringTag::read, IntArray::read, LongArray::read, ListInfo::read / list_no_rec)
to validate u32 lengths against a provided max_bytes or max_len before
allocating Vecs or Strings, returning Error if limits are exceeded. Ensure
tag_no_rec and TagType::tag use the new limit-aware read_tag API so untrusted
input cannot drive unbounded allocations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@haya_nbt/src/lib.rs`:
- Around line 42-70: Add common derives to the new public types to improve
ergonomics: annotate Compound, Name, and ListInfo with #[derive(Clone, Debug,
PartialEq, Eq, Hash)] and annotate ListTag with #[derive(Clone, Debug,
PartialEq, Eq)] (do not add Hash to ListTag because some variants contain
f32/f64 which do not implement Hash); also ensure the Tag::List(ListTag) usage
compiles after these derives are added.

In `@haya_nbt/src/list.rs`:
- Around line 240-265: The ListInfo::list_no_rec function currently accepts
TagType::End for any length; update it to reject non-zero lengths by checking
the len variable when matching TagType::End and returning Err(Error) if len != 0
(only allow Ok(Ok(ListTag::None)) when len == 0). This change ensures malformed
End-typed lists are rejected and keeps parsing consistent with other parsers.
- Around line 193-198: For Self::ByteArray handling (the match arm using
variable x and inner y), stop writing each element with z.write(w); instead
write the length ((y.len() as u32).write(w)) as before and then emit the entire
i8 slice in one call using the existing byte-slice helper for i8 payloads
(replace the inner y.iter().for_each(|z| z.write(w)) with a single call to the
helper that writes an &[i8] or raw i8 buffer to w). This reduces per-element
overhead; ensure you pass the correct slice type or safely cast/convert to the
helper’s expected &[i8]/[u8] representation.
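The single-call byte write for an &[i8] payload might look like this; `as_u8_slice` is a hypothetical helper, and the crate's own byte-slice writer would take the place of `extend_from_slice`:

```rust
/// Reinterpret an &[i8] as &[u8]; sound because the two types have
/// identical size, alignment, and validity.
fn as_u8_slice(v: &[i8]) -> &[u8] {
    unsafe { core::slice::from_raw_parts(v.as_ptr() as *const u8, v.len()) }
}

fn main() {
    let payload: Vec<i8> = vec![-1, 0, 127];
    let mut w: Vec<u8> = Vec::new();
    // Length prefix as before, then the whole payload in one bulk write
    // instead of one call per element.
    w.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    w.extend_from_slice(as_u8_slice(&payload));
    assert_eq!(w, [0, 0, 0, 3, 0xFF, 0x00, 0x7F]);
}
```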



@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
haya_nbt/src/list.rs (1)

146-164: ⚠️ Potential issue | 🟡 Minor

Truncation risk when casting len() to u32.

The items.len() as _ cast silently truncates lengths exceeding u32::MAX on 64-bit systems. While the NBT format uses 32-bit lengths, silent truncation could corrupt data.

Consider adding a debug assertion or returning an error for oversized lists.

🛡️ Proposed defensive assertion
 pub fn list_info(&self) -> ListInfo {
     match self {
         Self::None => ListInfo(TagType::End, 0),
-        Self::Byte(items) => ListInfo(TagType::Byte, items.len() as _),
+        Self::Byte(items) => {
+            debug_assert!(items.len() <= u32::MAX as usize);
+            ListInfo(TagType::Byte, items.len() as _)
+        }
         // ... apply similarly to other variants
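A checked conversion along the lines of option (b) could look like this; `list_len_u32` is an illustrative name:

```rust
/// Convert a list length to the u32 the NBT wire format requires,
/// failing instead of silently truncating on 64-bit platforms.
fn list_len_u32(len: usize) -> Result<u32, &'static str> {
    u32::try_from(len).map_err(|_| "ListTag length exceeds u32::MAX")
}

fn main() {
    assert_eq!(list_len_u32(3), Ok(3));
    // usize::MAX exceeds u32::MAX on 64-bit targets, so this fails cleanly.
    assert!(list_len_u32(usize::MAX).is_err());
}
```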
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@haya_nbt/src/list.rs` around lines 146 - 164, ListTag::list_info currently
casts items.len() to u32 with items.len() as _, which can silently truncate on
64-bit platforms; change it to defensively guard the conversion by checking that
items.len() <= u32::MAX and either (a) use debug_assert!(items.len() <=
u32::MAX) before the cast if you want a non-breaking debug-time check, or (b)
change list_info to return a Result<ListInfo, Error> (or propagate an error) and
return an Err when the length exceeds u32::MAX so oversized lists are handled
safely; update all match arms (the Byte/Short/.../Compound arms) to perform the
same check before producing ListInfo.

---

Duplicate comments:
In `@haya_nbt/src/list.rs`:
- Around line 249-250: The match arm that currently returns
Ok(Ok(ListTag::None)) when encountering TagType::End must validate the list
length and reject non-zero lengths; update the TagType::End branch in the
ListTag parsing code (the match on self.0 that returns ListTag::None) to check
the list length parameter and return an Err (malformed input) if length != 0,
allowing TagType::End only when the length is exactly zero.
- Around line 341-353: The loop in long_list unsafely assigns into uninitialized
memory via *s.add(index) = ..., which should be replaced with ptr.write to
explicitly write into uninitialized slots; update long_list to compute the value
and call s.add(index).write(value) (where s is from v.as_mut_ptr()), then keep
v.set_len(len) afterwards; apply the same change to int_list, short_list,
f32_list, and f64_list so all uninitialized-pointer writes use ptr.write instead
of deref-assignment.
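The ptr::write pattern the prompt describes, reduced to a minimal sketch; `decode_i64_be` is illustrative, and the crate's real functions also track a byte cursor:

```rust
/// Fill an uninitialized Vec<i64> via raw-pointer writes, mirroring the
/// pattern suggested for long_list and friends.
fn decode_i64_be(chunks: &[[u8; 8]]) -> Vec<i64> {
    let len = chunks.len();
    let mut v = Vec::<i64>::with_capacity(len);
    let s = v.as_mut_ptr();
    for (index, c) in chunks.iter().enumerate() {
        // write() stores the value without reading or dropping whatever
        // was previously in the uninitialized slot, unlike `*s.add(i) = ...`.
        unsafe { s.add(index).write(i64::from_be_bytes(*c)) };
    }
    // Safety: all `len` slots were initialized by the loop above.
    unsafe { v.set_len(len) };
    v
}

fn main() {
    let raw = [[0u8; 8], [0, 0, 0, 0, 0, 0, 0, 1]];
    assert_eq!(decode_i64_be(&raw), vec![0, 1]);
}
```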

@hayanesuru hayanesuru merged commit c593cf4 into main Feb 18, 2026
2 checks passed
@hayanesuru hayanesuru deleted the protocol branch February 18, 2026 08:21
