Conversation
📝 Walkthrough

Reworks NBT/list/string public types (`List`→`ListTag`, `Box`→`Name`), implements staged non-recursive NBT parsing, consolidates MUTF-8 ASCII detection into a fallible `as_mutf8_ascii`, migrates build-time codegen to slice-based generation with new hex parsers, and updates protocol/serialization surfaces and tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Bytes as Raw Bytes
    participant Fast as tag_no_rec / list_no_rec
    participant Stack as TagRec / ListRec
    participant Builder as Assembler
    participant Result as Tag Result
    Bytes->>Fast: read type header
    Fast->>Fast: parse primitive or ASCII fast-path
    alt primitive/simple
        Fast->>Result: return Tag::Primitive
    else list/compound detected
        Fast->>Stack: emit Entry (List/Compound) marker
        loop assemble nested
            Stack->>Fast: request next item
            Fast->>Stack: return primitive or nested Entry
            Stack->>Builder: accumulate names/elements
        end
        Builder->>Result: finalize ListTag / Compound
    end
    Result->>Bytes: completed Tag
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
haya_str/src/lib.rs (1)
181-212: 🧹 Nitpick | 🔵 Trivial

Add a debug guard in `set_len` to catch misuse early.

`set_len` is unsafe, but a cheap debug assertion helps detect violations during development.

Proposed patch

```diff
 pub const unsafe fn set_len(&mut self, new_len: usize) {
-    unsafe {
-        self.len = transmute::<u8, Len>(new_len as u8);
-    }
+    debug_assert!(new_len <= MAX);
+    unsafe {
+        self.len = transmute::<u8, Len>(new_len as u8);
+    }
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@haya_str/src/lib.rs` around lines 181-212: add a cheap debug-time guard to `set_len` to catch misuse. Inside the `const unsafe fn set_len(&mut self, new_len: usize)`, add a `debug_assert!` that `new_len <= MAX` (or an equivalent check using `MAX` and `Len`) before performing the transmute and assignment, then keep the existing unsafe `transmute::<u8, Len>(new_len as u8)` assignment. The assertion validates bounds in debug builds while leaving the function unsafe for release.

haya_nbt/src/list.rs (1)
258-356: ⚠️ Potential issue | 🟠 Major

Reject `TagType::End` lists with non-zero length.

`list_no_rec` currently returns `ListTag::None` even when the list length is non-zero, which silently accepts invalid NBT and can desync the parser by leaving bytes unread. Guard on length and return `Err(Error)` when `len != 0`.

🔧 Proposed fix

```diff
-            TagType::End => Ok(Ok(ListTag::None)),
+            TagType::End if len == 0 => Ok(Ok(ListTag::None)),
+            TagType::End => Err(Error),
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@haya_nbt/src/list.rs` around lines 258-356: `list_no_rec` currently treats `TagType::End` as `Ok(Ok(ListTag::None))` unconditionally. Change `ListInfo::list_no_rec` to reject `TagType::End` when the stored length (`self.1 as usize`, variable `len`) is non-zero by returning `Err(Error)` if `len != 0`, and only return `Ok(Ok(ListTag::None))` when `len == 0`, so the parser fails on invalid NBT instead of silently consuming state.

haya_data/build.rs (2)
278-347: ⚠️ Potential issue | 🟠 Major

Guard against mismatched sizes when building `FLUID_STATE_INDEX`.

`read_rl(size, ..).zip(bl_props.iter())` will silently truncate on mismatch, which can emit a shorter index table and lead to OOB reads in `block_state::to_fluid`. Add an explicit size check.

Suggested fix

```diff
 let (_, size, _) = head(iter.next(), "block_to_fluid_state");
+assert_eq!(size, bl_props.len(), "block_to_fluid_state size mismatch");
 let mut out = Vec::<u32>::new();
 for (bound, prop) in read_rl(size, &mut iter).zip(bl_props.iter().copied()) {
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@haya_data/build.rs` around lines 278-347: the builder in `fluid_state` silently truncates when pairing `read_rl(size, &mut iter)` with `bl_props` via `zip`, which can produce a shorter `FLUID_STATE_INDEX` and cause OOB in `block_state::to_fluid`. Materialize the `read_rl` iterator (e.g., collect into a `Vec`) and verify its length equals `bl_props.len()` (or the expected size) before the for-loop; if mismatched, panic or return an error with a clear message referencing `FLUID_STATE_INDEX` / `fluid_state` / `block_state::to_fluid` so the mismatch is caught early.
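The truncation hazard above is easy to reproduce in isolation. The sketch below is illustrative only: `bounds` and `props` are stand-ins for the generated tables, and `paired_len_checked` is a hypothetical helper, not a function in build.rs.

```rust
// Illustrative only: `zip` silently stops at the shorter iterator,
// which is exactly the failure mode the review warns about.
fn paired_len(bounds: &[u32], props: &[u32]) -> usize {
    bounds.iter().zip(props.iter()).count()
}

// The checked variant surfaces the mismatch instead of truncating.
fn paired_len_checked(bounds: &[u32], props: &[u32]) -> Result<usize, String> {
    if bounds.len() != props.len() {
        return Err(format!(
            "FLUID_STATE_INDEX size mismatch: {} vs {}",
            bounds.len(),
            props.len()
        ));
    }
    Ok(bounds.iter().zip(props.iter()).count())
}

fn main() {
    // Mismatched inputs: `zip` quietly drops the extra element...
    assert_eq!(paired_len(&[1, 2, 3], &[10, 20]), 2);
    // ...while the checked version fails loudly.
    assert!(paired_len_checked(&[1, 2, 3], &[10, 20]).is_err());
}
```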
486-1376: ⚠️ Potential issue | 🟠 Major

Add an assertion to prevent latent UB when `size_kv > 255`.

The `block::props` transmute from `&[u8]` to `&[block_state_property]` relies on `PROPS` being emitted with repr `u8`. If `size_kv` exceeds 255, `repr_kv` becomes `u16`, making `PROPS` an array of `u16` instead of `u8` and creating a size mismatch during transmutation. Currently `size_kv` is 120 (within bounds), but there is no guard to prevent future overflow.

Add the assertion immediately after line 525:

```diff
 let (name_kv, size_kv, repr_kv) = head(iter.next(), "block_state_property");
+assert!(
+    size_kv <= u8::MAX as usize,
+    "block_state_property exceeds u8; update block::props to handle wider repr"
+);
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@haya_data/build.rs` around lines 486-1376: the emitted `PROPS` array can end up with `repr_kv != Repr::U8` when `size_kv > 255`, which makes the later transmute from `&[u8]` to `&[block_state_property]` undefined. After obtaining `(name_kv, size_kv, repr_kv)` in `block_state` (the `head` call that produces `size_kv` and `repr_kv`), add a runtime assertion that `size_kv <= 255` (or that `repr_kv == Repr::U8`) to fail early; place the assert immediately after that `head(...)` call so the generator refuses to produce an unsafe transmute if the table grows too large.
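A minimal sketch of the guard, assuming the generator derives the repr from the table size; `choose_repr` is a hypothetical helper invented here, not a function in build.rs.

```rust
// Hypothetical generator-side guard: only emit the narrow repr while the
// property table still fits in u8, and fail the build loudly otherwise.
fn choose_repr(size_kv: usize) -> &'static str {
    assert!(
        size_kv <= u8::MAX as usize,
        "block_state_property exceeds u8; update block::props to handle wider repr"
    );
    "u8"
}

fn main() {
    // 120 is the current table size quoted in the review, safely in range.
    assert_eq!(choose_repr(120), "u8");
}
```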
Actionable comments posted: 4
⚠️ Outside diff range comments (1)
haya_data/src/lib.rs (1)
10-27: ⚠️ Potential issue | 🟡 Minor

Add a debug bounds guard before indexing `names`.

`v` from `VALS` is used to index `names` via raw pointer. If the generated tables ever drift (data corruption or hash regen bug), this becomes immediate UB. A debug-only guard keeps release perf unchanged while failing fast in dev.

🛡️ Proposed fix (debug-only guard)

```diff
@@
-    let k = unsafe { *names.add(v as usize) };
+    debug_assert!((v as usize) < N);
+    let k = unsafe { *names.add(v as usize) };
@@
-    let k = unsafe { *names.add(v as usize) };
+    debug_assert!((v as usize) < N);
+    let k = unsafe { *names.add(v as usize) };
```

Also applies to: 30-48

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@haya_data/src/lib.rs` around lines 10-27: the code dereferences `names` via raw pointer using `v` (in `name_u8` and the similar functions at lines 30-48) without any bounds check. Add a debug-only bounds guard: inside a `cfg!(debug_assertions)` block, build a slice from the `names` pointer with the known compile-time length (the const generic length matching the names table) via `unsafe { std::slice::from_raw_parts(names, /*names_len const*/) }` and `debug_assert!((v as usize) < names_slice.len())`; in debug use `names_slice[v as usize]`, and in release keep the current `unsafe { *names.add(v as usize) }` to preserve performance. Apply the same pattern to `name_u8` and the other similar functions.
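The pattern can be sketched in miniature. `NAMES` and `N` below are toy stand-ins for the generated tables, and safe indexing replaces the raw-pointer read so the sketch itself stays sound; only the `debug_assert!` placement mirrors the proposed fix.

```rust
// Toy stand-in for the generated name table.
const N: usize = 4;
static NAMES: [&str; N] = ["air", "stone", "dirt", "grass"];

// Debug-only bounds guard: zero cost in release, fails fast in debug builds.
fn name_lookup(v: u16) -> &'static str {
    debug_assert!((v as usize) < N, "generated table index out of range");
    // The real code reads `unsafe { *names.add(v as usize) }`;
    // safe indexing is used here to keep the example sound.
    NAMES[v as usize]
}

fn main() {
    assert_eq!(name_lookup(1), "stone");
    assert_eq!(name_lookup(3), "grass");
}
```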
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@haya_data/build.rs`:
- Around line 1435-1442: The code that emits the Rust const array in impl_name
writes raw registry names into the generated string (writing "\"" + val + "\""),
which breaks if a name contains " or \; change the emission to produce a valid
Rust string literal by escaping names (e.g., replace the manual quote + val +
quote with something that writes format!("{:?}", val) or uses
val.escape_default() and wraps with quotes) when iterating over names so each
element is properly escaped in the const N array.
In `@haya_data/src/lib.rs`:
- Around line 585-602: There are duplicate hash128 implementations (const fn
hash128) in both the runtime library and build script; extract the
implementation to a single shared source file (e.g., create a standalone
shared_hash128.rs containing the const fn hash128) and have both places consume
that file instead of duplicating logic: in the library replace the inline
function with a mod/include referencing the shared file, and in build.rs use
include!(concat!(env!("CARGO_MANIFEST_DIR"), "/shared_hash128.rs")) (or move the
function into a tiny helper crate that both build.rs and the library depend on)
so both build-time and runtime use the exact same hash128 implementation.
In `@haya_ser/src/lib.rs`:
- Line 10: The public re-export removal of parse_hex broke callers (e.g.,
mser::parse_hex::<u32>(hex) in haya_protocol); either restore parse_hex in this
crate's public API or update callers to the new path. Fix option A: re-export
parse_hex alongside hex_to_u8/u8_to_hex in lib.rs (add pub use
self::hex::parse_hex) so existing call sites compile. Fix option B: change the
call site (e.g., in haya_protocol::chat.rs) to use the new module path where
parse_hex now lives. After choosing and applying one of these fixes, add a
CHANGELOG entry documenting the breaking change and bump the crate version from
2.2.0 to 3.0.0 in Cargo.toml.
In `@haya_str/src/lib.rs`:
- Around line 188-198: The safety comment for HayaStr::set_len is incomplete:
update the doc for the unsafe fn set_len(&mut self, new_len: usize) to require
not only new_len <= MAX but also that bytes in self.data[0..new_len] are
initialized and contain valid UTF‑8 (since HayaStr implements AsRef<str>/Deref
and uses from_utf8_unchecked), so callers cannot create UB by writing arbitrary
bytes then calling set_len; reference HayaStr, set_len, MAX, self.data,
self.len, and the use of from_utf8_unchecked in the text.
Actionable comments posted: 6
⚠️ Outside diff range comments (1)
haya_nbt/src/list.rs (1)
356-369: ⚠️ Potential issue | 🔴 Critical

Same UB issue: `assume_init_mut` on uninitialized memory.

Apply the same fix as `long_list`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@haya_nbt/src/list.rs` around lines 356-369: the `int_list` function uses `v.spare_capacity_mut().assume_init_mut()`, which invokes UB by treating uninitialized memory as initialized. Mirror the safe pattern used in `long_list`: allocate a `Vec<i32>` with capacity `len`, obtain a `MaybeUninit<i32>` slice (or a raw pointer to the spare capacity), write each `i32` into that uninitialized memory using `ptr::write` (or `MaybeUninit::write`) from the big-endian bytes, then `set_len(len)` only after all elements have been initialized.
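The initialized-write-then-`set_len` order described above looks roughly like this; `int_list_sketch` is a simplified stand-in for the real `int_list`, assuming a big-endian byte payload of at least `len * 4` bytes.

```rust
// Every slot is written through MaybeUninit::write *before* set_len,
// so uninitialized memory is never treated as an i32.
fn int_list_sketch(len: usize, bytes: &[u8]) -> Vec<i32> {
    assert!(bytes.len() >= len * 4);
    let mut v = Vec::<i32>::with_capacity(len);
    let spare = v.spare_capacity_mut(); // &mut [MaybeUninit<i32>]
    for i in 0..len {
        let chunk: [u8; 4] = bytes[i * 4..i * 4 + 4].try_into().unwrap();
        spare[i].write(i32::from_be_bytes(chunk));
    }
    // SAFETY: all `len` elements were initialized in the loop above.
    unsafe { v.set_len(len) };
    v
}

fn main() {
    let bytes = [0, 0, 0, 1, 0, 0, 0, 2];
    assert_eq!(int_list_sketch(2, &bytes), vec![1, 2]);
}
```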
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@haya_data/build.rs`:
- Around line 1840-1857: The unsafe aligned read in the const fn hash128 (at the
loop using u64::from_le_bytes(unsafe { *(n.as_ptr().add(i) as *const [u8; 8])
})) can UB on unaligned pointers; replace that deref with a const-safe unaligned
read via core::ptr::read_unaligned and preserve endianness, e.g. call
core::ptr::read_unaligned(n.as_ptr().add(i) as *const u64) wrapped with
u64::from_le(...). Update the expression inside the first loop in hash128 to use
core::ptr::read_unaligned (fully-qualified or import) and keep the wrapping
.wrapping_mul(M) as before.
In `@haya_nbt/src/compound.rs`:
- Around line 285-294: The Compound::read implementation swallows the actual tag
returned by TagType::Compound.tag(buf) when it is not a Compound; update
Compound::read to return a more specific error that includes the unexpected tag
information rather than a generic Err(Error). Locate the Read::read impl for
Compound and change the branch handling Ok(_) from returning Err(Error) to
constructing and returning an error variant (or error value) that captures the
unexpected Tag or TagType (for example an Error::UnexpectedTag or include the
Tag value/string) so callers can see which tag was encountered; keep the Err(e)
pass-through for parse errors from tag(buf).
In `@haya_nbt/src/list.rs`:
- Around line 371-384: The function short_list uses
v.spare_capacity_mut().assume_init_mut() which reads uninitialized memory and
causes UB; update short_list to initialize elements properly the same way
long_list was fixed: obtain the spare_capacity_mut() buffer, for each index
construct the i16 from the two bytes and write it into the buffer using safe
initialization (e.g., use ptr::write or push into the Vec) and then
set_len(len); keep the same i16::from_be_bytes logic and the function name
short_list to locate the change and mirror the initialization pattern used in
long_list.
- Around line 386-399: The f32_list function is invoking undefined behavior by
calling v.spare_capacity_mut().assume_init_mut() on uninitialized memory; mirror
the long_list fix: allocate uninitialized space safely (e.g., use a temporary
slice of MaybeUninit<f32> or write directly into v's capacity via raw pointers),
copy each 4-byte chunk into an f32 and write it with ptr::write (or assign into
MaybeUninit) rather than assuming init, then set_len(len) once all elements are
initialized; update f32_list to follow the same pattern and helper names used in
long_list to avoid assume_init_mut on uninitialized memory.
- Around line 401-414: The f64_list function currently triggers UB by calling
assume_init_mut on uninitialized Vec memory; mirror the long_list fix by writing
initialized values directly into the Vec's uninitialized buffer instead of using
assume_init_mut. Concretely, allocate with Vec::with_capacity(len), get a raw
pointer via v.as_mut_ptr(), for each index compute the f64 value with
f64::from_be_bytes(..) using slice.as_ptr().add(index*8).cast::<[u8;8]>() and
use ptr.add(index).write(value) to initialize each slot, then call
v.set_len(len); keep function name f64_list and follow the same pointer/write
pattern used in long_list.
- Around line 341-354: The current long_list (and similarly int_list,
short_list, f32_list, f64_list) uses spare_capacity_mut().assume_init_mut() on
uninitialized memory which is UB; instead allocate an uninitialized buffer with
Vec<MaybeUninit<T>> (e.g., MaybeUninit<i64> for long_list), write each element
into that buffer using ptr::write (or the MaybeUninit API) while indexing from
the input slice, then call set_len(len) on the Vec<MaybeUninit<T>> and finally
transmute/collect it into a Vec<T> of initialized values; ensure you replace
calls to assume_init_mut and direct writes into uninitialized memory in
functions long_list, int_list, short_list, f32_list, and f64_list with this safe
MaybeUninit-based initialization pattern.
---
Duplicate comments:
In `@haya_data/build.rs`:
- Around line 1435-1443: The generated const N array writes registry names raw
into Rust string literals, which breaks when a name contains " or \; update the
emission to escape names before appending to *w: for each entry from names
produce a Rust-safe literal (escape backslashes and double quotes and other
control chars, e.g. via name.escape_default() or by replacing "\" with "\\\\"
and "\"" with "\\\"") and then append the escaped string into the "\"...\",\n"
slice emission used for const N so the generated Rust code remains syntactically
valid.
⚠️ Outside diff range comments (1)
haya_nbt/src/list.rs (1)
240-338: ⚠️ Potential issue | 🟠 Major

Guard against `len * N` overflow before splitting/decoding.

On 32-bit targets, `len * {2,4,8}` can wrap, letting undersized slices pass and feeding unsafe decoders with OOB reads. Use `checked_mul` (and error on overflow) for all size computations in this function.

✅ Suggested pattern (apply to all byte-size calculations here)

```diff
-            TagType::Short => match n.split_at_checked(len * 2) {
+            TagType::Short => {
+                let byte_len = len.checked_mul(2).ok_or(Error)?;
+                match n.split_at_checked(byte_len) {
                     Some((slice, y)) => unsafe {
                         *n = y;
                         Ok(Ok(ListTag::Short(short_list(len, slice))))
                     },
                     None => Err(Error),
-            },
+                }
+            }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@haya_nbt/src/list.rs` around lines 240-338: `ListInfo::list_no_rec` multiplies `len` by byte sizes (2, 4, 8) without overflow checks, which can wrap on 32-bit targets. Update every size computation to use `checked_mul` (e.g., `let bytes = len.checked_mul(4).ok_or(Error)?`), return `Err(Error)` on overflow, then use `bytes` for `split_at_checked` or bounds checks (compare `bytes <= n.len()`). Apply this pattern to all cases using `len * 2/4/8` (`TagType::Short`, `::Int`, `::Long`, `::Float`, `::Double`) and to the manual loops/length checks (`TagType::ByteArray`, `::String`, `::IntArray`, `::LongArray`) so a checked byte count is computed before slicing or comparing.
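The checked byte-count pattern can be demonstrated on its own. `split_payload` is an illustrative helper invented here (the real code inlines this per match arm), and it assumes `slice::split_at_checked` (stable since Rust 1.80), which the crate already uses.

```rust
// A unit error type mirroring the crate's opaque `Error`.
#[derive(Debug, PartialEq)]
struct Error;

// Validate `len * elem` before slicing: `checked_mul` catches wraparound
// on 32-bit targets, and `split_at_checked` catches undersized input.
fn split_payload<'a>(
    n: &'a [u8],
    len: usize,
    elem: usize,
) -> Result<(&'a [u8], &'a [u8]), Error> {
    let bytes = len.checked_mul(elem).ok_or(Error)?;
    n.split_at_checked(bytes).ok_or(Error)
}

fn main() {
    let buf = [0u8; 16];
    assert!(split_payload(&buf, 2, 8).is_ok());        // 16 bytes: fits
    assert_eq!(split_payload(&buf, usize::MAX, 8), Err(Error)); // overflow
    assert_eq!(split_payload(&buf, 3, 8), Err(Error)); // undersized input
}
```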
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
---
Duplicate comments:
In `@haya_str/src/lib.rs`:
- Around line 188-199: Update the safety doc for the const unsafe fn set_len to
avoid the circular/self-slice wording: state that new_len must be <= crate::MAX
and that the first new_len bytes of the underlying backing buffer (not self[..])
must already be initialized and contain valid UTF-8; reference MAX and the
function name set_len in the text so readers can locate it and replace
"self[0..new_len] must be initialized and is valid UTF-8" with a phrase like
"the first new_len bytes of the underlying buffer are initialized and form a
valid UTF‑8 string."
Actionable comments posted: 3
⚠️ Outside diff range comments (3)
haya_nbt/src/list.rs (2)
146-163: ⚠️ Potential issue | 🟡 Minor

Avoid silent `usize→u32` truncation in `list_info()` length fields.

`items.len() as _` will truncate for very large vectors, producing an invalid header vs the written payload. Even if "practically impossible" most of the time, this is a sharp edge in a public API.

Consider guarding with `debug_assert!(items.len() <= u32::MAX as usize)` or using `u32::try_from(items.len()).expect(...)` (panic is preferable to silently emitting corrupt NBT).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@haya_nbt/src/list.rs` around lines 146-163: `ListTag::list_info()` silently casts `items.len()` with `as _`, risking usize→u32 truncation. Replace the direct cast with a checked conversion (e.g., `u32::try_from(items.len()).expect("ListTag length exceeds u32::MAX")`, or at minimum `debug_assert!(items.len() <= u32::MAX as usize)`) so `ListInfo` receives a safe `u32` length; update all match arms in `list_info()` accordingly and keep the error message descriptive.
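The checked narrowing can be shown in two lines; `list_len` is a hypothetical helper standing in for the per-arm conversion inside `list_info()`.

```rust
// u32::try_from makes the usize→u32 narrowing explicit instead of
// silently truncating with `as`; out-of-range lengths panic with a
// descriptive message rather than emitting a corrupt NBT header.
fn list_len(items_len: usize) -> u32 {
    u32::try_from(items_len).expect("ListTag length exceeds u32::MAX")
}

fn main() {
    assert_eq!(list_len(42), 42u32);
    assert_eq!(list_len(u32::MAX as usize), u32::MAX);
}
```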
341-409: 🧹 Nitpick | 🔵 Trivial

Consider consolidating numeric slice→Vec decoding to reduce duplicated unsafe logic.

`long_list`, `int_list`, `short_list`, `f32_list`, and `f64_list` all implement the same pattern: allocate a Vec, iterate through byte chunks, decode big-endian bytes, write through the unsafe pointer, then call `set_len`. A single generic helper could consolidate this, similar to the `copy_swap` pattern used in `IntArray` (i32) and `LongArray` (i64). Generalizing such a helper, or creating a wrapper over `copy_swap` that supports all five element types (i16, i32, i64, f32, f64), would shrink the unsafe footprint and reduce maintenance burden for these primitives.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@haya_nbt/src/list.rs` around lines 341-409: consolidate the duplicated unsafe decoding in `long_list`, `int_list`, `short_list`, `f32_list`, and `f64_list` by implementing a single generic helper (e.g., `decode_be_chunks<T>`) or a wrapper around the existing `copy_swap` used by `IntArray`/`LongArray` that takes the element size and a from-bytes conversion, then have each specialized function call that helper. Ensure the helper allocates with `Vec::<T>::with_capacity`, writes via `as_mut_ptr()`, calls `set_len(len)`, and uses the appropriate `from_be_bytes` conversion for i16/i32/i64/f32/f64 to preserve the current big-endian decoding behavior.

haya_nbt/src/lib.rs (1)
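One shape the consolidation could take is sketched below; `decode_be` is a name invented here, and it trades the manual pointer writes for a safe iterator collect, which keeps the example sound while showing how one generic signature covers all five element types.

```rust
// One generic big-endian decoder parameterized over element width W and a
// from-bytes function, replacing five near-identical *_list functions.
fn decode_be<T, const W: usize>(len: usize, bytes: &[u8], f: fn([u8; W]) -> T) -> Vec<T> {
    assert!(bytes.len() >= len * W);
    (0..len)
        .map(|i| f(bytes[i * W..i * W + W].try_into().unwrap()))
        .collect()
}

fn main() {
    // The same helper serves i16, i64, f32, f64, ... via from_be_bytes.
    let raw = [0, 1, 0, 2];
    assert_eq!(decode_be::<i16, 2>(2, &raw, i16::from_be_bytes), vec![1, 2]);
    let raw8 = [0, 0, 0, 0, 0, 0, 0, 7];
    assert_eq!(decode_be::<i64, 8>(1, &raw8, i64::from_be_bytes), vec![7]);
}
```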
116-153: 🧹 Nitpick | 🔵 Trivial

Make parse limits explicit (avoid magic `512`) and consider adding allocation limits for untrusted input.

- `read_tag(n, t, 512)` is a magic constant; a `const MAX_DEPTH: usize = 512;` (or a configurable parameter) would make this clearer and easier to tune.
- Depth is bounded, but lengths (`u32`) still drive potentially large `Vec` growth (lists, arrays, strings). If this is fed with untrusted network input, consider adding a "max total bytes / max list len" guard at the API boundary (e.g., `tag_with_limits(max_depth, max_bytes)`), even if the caller usually constrains packet sizes.

Proposed small refactor (constant + defensive depth check)

```diff
 impl TagType {
     pub fn tag(self, n: &mut &[u8]) -> Result<Tag, Error> {
+        const MAX_DEPTH: usize = 512;
         let t = match self.tag_no_rec(n)? {
             Ok(x) => return Ok(x),
             Err(TagRec::List) => {
                 let l = ListInfo::read(n)?;
                 match l.list_no_rec(n)? {
                     Ok(x) => return Ok(Tag::List(x)),
                     Err(e) => match e {
                         ListRec::Compound => Entry::ListCompound(Vec::new(), l.1),
                         ListRec::List => Entry::ListList(Vec::new(), l.1),
                     },
                 }
             }
             Err(TagRec::Compound) => Entry::Compound(Compound::new()),
         };
-        read_tag(n, t, 512)
+        read_tag(n, t, MAX_DEPTH)
     }
 }

 fn read_tag(buf: &mut &[u8], mut next: Entry, max_depth: usize) -> Result<Tag, Error> {
     let mut blocks = Vec::<Entry>::new();
     let mut names = Vec::<Name>::new();
     loop {
-        if max_depth == blocks.len() {
+        if blocks.len() >= max_depth {
             return Err(Error);
         }
```

Also applies to: 155-167, 161-167, 282-286

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@haya_nbt/src/lib.rs` around lines 116-153: replace the magic depth constant in `TagType::tag`/`read_tag` with a named constant (e.g., `const MAX_DEPTH: usize = 512`) and use it instead of the literal 512 in the call to `read_tag`; also add a limit-aware public API variant (e.g., `TagType::tag_with_limits` or `read_tag_with_limits`) that accepts configurable limits like `max_depth` and `max_bytes` so callers can override the default. In addition, add defensive allocation checks when consuming length-prefixed payloads (`ByteArray::read`, `StringTag::read`, `IntArray::read`, `LongArray::read`, `ListInfo::read` / `list_no_rec`) to validate `u32` lengths against a provided `max_bytes` or `max_len` before allocating Vecs or Strings, returning `Error` if limits are exceeded, so untrusted input cannot drive unbounded allocations.
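A boundary guard of the kind described above might look like this. `MAX_LEN`, its cap value, and `check_len` are illustrative names invented for the sketch, not part of the crate; only `MAX_DEPTH = 512` comes from the source.

```rust
// Named limits at the parse boundary: a depth cap (from the source) and
// an allocation cap (invented here) checked before any Vec/String growth.
const MAX_DEPTH: usize = 512;
const MAX_LEN: usize = 1 << 20; // illustrative cap, not from the crate

#[derive(Debug, PartialEq)]
struct Error;

// Validate a declared u32 length against both the cap and the bytes
// actually remaining in the input before allocating.
fn check_len(declared: u32, remaining: usize) -> Result<usize, Error> {
    let len = declared as usize;
    if len > MAX_LEN || len > remaining {
        return Err(Error);
    }
    Ok(len)
}

fn main() {
    assert_eq!(check_len(8, 100), Ok(8));
    assert_eq!(check_len(u32::MAX, 100), Err(Error)); // absurd declared length
    assert_eq!(check_len(200, 100), Err(Error));      // more than remains
    assert_eq!(MAX_DEPTH, 512);
}
```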
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@haya_nbt/src/lib.rs`:
- Around line 42-70: Add common derives to the new public types to improve
ergonomics: annotate Compound, Name, and ListInfo with #[derive(Clone, Debug,
PartialEq, Eq, Hash)] and annotate ListTag with #[derive(Clone, Debug,
PartialEq, Eq)] (do not add Hash to ListTag because some variants contain
f32/f64 which do not implement Hash); also ensure the Tag::List(ListTag) usage
compiles after these derives are added.
In `@haya_nbt/src/list.rs`:
- Around line 240-265: The ListInfo::list_no_rec function currently accepts
TagType::End for any length; update it to reject non-zero lengths by checking
the len variable when matching TagType::End and returning Err(Error) if len != 0
(only allow Ok(Ok(ListTag::None)) when len == 0). This change ensures malformed
End-typed lists are rejected and keeps parsing consistent with other parsers.
- Around line 193-198: For Self::ByteArray handling (the match arm using
variable x and inner y), stop writing each element with z.write(w); instead
write the length ((y.len() as u32).write(w)) as before and then emit the entire
i8 slice in one call using the existing byte-slice helper for i8 payloads
(replace the inner y.iter().for_each(|z| z.write(w)) with a single call to the
helper that writes an &[i8] or raw i8 buffer to w). This reduces per-element
overhead; ensure you pass the correct slice type or safely cast/convert to the
helper’s expected &[i8]/[u8] representation.
⚠️ Outside diff range comments (1)
haya_nbt/src/list.rs (1)
146-164: ⚠️ Potential issue | 🟡 Minor

Truncation risk when casting `len()` to `u32`.

The `items.len() as _` cast silently truncates lengths exceeding `u32::MAX` on 64-bit systems. While the NBT format uses 32-bit lengths, silently truncating could corrupt data. Consider adding a debug assertion or returning an error for oversized lists.

🛡️ Proposed defensive assertion

```diff
 pub fn list_info(&self) -> ListInfo {
     match self {
         Self::None => ListInfo(TagType::End, 0),
-        Self::Byte(items) => ListInfo(TagType::Byte, items.len() as _),
+        Self::Byte(items) => {
+            debug_assert!(items.len() <= u32::MAX as usize);
+            ListInfo(TagType::Byte, items.len() as _)
+        }
         // ... apply similarly to other variants
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@haya_nbt/src/list.rs` around lines 146-164: `ListTag::list_info` casts `items.len()` to `u32` with `items.len() as _`, which can silently truncate on 64-bit platforms. Guard the conversion: either (a) add `debug_assert!(items.len() <= u32::MAX as usize)` before the cast for a non-breaking debug-time check, or (b) change `list_info` to return `Result<ListInfo, Error>` (or propagate an error) and return an `Err` when the length exceeds `u32::MAX` so oversized lists are handled safely; apply the same check in all match arms (the Byte/Short/.../Compound arms) before producing `ListInfo`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
---
Duplicate comments:
In `@haya_nbt/src/list.rs`:
- Around line 249-250: The match arm that currently returns
Ok(Ok(ListTag::None)) when encountering TagType::End must validate the list
length and reject non-zero lengths; update the TagType::End branch in the
ListTag parsing code (the match on self.0 that returns ListTag::None) to check
the list length parameter and return an Err (malformed input) if length != 0,
allowing TagType::End only when the length is exactly zero.
- Around line 341-353: The loop in long_list unsafely assigns into uninitialized
memory via *s.add(index) = ..., which should be replaced with ptr.write to
explicitly write into uninitialized slots; update long_list to compute the value
and call s.add(index).write(value) (where s is from v.as_mut_ptr()), then keep
v.set_len(len) afterwards; apply the same change to int_list, short_list,
f32_list, and f64_list so all uninitialized-pointer writes use ptr.write instead
of deref-assignment.
Summary by CodeRabbit

- Refactor
- API
- Bug Fixes
- Chore