dslice and dsperse file type integration #37
Conversation
…put via .dsperse/.dslice files.
Force-pushed from 04c3c84 to f8eb480
…ection; Added exclude_patterns to Converter for pk exclusion during packing
HudsonGraeme left a comment
Looks awesome! Left some small comments, questions and thoughts here 🙏
The main points:
- In the subnet we can't share DSperse/DSlice files back and forth at runtime, since that would be very inefficient and incur a huge amount of data transfer. Ideally we'd transmit only minimal witness inputs and proof data.
- If we could improve the pathing logic in the future so that users indicate what kinds of files they've provided and where those are located, then fail early if any are missing (see the sketch after this list), it would save us a lot of effort in file searching and path verification.
- Since we're supporting JSTprove (this is probably out of scope for the PR), we should be cognizant of areas where we're hardcoding ezkl-related paths.
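A hypothetical fail-early validation sketch for that second point; the required-file manifest and helper name are illustrative, not part of the current CLI:

```python
from pathlib import Path

# Hypothetical manifest of user-declared inputs; names are illustrative only.
REQUIRED = {"model": "model.onnx", "metadata": "metadata.json"}

def validate_inputs(root: str) -> dict:
    base = Path(root)
    resolved = {name: base / rel for name, rel in REQUIRED.items()}
    missing = [str(p) for p in resolved.values() if not p.exists()]
    if missing:
        # Fail early instead of falling back to format detection / path searching.
        raise FileNotFoundError(f"Missing required files: {', '.join(missing)}")
    return resolved
```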
dsperse/src/run/runner.py
Outdated
```python
slices_dir = self._converted_dir
model_dir = slices_dir.parent

# Detect single-slice mode: a directory that itself is a slice (has metadata.json + payload) and no slice_* children
```
If we can have the user specify what they're inputting, or structure our subcommands in a way that inherently specifies the incoming file type, we could remove all logic that attempts to detect the format or search paths.
…enhance metadata handling, and streamline directory operations.
Walkthrough
Refactors "segment" → "slice", changes slice payload layout to per-slice payload directories, adds Converter for packaging conversions, introduces Compiler + CompilerUtils and a new Runner + RunnerUtils for compile/run orchestration, updates analyzers/metadata and many CLI commands, and removes legacy monolithic modules. (50 words)
Sequence Diagram(s)
```mermaid
sequenceDiagram
actor User
participant CLI as CLI (convert / slice / compile / run / prove / verify)
participant Conv as Converter
participant RA as RunnerAnalyzer
participant Compiler as Compiler
participant Runner as Runner
participant Prover as Prover
participant EZKL as EZKL Backend
participant ONNX as ONNX Models
User->>CLI: invoke command (convert / slice / compile / run / prove / verify)
CLI->>Conv: detect/convert (dsperse/.dslice ↔ dirs)
Conv-->>CLI: slices_dir
CLI->>RA: generate_run_metadata(slices_dir)
RA-->>CLI: run_metadata
alt compile flow
CLI->>Compiler: compile(slice_path, input, layers)
Compiler->>EZKL: compilation_pipeline (per-slice)
EZKL-->>Compiler: compilation_data
Compiler->>RA: update slice metadata
Compiler-->>CLI: compilation_dir
end
alt run / prove flow
CLI->>Runner: run(run_metadata, input)
Runner->>RA: get_execution_chain(run_metadata)
loop per-slice
Runner->>EZKL: generate_witness / prove
EZKL-->>Runner: witness/proof or error
alt ezkl fails
Runner->>ONNX: run_onnx_slice (fallback)
ONNX-->>Runner: output
end
end
Runner-->>CLI: run_results.json
CLI->>Prover: (optional) prove using run metadata
Prover->>EZKL: verify/prove per-slice proofs
Prover-->>CLI: proof results
end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~150 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
dsperse/src/analyzers/onnx_analyzer.py (1)
337-390: Fix `slices` structure and guard `output_dir` to prevent runtime failures in prover/verifier and persistence flows
`generate_slices_metadata` introduces two breaking issues:
1. `slices` must be a dict, not a list
`iter_circuit_slices` (utils.py:197–198) calls `slices.items()`, and prover.py:43 / verifier.py:53 rely on it to iterate `(slice_id, meta)` pairs. The current code assigns a list to `model_overview["slices"]`, which will crash both `prover` and `verifier` with `AttributeError: 'list' object has no attribute 'items'`. Instead, build a dict keyed by `slice_id` (e.g., `{"slice_0": {...}, "slice_1": {...}}`).
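A minimal sketch of that dict-shaped fix; the variable names here are hypothetical stand-ins for whatever `generate_slices_metadata` currently builds:

```python
# Hypothetical stand-ins; only the dict-keyed-by-slice_id shape comes from the review.
slices_list = [{"path": "slice_0/payload/model.onnx"}, {"path": "slice_1/payload/model.onnx"}]
model_overview = {}

model_overview["slices"] = {f"slice_{idx}": meta for idx, meta in enumerate(slices_list)}

# Downstream consumers (prover/verifier) can now call .items() safely:
for slice_id, meta in model_overview["slices"].items():
    print(slice_id, meta)
```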
2. `output_dir` cannot be `None` at lines 387–388
The function signature allows `output_dir=None`, but `Utils.save_metadata_file(output_path=output_dir)` calls `Path(output_path)` at line 27, which fails with `TypeError` on `None`. Similarly, `OnnxUtils.write_slice_dirs_metadata(output_dir)` calls `Path(slices_root)` at line 69, also failing on `None`. Add a guard: either require `output_dir` (raise if missing) or default it to `os.path.join(os.path.dirname(self.onnx_path), "slices")` before persisting.
dsperse/src/prover.py (1)
168-177: `main`: missing early exit when run directory is absent
In the demo `__main__` block, if `run_dir` does not exist you print a warning but still proceed to:
```python
run_dirs = sorted([d for d in os.listdir(run_dir) if d.startswith("run_")])
```
This will raise `FileNotFoundError` on `os.listdir(run_dir)` instead of cleanly exiting. If this script is used for local testing, it's better to return/exit early:
```diff
 run_dir = os.path.join(model_dir, "run")
 if not os.path.exists(run_dir):
-    print(f"Run directory not found at {run_dir}, assuming input file provided.")
+    print(f"Run directory not found at {run_dir}")
+    exit(1)
```
♻️ Duplicate comments (6)
dsperse/src/cli/convert.py (2)
64-73: Avoid re-importing `Path` alias in the prompt loop
You already import `Path` at the top; using `Path` directly here avoids the extra `_P` alias and keeps things simpler.
```diff
-from pathlib import Path as _P
 while True:
     chosen_raw = prompt_for_value('to', 'Enter desired output type (dirs, dslice, dsperse)')
-    chosen = _P(str(chosen_raw)).name.strip().lower()
+    chosen = Path(str(chosen_raw)).name.strip().lower()
```
[ suggest_optional_refactor ]
75-85: Unify dsperse→dirs handling with `Converter.convert` and avoid duplicated cleanup
This branch re-implements the `dsperse -> dirs` path (including manual cleanup) instead of delegating to `Converter.convert`, which already owns that behavior. It also means cleanup semantics live in two places.
Consider extending `Converter.convert` to accept an optional `expand_slices` flag and delegate from here, so all dsperse→dirs logic and cleanup live in one place:
```diff
-if p.is_file() and p.suffix == '.dsperse' and getattr(args, 'to_type', None) == 'dirs':
-    result = Converter._dsperse_to_dirs(p, output_path, expand_slices=bool(getattr(args, 'expand_slices', False)))
-    if bool(getattr(args, 'cleanup', True)):
-        try:
-            p.unlink()
-        except Exception:
-            pass
-else:
-    result = Converter.convert(input_path, output_type=getattr(args, 'to_type', None), output_path=output_path, cleanup=bool(getattr(args, 'cleanup', True)))
+if p.is_file() and p.suffix == '.dsperse' and getattr(args, 'to_type', None) == 'dirs':
+    result = Converter.convert(
+        str(p),
+        output_type='dirs',
+        output_path=output_path,
+        cleanup=bool(getattr(args, 'cleanup', True)),
+        # new optional param:
+        expand_slices=bool(getattr(args, 'expand_slices', False)),
+    )
+else:
+    result = Converter.convert(
+        input_path,
+        output_type=getattr(args, 'to_type', None),
+        output_path=output_path,
+        cleanup=bool(getattr(args, 'cleanup', True)),
+    )
```
[ suggest_recommended_refactor ]
dsperse/src/cli/verify.py (1)
31-144: Nice fast-path API; consider factoring shared run/metadata logic with `run` CLI
The new positional-arg fast path (`run_path`/`data_path`) is a good improvement, and the metadata-driven fallback from `source_path`/`model_path` is clear.
There is, however, quite a bit of overlap with `run`'s CLI around:
- Path normalization and prompting,
- Resolving model/slices roots,
- Handling output file saving and result summaries.
Given this was already flagged before, it may be worth extracting common helpers in `cli.base` (or a small shared module) to avoid divergence between `run` and `verify` over time (e.g., default paths, summary formatting).
[ suggest_recommended_refactor ]
dsperse/src/cli/prove.py (2)
169-347: Archive unpacking + run-dir discovery is more complex than needed and duplicates Converter work
This block:
- Unpacks `.dsperse`/`.dslice` just to discover a `run` directory,
- Walks the unpacked tree (`os.walk`) and manages multiple `run_candidates`,
- Then later passes the original archive path into `Prover.prove`, which will convert/unpack it again via `Converter`.
Given dsperse bundles do not contain run folders themselves, you can usually infer the run root from the archive's parent (e.g., `slices.dsperse` → `parent/run`) and let `Prover`/`Converter` own archive handling. Consider simplifying to:
- Derive `parent_run = Path(candidate).with_suffix('').parent / "run"` (or similar),
- Check existence once instead of walking,
- Avoid the extra `Converter.convert` call and unpacked staging dir for run discovery.
You can also drop the inner `from dsperse.src.slice.utils.converter import Converter` (the top-level import is already present) and the unused `cleanup_unpacked` flag, and rename the unused `files` loop variable to `_files` to keep Ruff happy.
431-463: Repacking proved `.dsperse` bundles may be heavy; consider making it opt-in
This block always repacks an archive into `<stem>_proved.dsperse` after proving, even though:
- The verifier only needs proofs and a vk/settings, not the entire model bundle.
- Converting dirs → dsperse is already handled generically by `Converter`, and doing it here adds another large artifact and I/O step.
Given earlier feedback about dsperse repacking cost and filename-driven behavior, consider:
- Guarding repack behind a CLI flag (e.g., `--repack-dsperse`),
- Or moving bundle creation into a separate explicit command.
That keeps `prove` focused on proofs and avoids surprising large bundle writes for users who only care about local verification.
dsperse/src/cli/slice.py (1)
42-57: `slice convert` subcommand duplicates top-level `convert` behavior
The new `slice convert` subparser largely re-implements the logic already present in `dsperse/src/cli/convert.py` (prompting for input, to_type, handling dsperse→dirs, etc.). This works, but increases the risk of the two diverging over time. It might be cleaner to route both entry points through a single shared helper (e.g., call `convert.convert(args)` from here) and layer only the extra `--exclude-pks` semantics on top.
🧹 Nitpick comments (32)
dsperse/src/slice/utils/onnx_utils.py (1)
104-188: Confirm that resetting slice `index` to enumeration order is intended
In `write_slice_dirs_metadata`, you derive `idx` from `enumerate(slices)` and then:
- Rename directories to `slice_{idx}`.
- Set `single_slice_entry["index"] = idx`.
If upstream metadata already carried a meaningful `index` (e.g., graph node index or original slice id), this will overwrite it and may break consumers that expect stable IDs.
If the intent is to normalize everything to 0-based sequential slice indices, consider:
- Either documenting this explicitly in a docstring/comment.
- Or preserving the original index in a separate field (e.g., `original_index`) before overwriting:
```diff
-single_slice_entry = dict(seg)
-single_slice_entry["index"] = idx
+single_slice_entry = dict(seg)
+if "index" in single_slice_entry and single_slice_entry["index"] != idx:
+    single_slice_entry["original_index"] = single_slice_entry["index"]
+single_slice_entry["index"] = idx
```
[ request_verification ]
14-18: Terminology: “segments” vs “slices”The feature bullets still talk about “segments” and “segment-by-segment” execution, but the rest of the codebase and README have moved to “slice” terminology. For consistency and to reduce confusion for new users, consider updating these remaining references to “slice(s)” where appropriate.
91-96: Minor markdown list indentation nitThe nested bullet list under “The script will:” has extra indentation relative to the
- Unix/macOS:parent, which triggers MD007 (ul-indent). Reducing the child bullets to the expected indentation width would keep markdownlint happy without changing the rendered content.dsperse/src/verifier.py (2)
39-48: Swallowed exceptions when reading `execution_results` make debugging harder
The `try/except Exception: pass` around building `proof_paths_by_slice` silently hides any structural issues in `run_results`. If the schema ever changes, you'll just get "Skipping slice: proof path not recorded" with no clue why.
Consider at least logging at debug or warning level in the except block (e.g., `logger.exception("Failed to build proof path map from run_results")`) so unexpected formats are visible during troubleshooting.
138-182: Minor: `__main__` harness f-strings and structure
The test harness at the bottom is helpful, but:
- `print(f"\nVerification completed!")` doesn't interpolate anything and triggers Ruff F541; it can be a regular string.
- Similar for other constant f-strings here.
These are minor, but trimming the unnecessary `f` prefixes will keep tooling quiet without changing behavior.
dsperse/src/compile/compiler.py (4)
41-56: Slice path resolution and error signaling look correct (minor style nits only)
The `_compile_slice` logic correctly prefers an absolute `path` and falls back to a `relative_path` rooted at `base_path`, raising `FileNotFoundError` when neither resolves. That's the right behavior for missing slice artifacts.
If you care about Ruff's TRY003/F541, you could drop the `f` prefixes for constant messages and potentially factor the repeated "No valid path found for slice" string into the exception type, but behavior-wise this is fine.
95-141: Sliced-model compilation flow is coherent; consider slight counter/metadata tweaks
The `_compile_slices` method:
- Loads slices metadata via `Utils.find_metadata_path`.
- Optionally runs `CompilerUtils.run_onnx_inference_chain` for calibration.
- Iterates `slices_data` with optional `layer_indices` filtering.
- Calls `_compile_slice`, then attaches a `compilation.ezkl` block per slice and persists the updated metadata.
This matches the new EZKL circuitization pipeline and metadata shape used elsewhere. Two small refinements you might consider:
- Only increment `compiled_count` when `success` is truthy, so the log line reflects successfully compiled slices.
- If `Utils.iter_circuit_slices` (used by `Prover`) expects a particular structure for `metadata["slices"]` (e.g., mapping keyed by slice_id vs list-of-dicts), double-check that the updated metadata format here stays in sync with that contract to avoid surprises later in the proving pipeline.
Otherwise the orchestration looks solid.
143-182: Top-level `compile` routing and Converter usage make sense
The public `compile()` correctly:
- Validates `model_path` exists,
- Parses `layers` via `CompilerUtils.parse_layers`,
- Uses `CompilerUtils.is_sliced_model` to branch between sliced and raw `.onnx` flows,
- Converts non-dirs sliced inputs to `dirs`, compiles, and converts back to the original type.
This provides a clean, single entry point for both sliced and unsliced models. Only minor nits:
- The variable name `type` shadows the builtin; renaming (e.g., to `slice_type`) would improve readability.
- When converting to/from dirs with `cleanup=True`, be aware of which artifacts are removed vs preserved; ensure this matches user expectations for dsperse/dslice lifetimes.
Behaviorally, the routing looks correct.
184-202: Minor: `__main__` harness f-string and reuse
The test harness is useful, but:
- `print(f"Compilation finished.")` is a constant and triggers Ruff F541; it can be a plain string.
- If you plan to keep this as a developer-only entry point, consider a small helper function (e.g., `_demo()`) and a guard to avoid accidental use in production contexts.
Purely cosmetic; no impact on library usage.
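A minimal sketch of that shape, with a hypothetical `_demo` name:

```python
def _demo() -> None:
    """Developer-only smoke test; not part of the library API."""
    print("Compilation finished.")  # plain string; no f-prefix needed

if __name__ == "__main__":
    _demo()
```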
dsperse/src/cli/compile.py (3)
74-85: Consider normalizing slice indices when validating layers
`available_indices` is taken directly from `metadata['slices'][*]['index']` and compared against parsed `int` indices. If metadata ever stores indices as strings or omits them for some slices, valid layers could be incorrectly treated as missing. You could defensively normalize and filter, e.g., only keep non-`None` indices and cast numeric strings to `int` before comparison.
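A sketch of that defensive normalization, assuming the metadata layout described above (`slices` entries with an optional `index` field):

```python
def normalize_indices(slices: list) -> set:
    """Collect usable integer indices, skipping missing ones and casting numeric strings."""
    available = set()
    for entry in slices:
        idx = entry.get('index')
        if idx is None:
            continue
        try:
            available.add(int(idx))  # "3" -> 3; already-int values pass through
        except (TypeError, ValueError):
            continue
    return available
```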
132-167: Path heuristics are reasonable; minor opportunity to simplify path handling
The heuristic around `output` directories and adjacent `metadata.json`/`slices` paths is sensible and should reduce user confusion. Right now you maintain both `target_path_obj` (pre-heuristic) and `tp_obj` (post-heuristic) and only use the former for the directory-existence gate. You could slightly simplify and avoid potential drift by recreating `Path(target_path)` after the heuristics and using that consistently for the `is_dir` check and `check_model_dir` gate.
173-177: Remove unnecessary f-string in logger call
`logger.info(f"Compiler initialized successfully")` doesn't interpolate anything, which Ruff flags (F541). You can drop the `f` prefix:
```diff
-        logger.info(f"Compiler initialized successfully")
+        logger.info("Compiler initialized successfully")
```
dsperse/src/backends/ezkl.py (1)
578-613: Clean up unused f-strings and stale docstring in `compilation_pipeline`
Within `compilation_pipeline`, several f-strings don't interpolate anything, and the docstring still refers to "circuitization":
```python
settings_path = os.path.join(output_path, f"settings.json")
compiled_path = os.path.join(output_path, f"model.compiled")
vk_path = os.path.join(output_path, f"vk.key")
pk_path = os.path.join(output_path, f"pk.key")
...
"""Run the full EZKL circuitization pipeline ..."""
```
You can:
- Drop the `f` prefixes:
```diff
-    settings_path = os.path.join(output_path, f"settings.json")
-    compiled_path = os.path.join(output_path, f"model.compiled")
-    vk_path = os.path.join(output_path, f"vk.key")
-    pk_path = os.path.join(output_path, f"pk.key")
+    settings_path = os.path.join(output_path, "settings.json")
+    compiled_path = os.path.join(output_path, "model.compiled")
+    vk_path = os.path.join(output_path, "vk.key")
+    pk_path = os.path.join(output_path, "pk.key")
```
- Optionally update the docstring and error message around line 667 to say “compilation pipeline” instead of “circuitization” for consistency.
dsperse/src/run/runner.py (2)
22-45: Initialization flow is good; consider more specific exception type and message
The constructor correctly normalizes `slice_path`, generates run metadata when needed, and initializes EZKL. The only nit is:
```python
if not slice_path or not Path(slice_path).exists():
    raise Exception("A path of a dslice file or dsperse file is required.")
```
Given you also accept a slices directory here, a more accurate and conventional choice would be a `ValueError` (or a small custom `RunnerInitializationError`) with wording that includes directories as valid inputs.
123-188: Log EZKL exceptions in `_run_ezkl_slice` for easier debugging
Catching all exceptions around `generate_witness` lets you cleanly fall back to ONNX, but right now the exception is only turned into a string in `exec_info['error']`:
```python
try:
    success, output_tensor = self.ezkl_runner.generate_witness(...)
except Exception as e:
    success = False
    output_tensor = str(e)
...
if not success:
    exec_info['error'] = output_tensor if isinstance(output_tensor, str) else "Unknown EZKL error"
```
Consider also logging the exception (e.g., `logger.exception("EZKL generate_witness failed")`) so that stack traces are available during debugging, while still returning structured error info for the caller.
dsperse/src/slice/utils/converter.py (1)
341-377: Make `_zip_directory`'s exclude handling work with glob patterns like `*pk*`
Right now `_zip_directory` treats patterns in a bespoke way (e.g., `slice_*` skipped via `startswith`, `*ext` treated as "endswith(ext)"), which doesn't align with the CLI passing `'*pk*'` to exclude proving key files—`'*pk*'` will never match under the current `endswith(pattern[1:])` logic.
Consider switching to standard glob matching using `fnmatch.fnmatch` on relative paths and filenames, e.g.:
```diff
+from fnmatch import fnmatch
@@
-    with zipfile.ZipFile(output_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
-        for root, dirs, files in os.walk(source_dir):
-            root_path = Path(root)
-
-            # Check if we should skip this directory
-            if exclude_patterns:
-                rel_root = root_path.relative_to(source_dir)
-                skip = False
-                for pattern in exclude_patterns:
-                    if pattern.startswith('*'):
-                        # Match file extension patterns
-                        pass  # We'll check files individually
-                    elif str(rel_root).startswith(pattern.replace('*', '')):
-                        skip = True
-                        break
-                if skip:
-                    continue
+    with zipfile.ZipFile(output_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
+        for root, _dirs, files in os.walk(source_dir):
+            root_path = Path(root)
+            rel_root = root_path.relative_to(source_dir)
+
+            # Skip entire directories that match any pattern
+            if exclude_patterns and any(fnmatch(str(rel_root), pat) for pat in exclude_patterns):
+                continue
@@
-            # Skip files matching exclude patterns
-            if exclude_patterns:
-                skip_file = False
-                for pattern in exclude_patterns:
-                    if pattern.startswith('*') and file.endswith(pattern[1:]):
-                        skip_file = True
-                        break
-                if skip_file:
-                    continue
+            # Skip files matching exclude patterns
+            if exclude_patterns:
+                rel_file = rel_root / file
+                if any(fnmatch(str(rel_file), pat) or fnmatch(file, pat) for pat in exclude_patterns):
+                    continue
```
This keeps support for `slice_*` and also makes `*pk*` (and similar patterns) behave as expected.
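A quick demonstration of why the current suffix rule misses `*pk*` while `fnmatch` handles both cases (standard-library behavior, independent of this codebase):

```python
from fnmatch import fnmatch

pattern = '*pk*'
filename = 'pk.key'

# The bespoke rule: pattern.startswith('*') -> endswith(pattern[1:]) == endswith('pk*')
print(filename.endswith(pattern[1:]))  # False: no real filename ends with a literal '*'

# Standard glob matching does what the CLI intends:
print(fnmatch(filename, pattern))      # True
print(fnmatch('slice_3', 'slice_*'))   # True: directory patterns keep working
```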
dsperse/src/prover.py (3)
26-106: prove_dirs slice orchestration and run_results wiring look consistent
The per-slice proving flow (metadata-driven slice iteration, path resolution via `Utils.resolve_under_slice`, witness lookup under `run_path/slice_id/output.json`, and proof output via `Utils.proof_output_path`) is coherent and lines up with the new `Utils.merge_execution_into_run_results`/`update_metadata_after_execution` APIs. Skipped-slice logging for missing circuit/PK/witness is also sensible.
One small improvement: the local `proved` variable is never used since you rely on `proved_count` from `merge_execution_into_run_results`; it can be dropped to avoid confusion.
108-129: dslice/dsperse conversions leave extracted dirs lying around
`prove_dslice` and `prove_dsperse` convert into `dirs` with `cleanup=False`, run proofs, then convert back to the original packaging again with `cleanup=False`. This keeps both the original archive and the extracted slice directories around, which can accumulate clutter over repeated runs.
If the intent is to leave the original packaging intact but not necessarily keep the extracted dirs, consider:
- `cleanup=False` for the initial archive → dirs conversion, so the archive is preserved.
- `cleanup=True` for the dirs → archive conversion, so temporary slice directories are removed once proofs are generated.
That will avoid silently growing temporary directories on repeated invocations.
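A sketch of that cleanup split using `Converter.convert` as it is invoked elsewhere in this review; the archive filename is illustrative:

```python
from dsperse.src.slice.utils.converter import Converter

# Keep the original archive while extracting:
extracted = Converter.convert("model.dslice", output_type="dirs", cleanup=False)

# ... generate proofs against `extracted` ...

# Re-pack and remove the temporary slice directories:
Converter.convert(str(extracted), output_type="dslice", cleanup=True)
```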
131-149: prove() dispatcher behavior is sound, but docstring is slightly stale
The `prove` dispatcher correctly validates `run_path` via `Utils.load_run_metadata` and then routes based on `Converter.detect_type(model_dir)`. That matches the supported packaging types (`dslice`, `dsperse`, `dirs`) and will fail fast on unknown ones.
Minor nit: the docstring still mentions "`data_path` packaging" while the parameter is `model_dir`, which might cause small confusion for CLI users.
You could update the docstring summary line to:
```python
"""Route to the appropriate prove path based on `model_dir` packaging."""
```
dsperse/src/run/utils/runner_utils.py (5)
22-59: normalize_for_runtime behavior is reasonable, but broad exception swallowing hampers debugging
The normalization logic for `packaging_type` (dsperse/dslice/default `dirs`) makes sense and returns `(slices_root, original_format, model_path)` in a way that lines up with subsequent re-packaging.
However, both the `dsperse` and `dslice` branches swallow all exceptions:
```python
except Exception:
    return (model_path / "slices").resolve(), None, model_path
```
If `Converter.convert` fails due to a real packaging bug, quietly falling back to `model_path / "slices"` can mask the underlying problem and lead to confusing downstream errors.
Consider at least logging the exception before falling back:
```diff
-    except Exception:
-        return (model_path / "slices").resolve(), None, model_path
+    except Exception as e:
+        logger.warning(
+            "Failed to normalize %s packaging at %s, falling back to %s/slices: %s",
+            packaging_type, source_path, model_path, e,
+        )
+        return (model_path / "slices").resolve(), None, model_path
```
The same pattern can be applied to the `dslice` branch.
95-102: repackage_if_needed: correct behavior but suppressed exceptions hide root cause
`repackage_if_needed` correctly re-emits slices in their original format (dslice/dsperse) via `Converter.convert(..., cleanup=True)`, but it also catches a bare `Exception` and only logs a warning.
Given this step can materially affect what remains on disk, it may be worth:
- Restricting the exception to `ValueError`/`OSError`, or
- Including more detail in the log (which you already partially do by logging the exception).
If you expect this to be rare and non-fatal, current behavior is acceptable, but please ensure callers are aware that a failed repackage does not raise.
151-167: process_final_output: behavior is fine; tensors in result may need serialization later
The softmax + argmax post-processing correctly normalizes logits to probabilities and extracts a scalar `predicted_action`. Note that storing raw tensors in the returned dict (`"logits"`, `"probabilities"`) is fine for in-memory chaining (e.g., into `filter_tensor`) but won't be JSON-serializable as-is if you later write this object to disk.
If you ever persist this result, consider converting those tensors to `.tolist()` or `.cpu().numpy().tolist()` at the serialization boundary.
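A hypothetical serialization-boundary helper along those lines; the field names follow the review's example:

```python
import torch

def to_json_safe(result: dict) -> dict:
    """Convert tensor fields to plain lists so the dict can be written as JSON."""
    safe = dict(result)
    for key in ("logits", "probabilities"):
        value = safe.get(key)
        if isinstance(value, torch.Tensor):
            safe[key] = value.detach().cpu().tolist()
    return safe
```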
217-250: _is_sliced_model duplicates logic and may be worth centralizing
`RunnerUtils._is_sliced_model` reimplements slice detection in a way very similar to `Converter.detect_type` and `CompilerUtils.is_sliced_model`. That's not wrong, but it does increase the risk of subtle divergence over time as formats evolve.
Longer term, consider routing this helper through a single canonical detection function (`Converter.detect_type`) and wrapping it with the `(is_sliced, slice_path)` tuple that callers expect, to keep detection logic in one place.
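A sketch of that canonical routing; `Converter.detect_type` is the API named above, while the `(is_sliced, slice_path)` wrapper shape and error handling are assumptions:

```python
from pathlib import Path
from typing import Optional, Tuple

def is_sliced_model(path: str) -> Tuple[bool, Optional[Path]]:
    """Adapt Converter.detect_type to the (is_sliced, slice_path) shape callers expect."""
    from dsperse.src.slice.utils.converter import Converter
    try:
        kind = Converter.detect_type(path)
    except Exception:  # detect_type raises on unrecognized inputs
        return False, None
    if kind in ("dirs", "dslice", "dsperse"):
        return True, Path(path)
    return False, None
```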
266-336: check_expected_shape: logic is reasonable, but defaults may be surprising
The shape validation logic (placeholder replacement, element-count equality, flattened-vs-multidim handling) is sound and will catch many mismatches while returning `True` when it can't reason about the shape.
One subtle issue: when `expected_shape_data` is a flat list (e.g., `["batch_size", "unknown"]`), treating `shape_values = expected_shape_data[0]` yields a string and iterates character-by-character. That's only a fallback path right now, but you may want to standardize on nested lists (`[[...]]`) at the metadata source to avoid confusing log messages.
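A quick illustration of the flat-list pitfall described above (plain Python indexing, no project code involved):

```python
expected_shape_data = ["batch_size", "unknown"]  # flat fallback
shape_values = expected_shape_data[0]            # -> the string "batch_size"
print(list(shape_values))                        # ['b', 'a', 't', ...]: character-by-character

nested = [["batch_size", "unk__0"]]              # nested form the checker expects
print(nested[0])                                 # ['batch_size', 'unk__0']
```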
dsperse/src/utils/utils.py (1)
316-359: update_metadata_after_execution: behavior is reasonable, but catches all Exceptions
The summary update logic (writing either `proof_summary` or `verification_summary` with counts and a timestamp) is appropriate and non-breaking.
You do catch a bare `Exception` both on read and write:
```python
except Exception:
    logger.warning(...)
    return
...
except Exception as e:
    logger.warning(...)
```
Given metadata updates are non-critical, this is acceptable, but if debugging failures here ever becomes important, consider narrowing the excepts to `OSError`/`json.JSONDecodeError` or using `logger.exception` once to retain a traceback.
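A sketch of the narrowed handling, assuming a JSON metadata file on disk (the path here is illustrative):

```python
import json
import logging

logger = logging.getLogger(__name__)

try:
    with open("metadata.json", encoding="utf-8") as fh:
        metadata = json.load(fh)
except (OSError, json.JSONDecodeError):
    logger.exception("Failed to read metadata.json; skipping summary update")
    metadata = None
```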
75-99: _rel_from_payload / _with_slice_prefix duplicate logic from RunnerAnalyzer
These helpers mirror `RunnerAnalyzer.rel_from_payload` and `RunnerAnalyzer.with_slice_prefix`. That's not inherently wrong, but maintaining multiple copies of the same path logic increases the chance of subtle drift as payload layouts evolve.
If you find yourself tweaking this logic later, consider centralizing it (e.g., under `Utils` or a dedicated path helper module) and relying on that from both compiler and analyzer utilities.
101-125: is_ezkl_compilation_successful and get_relative_paths are consistent, minor doc mismatch
The success check (`compiled`, `vk_key`, `pk_key`, `settings` all exist on disk under `payload`) is a solid gate for deciding whether compilation "really" succeeded.
`get_relative_paths` does the right thing by:
- Converting each artifact path to a payload-relative path using `_rel_from_payload`.
- Only including calibration when `calibration_input` exists.
Docstring says "Returns a tuple of (payload_rel_dict, calibration_rel_path)", but the function actually returns a single dict with `'calibration'` included. You might want to update the docstring to:
"""Compute payload-relative paths for compiled artifacts and the calibration file. Returns: dict[str, str | None]: mapping of keys ('settings', 'compiled', 'vk_key', 'pk_key', 'calibration') to payload-relative paths (or None when unavailable). """dsperse/src/analyzers/runner_analyzer.py (5)
23-40: rel_from_payload / with_slice_prefix: helpers are fine, but duplicated elsewhere
These helpers correctly:
- Extract the `payload/...` suffix from a path (or return `None` when absent).
- Prefix a relative payload path with `slice_#`.
RunnerAnalyzer. Note that similar helpers exist inCompilerUtils; at some point you may want to centralize them in a shared utility to avoid drift.
42-51: load_slices_metadata: prefer `logging.exception` over re-raising
The function correctly loads `<slices_dir>/metadata.json`, but on failure:
```python
except Exception as e:
    logger.error(f"Failed to load slices metadata from {slices_dir}: {e}")
    raise
```
Since you immediately re-raise, using `logger.exception(...)` would capture the traceback without extra boilerplate:
```diff
-    except Exception as e:
-        logger.error(f"Failed to load slices metadata from {slices_dir}: {e}")
+    except Exception:
+        logger.exception(f"Failed to load slices metadata from {slices_dir}")
         raise
```
Not critical, but improves debuggability.
54-227: process_slices: metadata fusion and circuit eligibility look solid, with a couple of subtle edge cases
The slice processing pipeline is well thought out:
- It merges per-slice metadata (`compilation.ezkl.files`, `io`, `deps`) with model-level fields on `slice_data`, with sensible fallbacks.
- `compiled_flag` reflects either explicit `compiled` flags or presence of a compiled artifact path.
- `circuit_exists`, `keys_exist`, `packaged_only`, and `use_circuit` correctly distinguish between unpacked directories and packaged-only artifacts (e.g., dslice/dsperse).
- ONNX paths are normalized to `slice_#/payload/...`, and EZKL paths are emitted the same way for downstream consumers.
- Circuit size is bounded by `RunnerAnalyzer.SIZE_LIMIT` when deciding `use_circuit`.
Two minor observations:
- When `output_shape` falls back to `["batch_size", "unknown"]`, this shape is not nested, which can lead to slightly odd diagnostics in `RunnerUtils.check_expected_shape` (it assumes nested lists). It still degrades gracefully, but you may want to update the fallback to something like `[["batch_size", "unk__0"]]` in the future.
- `_resolve_abs` and `_resolve_abs_for_onnx` capture `slice_key`/`slice_dir_abs` from the loop; since they're only used immediately, that's safe despite static-analysis complaints.
Functionally, this looks correct.
- Model roots with
slices/metadata.json.- Slices dirs with
metadata.json(including dslice-at-root).- Raw
slice_*dirs withmetadata.json+payload.- Single dslice/dsperse files converted to dirs via
Converter.convert.- Directories recognized by
Converter.detect_typeasdirs.A couple of points:
Docstring says it returns
(model_root, slices_dir, original_format), but the function actually returns only(slices_dir, original_format). Updating the docstring would avoid confusion.The final fallback:
return (path_obj.parent / 'slices').resolve(), 'dirs'silently guesses a
slicesdirectory even when nothing matches. Depending on where this is used, you may want to log a warning when falling back this way so callers can tell the input was not recognized explicitly.Overall, though, this is a reasonable normalization strategy.
507-512: Conversion-back on generate_run_metadata: exception swallowing is acceptable but might deserve a log
Converting back to the original format:
```python
if original_format != 'dirs':
    try:
        Converter.convert(str(slices_dir), output_type=original_format, cleanup=True)
    except Exception:
        logger.warning("Failed to convert slices back to original format; continuing.")
```
Given this is a post-processing convenience, swallowing the exception with a warning is reasonable. Just be aware that callers relying on the on-disk packaging being restored should pay attention to this log; you may eventually want to surface this condition more prominently if it's considered important.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (32)
- .gitignore (1 hunks)
- README.md (9 hunks)
- docs/arc42.md (3 hunks)
- docs/overview.md (1 hunks)
- dsperse/main.py (1 hunks)
- dsperse/src/analyzers/onnx_analyzer.py (4 hunks)
- dsperse/src/analyzers/runner_analyzer.py (3 hunks)
- dsperse/src/backends/ezkl.py (10 hunks)
- dsperse/src/backends/onnx_models.py (1 hunks)
- dsperse/src/backends/pth_models.py (1 hunks)
- dsperse/src/cli/__init__.py (1 hunks)
- dsperse/src/cli/base.py (3 hunks)
- dsperse/src/cli/compile.py (6 hunks)
- dsperse/src/cli/convert.py (1 hunks)
- dsperse/src/cli/prove.py (8 hunks)
- dsperse/src/cli/run.py (4 hunks)
- dsperse/src/cli/slice.py (4 hunks)
- dsperse/src/cli/verify.py (5 hunks)
- dsperse/src/compile/compiler.py (1 hunks)
- dsperse/src/compile/utils/compiler_utils.py (1 hunks)
- dsperse/src/compiler.py (0 hunks)
- dsperse/src/prover.py (3 hunks)
- dsperse/src/run/runner.py (1 hunks)
- dsperse/src/run/utils/runner_utils.py (1 hunks)
- dsperse/src/runner.py (0 hunks)
- dsperse/src/slice/onnx_slicer.py (5 hunks)
- dsperse/src/slice/slicer.py (3 hunks)
- dsperse/src/slice/utils/converter.py (1 hunks)
- dsperse/src/slice/utils/onnx_utils.py (1 hunks)
- dsperse/src/utils/runner_utils/runner_utils.py (0 hunks)
- dsperse/src/utils/utils.py (3 hunks)
- dsperse/src/verifier.py (3 hunks)
💤 Files with no reviewable changes (3)
- dsperse/src/runner.py
- dsperse/src/compiler.py
- dsperse/src/utils/runner_utils/runner_utils.py
🧰 Additional context used
🧬 Code graph analysis (22)
dsperse/src/backends/onnx_models.py (1)
dsperse/src/run/utils/runner_utils.py (1)
RunnerUtils(16-336)
dsperse/main.py (2)
dsperse/src/cli/slice.py (2)
slice_convert(62-114), slice_model(117-211)
dsperse/src/slice/slicer.py (1)
slice_model(58-78)
dsperse/src/compile/utils/compiler_utils.py (3)
dsperse/src/backends/ezkl.py (2)
EZKL(107-693), get_version(137-158)
dsperse/src/run/runner.py (2)
run(48-85), run_onnx_slice(88-121)
dsperse/src/slice/utils/converter.py (2)
Converter(17-382), detect_type(93-122)
dsperse/src/verifier.py (3)
dsperse/src/backends/ezkl.py (2)
EZKL(107-693), verify(334-394)
dsperse/src/slice/utils/converter.py (3)
Converter(17-382), convert(20-90), detect_type(93-122)
dsperse/src/utils/utils.py (10)
Utils(12-359), dirs_root_from(183-190), load_run_metadata(174-180), load_run_results(241-247), iter_circuit_slices(193-203), slice_dirs_path(221-222), resolve_under_slice(206-218), merge_execution_into_run_results(257-313), save_run_results(250-254), update_metadata_after_execution(316-359)
dsperse/src/compile/compiler.py (4)
dsperse/src/backends/ezkl.py (3)
EZKL(107-693), compilation_pipeline(578-671), get_version(137-158)
dsperse/src/compile/utils/compiler_utils.py (6)
is_ezkl_compilation_successful(101-109), get_relative_paths(112-125), update_slice_metadata(174-223), run_onnx_inference_chain(227-273), parse_layers(53-70), is_sliced_model(15-50)
dsperse/src/slice/utils/converter.py (2)
Converter(17-382), convert(20-90)
dsperse/src/utils/utils.py (3)
Utils(12-359), find_metadata_path(45-65), save_metadata_file(18-42)
dsperse/src/backends/pth_models.py (2)
dsperse/src/run/runner.py (1)
run(48-85)
dsperse/src/run/utils/runner_utils.py (1)
RunnerUtils(16-336)
dsperse/src/run/runner.py (5)
dsperse/src/analyzers/runner_analyzer.py (3)
RunnerAnalyzer(16-513), generate_run_metadata(450-513), get_execution_chain(302-305)
dsperse/src/backends/ezkl.py (2)
EZKL(107-693), generate_witness(160-242)
dsperse/src/backends/onnx_models.py (2)
OnnxModels(13-222), run_inference(18-60)
dsperse/src/run/utils/runner_utils.py (7)
RunnerUtils(16-336), normalize_for_runtime(22-58), make_run_dir(61-68), prepare_slice_io(71-76), execute_slice(79-93), filter_tensor(253-263), repackage_if_needed(96-101)
dsperse/src/utils/utils.py (3)
Utils(12-359), read_input(134-171), write_input(127-131)
dsperse/src/slice/utils/onnx_utils.py (1)
dsperse/src/utils/utils.py (1)
save_metadata_file(18-42)
dsperse/src/slice/slicer.py (2)
dsperse/src/slice/onnx_slicer.py (2)
slice(326-457), slice_model(499-519)
dsperse/src/slice/utils/converter.py (2)
Converter(17-382), convert(20-90)
dsperse/src/cli/slice.py (4)
dsperse/src/cli/base.py (3)
check_model_dir(157-174), prompt_for_value(242-302), normalize_path(21-32)
dsperse/src/slice/slicer.py (2)
Slicer(19-78), slice_model(58-78)
dsperse/src/slice/utils/converter.py (3)
Converter(17-382), _dsperse_to_dirs(306-337), convert(20-90)
dsperse/src/cli/convert.py (1)
convert(44-91)
dsperse/src/cli/prove.py (4)
dsperse/src/prover.py (2)
Prover(14-149), prove(131-149)
dsperse/src/cli/base.py (3)
save_result(218-240), prompt_for_value(242-302), normalize_path(21-32)
dsperse/src/slice/utils/converter.py (2)
Converter(17-382), convert(20-90)
dsperse/src/utils/utils.py (3)
Utils(12-359), load_run_metadata(174-180), run_results_path(237-238)
dsperse/src/slice/onnx_slicer.py (1)
dsperse/src/utils/utils.py (1)
filter_inputs(69-91)
dsperse/src/run/utils/runner_utils.py (3)
dsperse/src/slice/utils/converter.py (3)
Converter(17-382), convert(20-90), detect_type(93-122)
dsperse/src/utils/torch_utils.py (2)
ModelUtils(20-538), load_metadata(528-538)
dsperse/src/run/runner.py (2)
_run_ezkl_slice(123-188), run_onnx_slice(88-121)
dsperse/src/cli/verify.py (4)
dsperse/src/verifier.py (2)
Verifier(15-135), verify(123-135)
dsperse/src/cli/base.py (3)
save_result(218-240), prompt_for_value(242-302), normalize_path(21-32)
dsperse/src/slice/utils/converter.py (3)
Converter(17-382), _dsperse_to_dirs(306-337), convert(20-90)
dsperse/src/utils/utils.py (3)
Utils(12-359), load_run_metadata(174-180), run_results_path(237-238)
dsperse/src/analyzers/runner_analyzer.py (2)
dsperse/src/utils/utils.py (2)
Utils(12-359), save_metadata_file(18-42)
dsperse/src/slice/utils/converter.py (3)
Converter(17-382), detect_type(93-122), convert(20-90)
dsperse/src/cli/convert.py (2)
dsperse/src/cli/base.py (2)
normalize_path(21-32), prompt_for_value(242-302)
dsperse/src/slice/utils/converter.py (3)
Converter(17-382), convert(20-90), _dsperse_to_dirs(306-337)
dsperse/src/slice/utils/converter.py (1)
dsperse/src/cli/convert.py (1)
convert(44-91)
dsperse/src/prover.py (3)
dsperse/src/backends/ezkl.py (1)
prove(244-332)
dsperse/src/slice/utils/converter.py (2)
convert(20-90), detect_type(93-122)
dsperse/src/utils/utils.py (12)
Utils(12-359), dirs_root_from(183-190), load_run_metadata(174-180), iter_circuit_slices(193-203), slice_dirs_path(221-222), resolve_under_slice(206-218), witness_path_for(225-226), proof_output_path(229-234), load_run_results(241-247), merge_execution_into_run_results(257-313), save_run_results(250-254), update_metadata_after_execution(316-359)
dsperse/src/analyzers/onnx_analyzer.py (3)
dsperse/src/slice/onnx_slicer.py (1)
slice(326-457)
dsperse/src/slice/utils/onnx_utils.py (2)
OnnxUtils(14-214), write_slice_dirs_metadata(67-214)
dsperse/src/utils/utils.py (2)
Utils(12-359), save_metadata_file(18-42)
dsperse/src/cli/run.py (2)
dsperse/src/run/runner.py (2)
run(48-85), Runner(22-247)
dsperse/src/cli/base.py (3)
prompt_for_value(242-302), normalize_path(21-32), check_model_dir(157-174)
dsperse/src/cli/compile.py (2)
dsperse/src/compile/compiler.py (2)
compile(143-181), Compiler(23-181)
dsperse/src/cli/base.py (3)
check_model_dir(157-174), prompt_for_value(242-302), normalize_path(21-32)
dsperse/src/backends/ezkl.py (3)
dsperse/src/run/runner.py (1)
run(48-85)
dsperse/src/run/utils/runner_utils.py (1)
RunnerUtils(16-336)
dsperse/src/compile/compiler.py (1)
compile(143-181)
🪛 markdownlint-cli2 (0.18.1)
README.md
92-92: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
🪛 Ruff (0.14.5)
dsperse/src/compile/utils/compiler_utils.py
258-258: f-string without any placeholders
Remove extraneous f prefix
(F541)
260-260: Unpacked variable tensor is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
dsperse/src/verifier.py
47-48: try-except-pass detected, consider logging the exception
(S110)
47-47: Do not catch blind exception: Exception
(BLE001)
135-135: Avoid specifying long messages outside the exception class
(TRY003)
171-171: f-string without any placeholders
Remove extraneous f prefix
(F541)
dsperse/src/compile/compiler.py
52-52: Avoid specifying long messages outside the exception class
(TRY003)
54-54: f-string without any placeholders
Remove extraneous f prefix
(F541)
55-55: Avoid specifying long messages outside the exception class
(TRY003)
55-55: f-string without any placeholders
Remove extraneous f prefix
(F541)
62-62: f-string without any placeholders
Remove extraneous f prefix
(F541)
85-85: Avoid specifying long messages outside the exception class
(TRY003)
155-155: Avoid specifying long messages outside the exception class
(TRY003)
181-181: Avoid specifying long messages outside the exception class
(TRY003)
202-202: f-string without any placeholders
Remove extraneous f prefix
(F541)
dsperse/src/run/runner.py
23-23: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
23-23: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
27-27: Create your own exception
(TRY002)
27-27: Avoid specifying long messages outside the exception class
(TRY003)
48-48: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
69-69: Create your own exception
(TRY002)
69-69: Avoid specifying long messages outside the exception class
(TRY003)
88-88: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
123-123: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
136-136: Do not catch blind exception: Exception
(BLE001)
168-168: Do not catch blind exception: Exception
(BLE001)
213-213: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
dsperse/src/slice/utils/onnx_utils.py
47-48: try-except-pass detected, consider logging the exception
(S110)
47-47: Do not catch blind exception: Exception
(BLE001)
57-58: try-except-pass detected, consider logging the exception
(S110)
57-57: Do not catch blind exception: Exception
(BLE001)
59-60: try-except-continue detected, consider logging the exception
(S112)
59-59: Do not catch blind exception: Exception
(BLE001)
61-62: try-except-pass detected, consider logging the exception
(S110)
61-61: Do not catch blind exception: Exception
(BLE001)
80-80: Avoid specifying long messages outside the exception class
(TRY003)
101-101: Do not catch blind exception: Exception
(BLE001)
129-129: Do not catch blind exception: Exception
(BLE001)
151-151: Do not catch blind exception: Exception
(BLE001)
203-203: Do not catch blind exception: Exception
(BLE001)
dsperse/src/slice/slicer.py
70-70: Avoid specifying long messages outside the exception class
(TRY003)
dsperse/src/cli/slice.py
99-100: try-except-pass detected, consider logging the exception
(S110)
99-99: Do not catch blind exception: Exception
(BLE001)
112-112: Do not catch blind exception: Exception
(BLE001)
dsperse/src/cli/prove.py
106-106: Do not catch blind exception: Exception
(BLE001)
127-127: Do not catch blind exception: Exception
(BLE001)
138-138: Do not catch blind exception: Exception
(BLE001)
162-162: Do not catch blind exception: Exception
(BLE001)
279-279: Loop control variable files not used within loop body
Rename unused files to _files
(B007)
342-342: Do not catch blind exception: Exception
(BLE001)
406-406: Do not catch blind exception: Exception
(BLE001)
461-461: Do not catch blind exception: Exception
(BLE001)
dsperse/src/slice/onnx_slicer.py
420-420: Do not catch blind exception: Exception
(BLE001)
426-426: Do not catch blind exception: Exception
(BLE001)
dsperse/src/run/utils/runner_utils.py
36-36: Do not catch blind exception: Exception
(BLE001)
50-50: Consider moving this statement to an else block
(TRY300)
51-51: Do not catch blind exception: Exception
(BLE001)
100-100: Do not catch blind exception: Exception
(BLE001)
109-109: Unused static method argument: model_directory
(ARG004)
109-109: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
109-109: Unused static method argument: save_reshape
(ARG004)
140-140: Prefer TypeError exception for invalid type
(TRY004)
140-140: Avoid specifying long messages outside the exception class
(TRY003)
256-256: Local variable probabilities is assigned to but never used
Remove assignment to unused variable probabilities
(F841)
257-257: Local variable predicted_action is assigned to but never used
Remove assignment to unused variable predicted_action
(F841)
dsperse/src/cli/verify.py
103-103: Do not catch blind exception: Exception
(BLE001)
123-123: Do not catch blind exception: Exception
(BLE001)
134-134: Do not catch blind exception: Exception
(BLE001)
157-157: Do not catch blind exception: Exception
(BLE001)
233-233: Loop control variable files not used within loop body
Rename unused files to _files
(B007)
299-299: Local variable cleanup_unpacked is assigned to but never used
Remove assignment to unused variable cleanup_unpacked
(F841)
301-301: Do not catch blind exception: Exception
(BLE001)
364-364: Do not catch blind exception: Exception
(BLE001)
413-413: Do not catch blind exception: Exception
(BLE001)
dsperse/src/analyzers/runner_analyzer.py
49-49: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
80-80: Do not catch blind exception: Exception
(BLE001)
89-89: Do not catch blind exception: Exception
(BLE001)
126-126: Do not catch blind exception: Exception
(BLE001)
134-134: Function definition does not bind loop variable slice_dir_abs
(B023)
138-138: Function definition does not bind loop variable slice_key
(B023)
141-141: Function definition does not bind loop variable slice_dir_abs
(B023)
163-163: Do not catch blind exception: Exception
(BLE001)
182-182: Function definition does not bind loop variable slice_dir_abs
(B023)
185-185: Function definition does not bind loop variable slice_key
(B023)
188-188: Function definition does not bind loop variable slice_dir_abs
(B023)
341-341: Do not catch blind exception: Exception
(BLE001)
350-351: try-except-pass detected, consider logging the exception
(S110)
350-350: Do not catch blind exception: Exception
(BLE001)
361-361: Do not catch blind exception: Exception
(BLE001)
376-376: Local variable model_root is assigned to but never used
Remove assignment to unused variable model_root
(F841)
376-376: Useless if-else condition
(RUF034)
398-399: try-except-pass detected, consider logging the exception
(S110)
398-398: Do not catch blind exception: Exception
(BLE001)
410-410: Do not catch blind exception: Exception
(BLE001)
420-421: try-except-continue detected, consider logging the exception
(S112)
420-420: Do not catch blind exception: Exception
(BLE001)
430-431: try-except-pass detected, consider logging the exception
(S110)
430-430: Do not catch blind exception: Exception
(BLE001)
475-475: Do not catch blind exception: Exception
(BLE001)
493-494: try-except-pass detected, consider logging the exception
(S110)
493-493: Do not catch blind exception: Exception
(BLE001)
510-510: Do not catch blind exception: Exception
(BLE001)
dsperse/src/cli/convert.py
82-83: try-except-pass detected, consider logging the exception
(S110)
82-82: Do not catch blind exception: Exception
(BLE001)
89-89: Do not catch blind exception: Exception
(BLE001)
dsperse/src/utils/utils.py
64-64: Avoid specifying long messages outside the exception class
(TRY003)
178-178: Avoid specifying long messages outside the exception class
(TRY003)
294-294: Avoid specifying long messages outside the exception class
(TRY003)
329-329: Do not catch blind exception: Exception
(BLE001)
349-349: Avoid specifying long messages outside the exception class
(TRY003)
358-358: Do not catch blind exception: Exception
(BLE001)
dsperse/src/slice/utils/converter.py
20-20: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
35-35: Avoid specifying long messages outside the exception class
(TRY003)
37-37: Avoid specifying long messages outside the exception class
(TRY003)
76-76: Do not catch blind exception: Exception
(BLE001)
88-88: Do not catch blind exception: Exception
(BLE001)
105-105: Avoid specifying long messages outside the exception class
(TRY003)
120-120: Avoid specifying long messages outside the exception class
(TRY003)
122-122: Avoid specifying long messages outside the exception class
(TRY003)
135-135: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
165-165: Do not catch blind exception: Exception
(BLE001)
170-170: Avoid specifying long messages outside the exception class
(TRY003)
195-195: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
211-211: Avoid specifying long messages outside the exception class
(TRY003)
264-264: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
287-287: Avoid specifying long messages outside the exception class
(TRY003)
306-306: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
313-313: Avoid specifying long messages outside the exception class
(TRY003)
341-341: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
346-346: Loop control variable dirs not used within loop body
Rename unused dirs to _dirs
(B007)
437-437: Do not catch blind exception: Exception
(BLE001)
dsperse/src/prover.py
149-149: Avoid specifying long messages outside the exception class
(TRY003)
190-190: f-string without any placeholders
Remove extraneous f prefix
(F541)
dsperse/src/cli/compile.py
176-176: f-string without any placeholders
Remove extraneous f prefix
(F541)
dsperse/src/backends/ezkl.py
145-145: subprocess call: check for execution of untrusted input
(S603)
155-155: Do not catch blind exception: Exception
(BLE001)
601-601: f-string without any placeholders
Remove extraneous f prefix
(F541)
602-602: f-string without any placeholders
Remove extraneous f prefix
(F541)
603-603: f-string without any placeholders
Remove extraneous f prefix
(F541)
604-604: f-string without any placeholders
Remove extraneous f prefix
(F541)
…imizing imports, and removing deprecated or unused options for improved clarity and maintainability.
Actionable comments posted: 4
🧹 Nitpick comments (15)
dsperse/src/cli/prove.py (2)
97-147: Fast-path implementation duplicates metadata/data_path logic with legacy flow.
Lines 103-118 duplicate the metadata loading and `data_path` inference logic that also exists in lines 287-301. Consider extracting a helper like `_resolve_data_path(run_path, explicit_data_path, metadata)` to reduce duplication and ensure consistency.
```python
def _resolve_data_path(run_path: str, explicit_data_path: str | None, metadata: dict) -> str | None:
    """Derive data_path from explicit arg, metadata source_path, or model_path/slices."""
    if explicit_data_path:
        return normalize_path(explicit_data_path)
    source = metadata.get('source_path')
    if source:
        return normalize_path(source)
    model_path = metadata.get('model_path')
    if model_path:
        return normalize_path(str(Path(model_path) / 'slices'))
    return None
```
346-347: Remove empty `finally` block.
The empty `finally: pass` block serves no purpose and adds noise. Remove it unless you plan to add cleanup logic.
```diff
     except Exception as e:
         print(f"{Fore.RED}Error proving run: {e}{Style.RESET_ALL}")
         traceback.print_exc()
-    finally:
-        pass
```
dsperse/src/run/utils/runner_utils.py (2)
108-150: Unused parameters and implicit `Optional` type hint.
The `model_directory` and `save_reshape` parameters are unused (intended for deferred reshape functionality per commented code). Update the type hint to use explicit `Optional[str]` for clarity:
```diff
+from typing import Optional
+
 @staticmethod
-def preprocess_input(input_path:str, model_directory: str = None, save_reshape: bool = False) -> torch.Tensor:
+def preprocess_input(
+    input_path: str,
+    model_directory: Optional[str] = None,
+    save_reshape: bool = False,
+) -> torch.Tensor:
     """
     Preprocess input data from JSON.
+
+    Args:
+        input_path: Path to input JSON file
+        model_directory: (Reserved for future reshape feature)
+        save_reshape: (Reserved for future reshape feature)
     """
```
Note: `Optional` is already imported at line 5, but the type hint at line 109 still uses the implicit form.
36-37: Broad exception catches silently swallow errors.
Multiple places catch `Exception` and either pass or return fallback values without logging. This can mask real issues:
- Lines 36-37: `normalize_for_runtime` dsperse conversion failure
- Lines 51-52: dslice conversion failure
- Lines 100-101: `repackage_if_needed` logs, but other places don't
Consider logging at least a debug/warning message for silent fallbacks:
```diff
 if packaging_type == "dsperse":
     try:
         converted = Converter.convert(source_path, output_type="dirs", cleanup=False)
         return Path(converted), "dsperse", model_path
-    except Exception:
+    except Exception as e:
+        logger.debug(f"Failed to convert dsperse source, falling back to dirs: {e}")
         return (model_path / "slices").resolve(), None, model_path
```
Also applies to: 51-52, 100-101
dsperse/src/cli/convert.py (1)
76-82: Calling private method `_dsperse_to_dirs` directly and silently swallowing unlink errors.
Two issues here:
- `Converter._dsperse_to_dirs` is a private method (underscore prefix). If `expand_slices` is needed at the public API level, consider exposing it through `Converter.convert()` as a parameter.
- The `try: p.unlink() except Exception: pass` silently ignores errors—permission issues or file-in-use scenarios will go unnoticed.
```diff
 if p.is_file() and p.suffix == '.dsperse' and getattr(args, 'to_type', None) == 'dirs':
+    # TODO: Consider adding expand_slices param to Converter.convert() public API
     result = Converter._dsperse_to_dirs(p, output_path, expand_slices=bool(getattr(args, 'expand_slices', False)))
     if bool(getattr(args, 'cleanup', True)):
         try:
             p.unlink()
-        except Exception:
-            pass
+        except Exception as e:
+            logger.warning(f"Could not remove source file {p}: {e}")
```
294-295: Remove empty `finally` block.
Same as in `prove.py`—it serves no purpose.
```diff
     except Exception as e:
         print(f"{Fore.RED}Error verifying run: {e}{Style.RESET_ALL}")
         traceback.print_exc()
-    finally:
-        pass
```
94-143: Fast-path mirrors prove.py's structure—consider shared extraction.
The fast-path logic (lines 94-143) closely mirrors `prove.py` lines 97-147: load metadata, resolve data_path, invoke backend, save results, print summary. While the past review comment noted this was deferred, extracting common patterns to `base.py` would reduce maintenance burden as CLI commands evolve.
README.md (1)
92-92: Minor markdown list indentation.
Static analysis flagged unordered list indentation at line 92 (expected 2 spaces, found 4). This is a minor formatting nit that won't affect rendering in most Markdown parsers.
```diff
 - The script will:
-    - Install the DSperse CLI in editable mode so the dsperse command is available
+  - Install the DSperse CLI in editable mode so the dsperse command is available
```
dsperse/src/verifier.py (2)
41-48: Silent exception swallows potential data issues.
The `try-except-pass` when building `proof_paths_by_slice` could hide malformed `run_results.json` data. Consider logging at debug level:
```diff
 try:
     for entry in run_results.get("execution_chain", {}).get("execution_results", []):
         sid = entry.get("slice_id") or entry.get("segment_id")
         pe = entry.get("proof_execution", {}) if isinstance(entry, dict) else {}
         if sid and pe and pe.get("proof_file"):
             proof_paths_by_slice[sid] = pe.get("proof_file")
-except Exception:
-    pass
+except Exception as e:
+    logger.debug(f"Could not parse proof paths from run_results: {e}")
```
90-101: Consider logging cleanup failures explicitly.
The cleanup logic silently swallows exceptions when deleting the source file. While this may be intentional to prevent cleanup failures from blocking the conversion, it would be helpful to log when cleanup fails so users are aware the source file remains.
Apply this diff to add logging:
if bool(getattr(args, 'cleanup', True)): try: p.unlink() - except Exception: - pass + except Exception as e: + logger.warning(f"Failed to cleanup source file {p}: {e}")
92-93: Direct call to private method may break encapsulation.

The code directly calls `Converter._dsperse_to_dirs`, which is a private method (indicated by the leading underscore). This tight coupling could break if the Converter's internal implementation changes. Consider whether this special case should be part of Converter's public API instead.

If this pattern is used elsewhere or if `_dsperse_to_dirs` has specific behavior not available through the public `convert` method, please confirm this is intentional. Otherwise, consider routing through the public API:

```python
result = Converter.convert(
    str(p),
    output_type='dirs',
    output_path=output_path,
    cleanup=False  # Handle cleanup separately after
)
```

dsperse/src/run/runner.py (1)
95-106: Extract duplicated path resolution logic.

The `_resolve_rel_path` helper function inside `run_onnx_slice` is duplicated in `_run_ezkl_slice` (lines 128-144). This duplication makes maintenance harder and increases the risk of inconsistent behavior.

Consider extracting this to a shared helper method:

```diff
+@staticmethod
+def _resolve_slice_path(path: str, base_dir: Path) -> str:
+    """Resolve a path that may be relative to base_dir, handling slice_# prefixes."""
+    if not path:
+        return None
+    p_str = str(path)
+    if os.path.isabs(p_str):
+        return p_str
+    try:
+        sd_name = os.path.basename(os.path.abspath(str(base_dir))) if base_dir else None
+    except Exception:
+        sd_name = None
+    parts = p_str.split(os.sep)
+    if sd_name and parts and parts[0] == sd_name:
+        parts = parts[1:]
+        p_str = os.path.join(*parts) if parts else ''
+    if base_dir:
+        return str((Path(base_dir) / p_str).resolve())
+    return os.path.abspath(p_str)
+
 @staticmethod
 def run_onnx_slice(slice_info: dict, input_tensor_path, output_tensor_path, slice_dir: Path = None):
     """Run ONNX inference for a slice."""
     onnx_path = slice_info.get("path")
-
-    def _resolve_rel_path(p: str, base_dir: Path) -> str:
-        ...
-
-    if onnx_path and not os.path.isabs(str(onnx_path)):
-        onnx_path = _resolve_rel_path(onnx_path, slice_dir)
+    onnx_path = Runner._resolve_slice_path(onnx_path, slice_dir)
```

Then use the same helper in `_run_ezkl_slice` on lines 151-156.

dsperse/src/compile/compiler.py (1)
74-79: Consolidate duplicate metadata update logic.

Lines 74-79 have duplicated logic for updating slice metadata, differing only in how the path is constructed. This can be simplified to improve maintainability.

Apply this diff:

```diff
-if slice_data.get('slice_metadata') and os.path.exists(slice_data.get('slice_metadata')):
-    path = Path(slice_data.get('slice_metadata'))
-    CompilerUtils.update_slice_metadata(path, success, file_paths)
-elif slice_data.get('slice_metadata_relative_path') and os.path.exists(os.path.join(base_path, slice_data.get('slice_metadata_relative_path'))):
-    path = Path(os.path.join(base_path, slice_data.get('slice_metadata_relative_path')))
-    CompilerUtils.update_slice_metadata(path, success, file_paths)
+metadata_path = slice_data.get('slice_metadata')
+if not metadata_path and slice_data.get('slice_metadata_relative_path'):
+    metadata_path = os.path.join(base_path, slice_data.get('slice_metadata_relative_path'))
+
+if metadata_path and os.path.exists(metadata_path):
+    CompilerUtils.update_slice_metadata(Path(metadata_path), success, file_paths)
```

dsperse/src/analyzers/runner_analyzer.py (2)
128-141: Consolidate duplicate path resolution functions.

The method defines two very similar local functions for resolving absolute paths:

- `_resolve_abs` (lines 128-141) for EZKL artifacts
- `_resolve_abs_for_onnx` (lines 176-188) for ONNX paths

Both handle slice_dir_abs resolution and slice_key prefix stripping. Additionally, static analysis warns that these functions capture loop variables (`slice_dir_abs`, `slice_key`), which can lead to unexpected behavior.

Consider extracting a single static or class method that takes these values as explicit parameters:

```python
@staticmethod
def _resolve_to_absolute(path: str, slice_dir: Path, slice_key: str) -> Optional[str]:
    """Resolve a possibly-relative path to absolute, handling slice_key prefixes."""
    if not path:
        return None
    p_str = str(path)
    if os.path.isabs(p_str):
        return p_str
    if slice_dir is None:
        return p_str
    # Strip slice_key prefix if present
    parts = p_str.split(os.sep)
    if parts and parts[0] == slice_key:
        parts = parts[1:]
    p_str = os.path.join(*parts) if parts else ''
    return str((slice_dir / p_str).resolve())
```

Then use this method for both EZKL and ONNX path resolution, which also eliminates the B023 warnings about loop variable binding.

Also applies to: 176-188
315-384: Consider refactoring complex normalization logic.

The `_normalize_to_dirs` method handles many different input cases but has grown quite complex with nested conditions and multiple try-except blocks. This makes it harder to understand and maintain.

Consider breaking this into smaller helper methods:

- `_handle_directory_input(path: Path) -> tuple[Path, str]`
- `_handle_file_input(path: Path) -> tuple[Path, str]`
- `_handle_model_root(path: Path) -> tuple[Path, str]`

Also, line 376 assigns `model_root` but never uses it. This should be removed:

```diff
-# If this looks like a slices folder (has metadata.json), parent is model root
-model_root = sdir.parent if (sdir / 'metadata.json').exists() else sdir.parent
 return sdir.resolve(), 'dirs'
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- README.md (9 hunks)
- dsperse/src/analyzers/runner_analyzer.py (3 hunks)
- dsperse/src/cli/convert.py (1 hunks)
- dsperse/src/cli/prove.py (9 hunks)
- dsperse/src/cli/slice.py (4 hunks)
- dsperse/src/cli/verify.py (7 hunks)
- dsperse/src/compile/compiler.py (1 hunks)
- dsperse/src/compile/utils/compiler_utils.py (1 hunks)
- dsperse/src/run/runner.py (1 hunks)
- dsperse/src/run/utils/runner_utils.py (1 hunks)
- dsperse/src/verifier.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
dsperse/src/cli/prove.py (4)
dsperse/src/cli/base.py (3)
- save_result (218-240)
- prompt_for_value (242-302)
- normalize_path (21-32)

dsperse/src/utils/utils.py (3)

- Utils (12-359)
- load_run_metadata (174-180)
- run_results_path (237-238)

dsperse/src/cli/verify.py (4)

- is_run_id_dir (148-151)
- is_run_root_dir (153-158)
- get_latest_run (67-83)
- get_all_runs (43-65)

install.sh (1)

- info (57-57)
dsperse/src/run/utils/runner_utils.py (4)
dsperse/src/slice/onnx_slicer.py (1)
- slice (326-457)

dsperse/src/slice/utils/converter.py (3)

- Converter (17-382)
- convert (20-90)
- detect_type (93-122)

dsperse/src/utils/torch_utils.py (1)

- ModelUtils (20-538)

dsperse/src/run/runner.py (2)

- _run_ezkl_slice (123-188)
- run_onnx_slice (88-121)
dsperse/src/compile/compiler.py (4)
dsperse/src/backends/ezkl.py (3)
- EZKL (107-693)
- compilation_pipeline (578-671)
- get_version (137-158)

dsperse/src/compile/utils/compiler_utils.py (6)

- is_ezkl_compilation_successful (105-113)
- get_relative_paths (116-129)
- update_slice_metadata (178-227)
- run_onnx_inference_chain (231-277)
- parse_layers (57-74)
- is_sliced_model (15-54)

dsperse/src/slice/utils/converter.py (1)

- convert (20-90)

dsperse/src/utils/utils.py (2)

- find_metadata_path (45-65)
- save_metadata_file (18-42)
dsperse/src/analyzers/runner_analyzer.py (2)
dsperse/src/utils/utils.py (1)
- save_metadata_file (18-42)

dsperse/src/slice/utils/converter.py (3)

- Converter (17-382)
- detect_type (93-122)
- convert (20-90)
🪛 markdownlint-cli2 (0.18.1)
README.md
92-92: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
🪛 Ruff (0.14.5)
dsperse/src/cli/convert.py
81-82: try-except-pass detected, consider logging the exception
(S110)
81-81: Do not catch blind exception: Exception
(BLE001)
96-96: Do not catch blind exception: Exception
(BLE001)
dsperse/src/verifier.py
47-48: try-except-pass detected, consider logging the exception
(S110)
47-47: Do not catch blind exception: Exception
(BLE001)
135-135: Avoid specifying long messages outside the exception class
(TRY003)
171-171: f-string without any placeholders
Remove extraneous f prefix
(F541)
dsperse/src/cli/prove.py
105-105: Do not catch blind exception: Exception
(BLE001)
126-126: Do not catch blind exception: Exception
(BLE001)
137-137: Do not catch blind exception: Exception
(BLE001)
162-162: Do not catch blind exception: Exception
(BLE001)
289-289: Do not catch blind exception: Exception
(BLE001)
dsperse/src/cli/verify.py
102-102: Do not catch blind exception: Exception
(BLE001)
122-122: Do not catch blind exception: Exception
(BLE001)
133-133: Do not catch blind exception: Exception
(BLE001)
157-157: Do not catch blind exception: Exception
(BLE001)
169-169: Local variable specified_run_dir is assigned to but never used
Remove assignment to unused variable specified_run_dir
(F841)
242-242: Do not catch blind exception: Exception
(BLE001)
291-291: Do not catch blind exception: Exception
(BLE001)
dsperse/src/run/runner.py
23-23: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
23-23: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
27-27: Create your own exception
(TRY002)
27-27: Avoid specifying long messages outside the exception class
(TRY003)
48-48: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
69-69: Create your own exception
(TRY002)
69-69: Avoid specifying long messages outside the exception class
(TRY003)
88-88: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
123-123: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
136-136: Do not catch blind exception: Exception
(BLE001)
168-168: Do not catch blind exception: Exception
(BLE001)
213-213: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
dsperse/src/cli/slice.py
97-98: try-except-pass detected, consider logging the exception
(S110)
97-97: Do not catch blind exception: Exception
(BLE001)
112-112: Do not catch blind exception: Exception
(BLE001)
dsperse/src/run/utils/runner_utils.py
36-36: Do not catch blind exception: Exception
(BLE001)
50-50: Consider moving this statement to an else block
(TRY300)
51-51: Do not catch blind exception: Exception
(BLE001)
100-100: Do not catch blind exception: Exception
(BLE001)
109-109: Unused static method argument: model_directory
(ARG004)
109-109: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
109-109: Unused static method argument: save_reshape
(ARG004)
132-132: Avoid specifying long messages outside the exception class
(TRY003)
143-143: Prefer TypeError exception for invalid type
(TRY004)
143-143: Avoid specifying long messages outside the exception class
(TRY003)
dsperse/src/compile/compiler.py
52-52: Avoid specifying long messages outside the exception class
(TRY003)
54-54: f-string without any placeholders
Remove extraneous f prefix
(F541)
55-55: Avoid specifying long messages outside the exception class
(TRY003)
55-55: f-string without any placeholders
Remove extraneous f prefix
(F541)
62-62: f-string without any placeholders
Remove extraneous f prefix
(F541)
85-85: Avoid specifying long messages outside the exception class
(TRY003)
155-155: Avoid specifying long messages outside the exception class
(TRY003)
181-181: Avoid specifying long messages outside the exception class
(TRY003)
203-203: f-string without any placeholders
Remove extraneous f prefix
(F541)
dsperse/src/analyzers/runner_analyzer.py
49-49: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
80-80: Do not catch blind exception: Exception
(BLE001)
89-89: Do not catch blind exception: Exception
(BLE001)
126-126: Do not catch blind exception: Exception
(BLE001)
134-134: Function definition does not bind loop variable slice_dir_abs
(B023)
138-138: Function definition does not bind loop variable slice_key
(B023)
141-141: Function definition does not bind loop variable slice_dir_abs
(B023)
163-163: Do not catch blind exception: Exception
(BLE001)
182-182: Function definition does not bind loop variable slice_dir_abs
(B023)
185-185: Function definition does not bind loop variable slice_key
(B023)
188-188: Function definition does not bind loop variable slice_dir_abs
(B023)
341-341: Do not catch blind exception: Exception
(BLE001)
350-351: try-except-pass detected, consider logging the exception
(S110)
350-350: Do not catch blind exception: Exception
(BLE001)
361-361: Do not catch blind exception: Exception
(BLE001)
376-376: Local variable model_root is assigned to but never used
Remove assignment to unused variable model_root
(F841)
376-376: Useless if-else condition
(RUF034)
398-399: try-except-pass detected, consider logging the exception
(S110)
398-398: Do not catch blind exception: Exception
(BLE001)
410-410: Do not catch blind exception: Exception
(BLE001)
420-421: try-except-continue detected, consider logging the exception
(S112)
420-420: Do not catch blind exception: Exception
(BLE001)
430-431: try-except-pass detected, consider logging the exception
(S110)
430-430: Do not catch blind exception: Exception
(BLE001)
475-475: Do not catch blind exception: Exception
(BLE001)
493-494: try-except-pass detected, consider logging the exception
(S110)
493-493: Do not catch blind exception: Exception
(BLE001)
517-517: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (23)
dsperse/src/cli/prove.py (1)
26-42: Updated CLI interface with positional arguments looks good.

The new positional arguments (`run_path`, `data_path`) with optional `--proof-output` provide a cleaner interface while maintaining backward compatibility with deprecated flags. Clear help text distinguishes the preferred approach from legacy options.

dsperse/src/run/utils/runner_utils.py (2)
78-93: `execute_slice` provides clean fallback orchestration.

Good pattern: tries EZKL first when `use_circuit` is set, falls back to ONNX on failure, and properly propagates error context through `attempted_ezkl` and merged error info.
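To make the praised shape concrete, here is a minimal sketch of that fallback pattern; the key names (`use_circuit`, `attempted_ezkl`) are taken from the description above, while the runner callables are hypothetical stand-ins:

```python
# Sketch only: EZKL-first execution with ONNX fallback, as described above.
# `run_ezkl` and `run_onnx` are hypothetical callables returning (ok, result, info).
def execute_slice_sketch(slice_info: dict, run_ezkl, run_onnx) -> dict:
    attempted_ezkl = False
    ezkl_error = None

    if slice_info.get("use_circuit"):
        attempted_ezkl = True
        ok, result, info = run_ezkl(slice_info)
        if ok:
            return {"method": "ezkl", "result": result, "attempted_ezkl": True}
        ezkl_error = info.get("error")  # keep context for the merged error info

    # Fall back to plain ONNX inference when EZKL is unavailable or failed
    ok, result, info = run_onnx(slice_info)
    return {
        "method": "onnx" if ok else None,
        "result": result if ok else None,
        "attempted_ezkl": attempted_ezkl,
        "error": info.get("error") if not ok else ezkl_error,
    }
```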
271-342: Shape validation logic is thorough but returns `True` on indeterminate cases.

`check_expected_shape` returns `True` when it cannot determine the expected shape (line 341). This is a reasonable "optimistic" default, but callers should be aware that a `True` return doesn't guarantee shape correctness, only that no mismatch was detected.

Consider documenting this behavior in the docstring:

```diff
 Returns:
-    bool: True if shapes match, False otherwise
+    bool: True if shapes match or could not be determined, False if a mismatch is detected
```

README.md (2)
218-222: EZKL artifact paths updated to match new pipeline.

The documentation now correctly reflects the new artifact layout: `slice_<i>/ezkl/` containing `settings.json`, `model.compiled`, `vk.key`, and `pk.key`. This addresses the previous review comment about stale paths.
381-446: Comprehensive documentation for slicing outputs and format conversion.

The new sections clearly explain:

- Output format options (`dsperse`, `dslice`, `dirs`) and their effects
- Conversion workflows with practical examples
- Cleanup behavior and bundle structure
This significantly improves user discoverability of the new packaging features.
dsperse/src/verifier.py (2)
69-77: Critical bug from previous review is now fixed.

The code now properly checks for missing `settings_path` and `vk_path` before calling `ezkl_runner.verify()`, preventing the `FileNotFoundError` that would occur when `None` was passed and converted to the string `"None"`. Good fix!
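For reference, the guard being confirmed boils down to a pre-flight check along these lines (a sketch; the exact return shape in `verifier.py` is assumed):

```python
# Sketch: fail fast instead of passing None into ezkl_runner.verify(),
# where it would otherwise be stringified into the literal path "None".
if not settings_path or not vk_path:
    return {"verified": False, "error": "missing settings.json or vk.key for this slice"}
```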
103-121: Conversion round-trip in `verify_dslice` and `verify_dsperse` may have unintended side effects.

Both methods convert to dirs, verify, then convert back to original format. The re-conversion at the end (lines 109, 119) might overwrite or modify files unexpectedly if verification added artifacts.

Verify that the round-trip conversion after verification is intentional and doesn't lose or corrupt any verification artifacts that might be written to the dirs during `verify_dirs`.

dsperse/src/cli/slice.py (1)
194-199: LGTM: Clear separation of slicing and packaging concerns.

The update to always emit directory-based output from the slicer is a good architectural decision. The inline comments clearly direct users to the `convert` sub-command for packaging, making the workflow explicit and maintainable.

dsperse/src/run/runner.py (2)
48-85: LGTM: Well-structured orchestration with clear delegation.

The `run` method follows a clean orchestration pattern:

- Normalizes runtime inputs via `RunnerUtils`
- Iterates through execution chain
- Accumulates per-slice results
- Computes final prediction with softmax
- Persists results and handles repackaging
The error handling and delegation to utilities keep the method focused and maintainable.
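A condensed sketch of that orchestration loop, for readers who want the shape without opening `runner.py` (illustrative only; the `execute_slice` callable follows the summary above, and the softmax step is written out explicitly):

```python
import numpy as np

# Sketch: thread each slice's output into the next, then softmax the final logits.
def run_chain_sketch(execution_chain, first_input, execute_slice):
    current = first_input
    results = []
    for slice_info in execution_chain:
        outcome = execute_slice(slice_info, current)
        if outcome.get("error"):
            raise RuntimeError(f"Inference failed for {slice_info['id']}: {outcome['error']}")
        results.append(outcome)
        current = outcome["output"]  # output of slice N is input of slice N+1

    logits = np.asarray(current, dtype=np.float64).ravel()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return {"prediction": int(probs.argmax()), "probabilities": probs.tolist(), "slices": results}
```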
190-247: LGTM: Well-structured output formatting.

The `_save_inference_output` method creates a comprehensive and well-organized inference output structure including:
- Model path and prediction results
- Execution chain with per-slice details
- Security percentage calculation
- Performance comparison placeholder
The nested structure with `witness_execution` is clear and maintainable.

dsperse/src/compile/compiler.py (3)
83-92: LGTM: Clean delegation to EZKL backend.

The `_compile_model` method provides a straightforward interface for compiling non-sliced models with proper validation and directory setup.
95-140: LGTM: Well-organized two-phase compilation.

The `_compile_slices` method implements a clean two-phase approach:
- Optional ONNX inference chain for calibration
- Per-slice EZKL compilation with metadata updates
The per-slice metadata persistence (line 138) ensures incremental progress is saved, which is good for long-running compilations.
164-176: LGTM: Smart round-trip conversion handling.

The compile method intelligently handles different packaging formats by:
- Detecting if the model is sliced
- Converting to `dirs` format if needed
- Performing compilation
- Converting back to original format with cleanup
This approach maintains user expectations while working with a standardized internal format.
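The round-trip idea, reduced to a skeleton (a sketch against the `Converter` API named in this PR; the import path and the `work` callback are assumptions):

```python
from dsperse.src.slice.utils.converter import Converter  # assumed import path

# Sketch: normalize any packaging to plain directories, do the work,
# then restore the user's original packaging.
def with_dirs_layout(model_path, work):
    original = Converter.detect_type(model_path)  # 'dirs', 'dslice', or 'dsperse'
    path = model_path
    if original != "dirs":
        path = Converter.convert(model_path, output_type="dirs", cleanup=True)
    result = work(path)  # e.g. compile or run over the expanded slice dirs
    if original != "dirs":
        Converter.convert(path, output_type=original, cleanup=True)
    return result
```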
dsperse/src/compile/utils/compiler_utils.py (5)
41-47: LGTM: Robust error handling for type detection.

The try-except block around `Converter.detect_type` prevents unexpected exceptions when checking arbitrary directories. This makes `is_sliced_model` safe to call on any user-provided path, as confirmed in previous reviews.
56-74: LGTM: Robust layer parsing with clear error handling.

The `parse_layers` method handles both single indices and ranges, with appropriate warning logs for invalid inputs. The use of `sorted(set(...))` ensures unique, ordered results.
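To illustrate the accepted grammar, a parser with the same observable behavior would look roughly like this (a sketch, not the actual `parse_layers` implementation):

```python
import logging

logger = logging.getLogger(__name__)

# Sketch: "0,2,5-7" -> [0, 2, 5, 6, 7]; invalid chunks are logged and skipped.
def parse_layer_spec(spec: str) -> list[int]:
    indices: set[int] = set()
    for chunk in spec.split(","):
        chunk = chunk.strip()
        try:
            if "-" in chunk:
                start, end = (int(part) for part in chunk.split("-", 1))
                indices.update(range(start, end + 1))
            else:
                indices.add(int(chunk))
        except ValueError:
            logger.warning(f"Ignoring invalid layer index or range: {chunk!r}")
    return sorted(indices)
```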
78-174: LGTM: Well-focused utility functions.

The path manipulation helpers provide clear, single-purpose functionality:
- Extracting payload-relative paths
- Adding slice prefixes
- Validating compilation artifacts
- Building structured EZKL metadata
Each function has a clear responsibility and includes appropriate null-safety checks.
177-227: LGTM: Comprehensive metadata persistence.

The `update_slice_metadata` method properly:
- Loads existing metadata or creates new
- Captures EZKL version and timestamp
- Nests compilation info under `compilation.ezkl`
- Includes error information when present
- Persists atomically to disk
This provides good traceability for compilation results.
230-277: LGTM: Clean ONNX inference chain for calibration.

The `run_onnx_inference_chain` method implements a clear pipeline:
- Validates input file existence
- Iterates through slices
- Threads output of one slice as input to the next
- Generates calibration.json files for EZKL
Previous review comments about f-string and unused tensor were addressed.
dsperse/src/analyzers/runner_analyzer.py (5)
19-39: LGTM: Clean refactor to stateless utilities.

The transition to a stateless analyzer with static methods is a good architectural choice. The path helper methods (`rel_from_payload`, `with_slice_prefix`) are simple, focused, and reusable across the module.
194-227: LGTM: Comprehensive slice metadata construction.

The slice metadata dictionary is well-structured with:
- Input/output shapes
- EZKL compatibility and usage flags
- Circuit size and dependencies
- Conditional inclusion of circuit paths based on availability
The logic properly normalizes paths to `slice_#/payload/...` format for consistency.
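A sketch of what that normalization amounts to (hypothetical helper bodies; the real `rel_from_payload` / `with_slice_prefix` live in the analyzer and may differ):

```python
from pathlib import Path

# Sketch: keep only the payload-relative part of an artifact path,
# then re-anchor it under its slice directory as 'slice_3/payload/...'.
def rel_from_payload(path: str) -> str | None:
    parts = Path(path).parts
    return str(Path(*parts[parts.index("payload"):])) if "payload" in parts else None

def with_slice_prefix(slice_key: str, rel_path: str) -> str:
    return str(Path(slice_key) / rel_path)

# with_slice_prefix("slice_3", rel_from_payload("/m/slices/slice_3/payload/model.onnx"))
# -> "slice_3/payload/model.onnx"
```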
229-313: LGTM: Well-structured execution chain construction.

The execution chain building methods demonstrate good separation of concerns:

- `build_run_metadata` orchestrates the overall structure
- `_build_execution_chain` handles node graph construction with proper primary/fallback paths
- `_build_circuit_slices` tracks circuit usage
- `_calculate_security` computes the security percentage
- `get_execution_chain` provides clean public access

The logic properly handles edge cases like empty slices and trusts the computed 'ezkl' flag without redundant file checks.
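For intuition, the security percentage is presumably the share of slices that executed under a circuit; the arithmetic is roughly:

```python
# Sketch: security percentage as circuit-backed slices over total slices.
def calculate_security_sketch(execution_results: list[dict]) -> float:
    total = len(execution_results)
    circuit = sum(1 for r in execution_results if r.get("method") == "ezkl")
    return round(100.0 * circuit / total, 2) if total else 0.0
```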
386-447: LGTM: Well-focused helper methods.

The helper methods provide clear, single-purpose functionality:

- `_has_model_metadata` checks for metadata presence
- `_build_from_model_metadata` delegates to existing utilities
- `_build_from_per_slice_dirs` handles fallback scanning with defensive error handling

The per-slice directory scanning includes appropriate try-except blocks and handles edge cases like single-slice directories.
449-520: LGTM: Robust metadata generation with proper defaulting.

The `generate_run_metadata` method now properly handles all inputs:

- Defaults `save_path` when None (lines 501-507), addressing previous review
- Detects model-level metadata to avoid unnecessary conversions
- Normalizes inputs when needed
- Persists metadata with proper directory creation
- Converts back to original format when applicable
The logic is defensive and handles edge cases well.
```python
# Prompt if not provided
if not getattr(args, 'input_path', None):
    args.input_path = prompt_for_value('input', 'Enter input path (dsperse file or slices directory)')
else:
    args.input_path = normalize_path(args.input_path)
```
Inconsistent normalization: prompt path not normalized.
When `input_path` is missing, `prompt_for_value` is called but the result isn't normalized. The else branch normalizes, but the prompt branch doesn't:
```diff
 if not getattr(args, 'input_path', None):
-    args.input_path = prompt_for_value('input', 'Enter input path (dsperse file or slices directory)')
+    args.input_path = normalize_path(prompt_for_value('input', 'Enter input path (dsperse file or slices directory)'))
 else:
     args.input_path = normalize_path(args.input_path)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Prompt if not provided
if not getattr(args, 'input_path', None):
    args.input_path = normalize_path(prompt_for_value('input', 'Enter input path (dsperse file or slices directory)'))
else:
    args.input_path = normalize_path(args.input_path)
```
🤖 Prompt for AI Agents
In dsperse/src/cli/convert.py around lines 48 to 52, the code normalizes
args.input_path only in the else branch, but when prompting for the input path
the prompted value is not passed through normalize_path; update the prompt
branch to assign args.input_path = normalize_path(prompt_for_value('input',
'Enter input path (dsperse file or slices directory'))) (or capture the prompt
result, validate it's not None/empty, then call normalize_path before
assignment) so both code paths produce a normalized path.
```python
results = verifier.verify(run_path, slices_dir)

# Display results
print(f"\nVerification completed!")
```
Remove extraneous f-string prefix.
Line 171 uses an f-string with no placeholders.
- print(f"\nVerification completed!")
+ print("\nVerification completed!")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f"\nVerification completed!") | |
| print("\nVerification completed!") |
🧰 Tools
🪛 Ruff (0.14.5)
171-171: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In dsperse/src/verifier.py around line 171, the print call uses an unnecessary
f-string prefix despite having no placeholders; remove the leading f before the
string so it becomes a normal string literal (e.g., print("\nVerification
completed!")), keeping the newline and message intact.
…, streamline function signatures, improve inference logic, and simplify slice processing.
Actionable comments posted: 7
🧹 Nitpick comments (15)
dsperse/src/run/utils/runner_utils.py (3)
33-53: Silent exception swallowing in `normalize_for_runtime` hides root causes.

Both `dsperse` and `dslice` conversion blocks catch bare `Exception` and silently fall back to a default path. This makes debugging difficult when conversion fails unexpectedly. Consider logging the exception before falling back.

```diff
 if packaging_type == "dsperse":
     try:
         converted = Converter.convert(source_path, output_type="dirs", cleanup=False)
         return Path(converted), "dsperse", model_path
-    except Exception:
+    except Exception as e:
+        logger.warning(f"Failed to convert dsperse source: {e}")
         return (model_path / "slices").resolve(), None, model_path

 if packaging_type == "dslice":
     try:
         sp = Path(source_path)
         if sp.is_file():
             # Extract single .dslice file to a slice_* directory; use its parent as slices root
             slice_dir = Path(Converter.convert(str(sp), output_type="dirs", cleanup=False))
             slices_root = slice_dir.parent
         else:
             # Directory containing .dslice files; expand into slice_* directories in place
             expanded_dir = Path(Converter.convert(str(sp), output_type="dirs", cleanup=False))
             slices_root = expanded_dir
         return slices_root, "dslice", model_path
-    except Exception:
+    except Exception as e:
+        logger.warning(f"Failed to convert dslice source: {e}")
         return (model_path / "slices").resolve(), None, model_path
```
101-102: Fix type hints for `Optional` parameters.

Per PEP 484, `model_directory: str = None` should be typed as `Optional[str]`. Although the past review noted these parameters are reserved for deferred reshape functionality, the type hint should still be correct.

```diff
-def preprocess_input(input_path:str, model_directory: str = None, save_reshape: bool = False) -> torch.Tensor:
+def preprocess_input(input_path: str, model_directory: Optional[str] = None, save_reshape: bool = False) -> torch.Tensor:
```
model_directory: str = Noneshould be typed asOptional[str]. Although the past review noted these parameters are reserved for deferred reshape functionality, the type hint should still be correct.- def preprocess_input(input_path:str, model_directory: str = None, save_reshape: bool = False) -> torch.Tensor: + def preprocess_input(input_path: str, model_directory: Optional[str] = None, save_reshape: bool = False) -> torch.Tensor:
79-94:execute_slicetightly couples to Runner's internal methods.This helper accepts a
runnerobject and calls private methods likerunner._run_ezkl_sliceandrunner.run_onnx_slice. This creates a bidirectional dependency betweenRunnerUtilsandRunner. Consider passing the required callables directly or moving this logic into theRunnerclass itself.dsperse/src/slice/utils/converter.py (2)
19-20: Use explicitOptionaltype hints.Multiple methods use implicit
Optional(e.g.,output_path: str = None). Per PEP 484, this should beOptional[str].@staticmethod - def convert(path: str, output_type: str = "dirs", output_path: str = None, cleanup: bool = True): + def convert(path: str, output_type: str = "dirs", output_path: Optional[str] = None, cleanup: bool = True) -> str:Apply similar changes to
_dirs_to_dslice,_dirs_to_dsperse,_dslice_to_dirs,_dsperse_to_dirs, and_zip_directory. You'll need to addfrom typing import Optionalif not already present (it's imported via__future__annotations but explicit import is cleaner).
346-361: Rename unused loop variable and clarify pattern matching logic.The
dirsvariable fromos.walkis unused. Additionally, the pattern matching logic at line 357 usingpattern.replace('*', '')is fragile for patterns like*_suffix.- for root, dirs, files in os.walk(source_dir): + for root, _dirs, files in os.walk(source_dir): root_path = Path(root) # Check if we should skip this directory if exclude_patterns: rel_root = root_path.relative_to(source_dir) skip = False for pattern in exclude_patterns: - if pattern.startswith('*'): - # Match file extension patterns - pass # We'll check files individually - elif str(rel_root).startswith(pattern.replace('*', '')): + # Skip wildcard-only patterns (handled at file level) + if pattern.startswith('*'): + continue + # Handle prefix patterns like 'slice_*' + prefix = pattern.rstrip('*') + if prefix and str(rel_root).startswith(prefix): skip = True breakdsperse/src/run/runner.py (4)
74-85: Duplicate_resolve_rel_pathhelper functions.Two nearly identical
_resolve_rel_pathfunctions are defined as nested functions inrun_onnx_slice(lines 74-85) and_run_ezkl_slice(lines 107-111). These should be consolidated into a single helper method on the class or inRunnerUtils.+ @staticmethod + def _resolve_rel_path(p: str, base_dir: Path, check_parent: bool = False) -> str | None: + """Resolve a possibly relative path under base_dir.""" + if not p: + return None + p_str = str(p) + if os.path.isabs(p_str): + return p_str + resolved = (Path(base_dir) / p_str).resolve() + if check_parent and not resolved.exists(): + resolved = (Path(base_dir).parent / Path(p_str)).resolve() + return str(resolved)Then use
Runner._resolve_rel_path(onnx_path, slice_dir)in both methods.Also applies to: 107-111
45-47: Use a custom exception instead of bare `Exception`.

Raising a bare `Exception` makes it harder for callers to catch specific error conditions. Consider defining a custom exception class like `SlicePathError` or using `ValueError`/`FileNotFoundError`.

```diff
 if slice_path is None or not Path(slice_path).exists():
-    raise Exception("A valid path must be provided for slices")
+    raise FileNotFoundError(f"Slice path does not exist or was not provided: {slice_path}")
```
246-248: Use a custom exception for inference failures.Similar to the earlier issue, raising bare
Exceptionreduces error handling granularity.if not ok: - raise Exception(f"Inference failed for {current_slice_id}: {exec_info.get('error', 'unknown')}") + raise RuntimeError(f"Inference failed for {current_slice_id}: {exec_info.get('error', 'unknown')}")
24-24: Multiple implicitOptionaltype hints.Several parameters use
= NonewithoutOptionaltype annotation. This violates PEP 484.- def __init__(self, run_metadata_path: str = None, save_metadata_path: str = None, slice_path: str | None = None): + def __init__(self, run_metadata_path: Optional[str] = None, save_metadata_path: Optional[str] = None, slice_path: Optional[str] = None): - def run(self, input_json_path, slice_path: str, output_path: str = None) -> dict: + def run(self, input_json_path: str, slice_path: str, output_path: Optional[str] = None) -> dict: - def run_onnx_slice(slice_info: dict, input_tensor_path, output_tensor_path, slice_dir: Path = None): + def run_onnx_slice(slice_info: dict, input_tensor_path, output_tensor_path, slice_dir: Optional[Path] = None): - def _run_ezkl_slice(self, slice_info: dict, input_tensor_path, output_witness_path, slice_dir: Path = None): + def _run_ezkl_slice(self, slice_info: dict, input_tensor_path, output_witness_path, slice_dir: Optional[Path] = None):Add
from typing import Optionalat the top if needed.Also applies to: 39-39, 67-67, 102-102
dsperse/src/compile/compiler.py (4)
31-38: Stale docstring references non-existent parameter.

The docstring mentions `compiler_impl` as an argument, but the constructor takes no parameters. Update the docstring to reflect the actual implementation.

```diff
 def __init__(self):
     """
-    Initialize the Compiler with a specific implementation.
-
-    Args:
-        compiler_impl: The compiler implementation to use
+    Initialize the Compiler with the EZKL backend.
     """
     self.ezkl = EZKL()
```
164-164: Variable `type` shadows a built-in.

Using `type` as a variable name shadows Python's built-in `type()` function. Rename to `slice_type` or `format_type`.

```diff
-is_sliced, slice_path, type = CompilerUtils.is_sliced_model(model_path)
+is_sliced, slice_path, slice_type = CompilerUtils.is_sliced_model(model_path)
 if is_sliced:
     # Convert to dirs if needed
-    if type != "dirs":
+    if slice_type != "dirs":
         slice_path = Converter.convert(model_path, output_type="dirs", cleanup=True)
     self._compile_slices(slice_path, input_file_path=input_file, layer_indices=layer_indices)
     # Convert back to original type if needed
-    if type != "dirs":
-        slice_path = Converter.convert(slice_path, output_type=type, cleanup=True)
+    if slice_type != "dirs":
+        slice_path = Converter.convert(slice_path, output_type=slice_type, cleanup=True)
```
54-55: Remove extraneous `f` prefix from strings without placeholders.

These f-strings contain no interpolation placeholders.

```diff
-logger.error(f"No valid path found for slice")
-raise FileNotFoundError(f"No valid path found for slice")
+logger.error("No valid path found for slice")
+raise FileNotFoundError("No valid path found for slice")

 calibration_input = os.path.join(
     os.path.dirname(slice_path),
     "ezkl",
-    f"calibration.json"
+    "calibration.json"
 ) if os.path.exists(os.path.join(os.path.dirname(slice_path), "ezkl", "calibration.json")) else None
```

Also applies to: 62-62
125-138: Metadata saved after each slice; consider batching.

`Utils.save_metadata_file` is called inside the loop (line 138), writing metadata to disk after every slice compilation. For models with many slices, this could be slow. Consider saving once after the loop completes, or documenting this as intentional for crash recovery.

dsperse/src/analyzers/runner_analyzer.py (2)
289-364: Clarify or remove `_normalize_to_dirs` (unused, inconsistent contract)
`_normalize_to_dirs` has a fairly rich docstring and behavior, but:
- It’s not called anywhere in this file anymore.
- The docstring promises
(model_root, slices_dir, original_format)while the implementation returns only(slices_dir, original_format).model_rootis computed but never used.- Several broad
except Exceptionhandlers swallow errors without logging.If the new API no longer needs to accept
.dslice/.dspersedirectly, this helper is dead code and can be removed to avoid confusion.If you do still intend
generate_run_metadatato accept packaged formats, I’d recommend:
- Wiring
_normalize_to_dirsintogenerate_run_metadataand fixing its return type/contract.- Tightening exception handling and logging (e.g., log the exception instead of bare
passin theiterdirbranches).Example of integrating it at the top of
generate_run_metadata:slices_dir, normalized_format = RunnerAnalyzer._normalize_to_dirs(slice_path) if original_format is None: original_format = normalized_format # then use slices_dir instead of slice_path belowRight now the presence of this function plus the
original_formatparameter is misleading because they’re not actually used together.
381-424: Avoid raising bareExceptionwhen no slices are foundIn
_build_from_per_slice_dirs:if not slices_data: raise Exception(f"No valid slices found under {slices_dir}. Each slice directory must include metadata.json.")Raising a generic
Exceptionmakes it harder for callers to distinguish this from other failure modes. A more specific type (e.g.,ValueErroror a customNoSlicesFoundError) would improve error handling and testability.For example:
- if not slices_data: - raise Exception(f"No valid slices found under {slices_dir}. Each slice directory must include metadata.json.") + if not slices_data: + raise ValueError( + f"No valid slices found under {slices_dir}. Each slice directory must include metadata.json." + )Callers that want to treat “no slices” as a fatal configuration error can then catch this specific exception.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- dsperse/src/analyzers/runner_analyzer.py (3 hunks)
- dsperse/src/compile/compiler.py (1 hunks)
- dsperse/src/compile/utils/compiler_utils.py (1 hunks)
- dsperse/src/run/runner.py (1 hunks)
- dsperse/src/run/utils/runner_utils.py (1 hunks)
- dsperse/src/slice/slicer.py (4 hunks)
- dsperse/src/slice/utils/converter.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
dsperse/src/compile/compiler.py (4)
dsperse/src/backends/ezkl.py (3)
- EZKL (107-693)
- compilation_pipeline (578-671)
- get_version (137-158)

dsperse/src/compile/utils/compiler_utils.py (7)

- CompilerUtils (12-285)
- is_ezkl_compilation_successful (105-113)
- get_relative_paths (116-129)
- update_slice_metadata (178-235)
- run_onnx_inference_chain (239-285)
- parse_layers (57-74)
- is_sliced_model (15-54)

dsperse/src/slice/utils/converter.py (2)

- Converter (17-382)
- convert (20-90)

dsperse/src/utils/utils.py (3)

- Utils (12-359)
- find_metadata_path (45-65)
- save_metadata_file (18-42)
dsperse/src/run/runner.py (6)
dsperse/src/analyzers/runner_analyzer.py (3)
- RunnerAnalyzer (16-460)
- generate_run_metadata (427-460)
- get_execution_chain (275-278)

dsperse/src/backends/ezkl.py (2)

- EZKL (107-693)
- generate_witness (160-242)

dsperse/src/backends/onnx_models.py (2)

- OnnxModels (13-222)
- run_inference (18-60)

dsperse/src/run/utils/runner_utils.py (1)

- RunnerUtils (16-330)

dsperse/src/slice/utils/converter.py (3)

- Converter (17-382)
- detect_type (93-122)
- convert (20-90)

dsperse/src/utils/utils.py (3)

- Utils (12-359)
- read_input (134-171)
- write_input (127-131)
dsperse/src/slice/slicer.py (2)
dsperse/src/slice/onnx_slicer.py (3)
- slice (326-457)
- OnnxSlicer (16-519)
- slice_model (499-519)
Converter(17-382)convert(20-90)
dsperse/src/analyzers/runner_analyzer.py (3)
dsperse/src/utils/utils.py (2)
Utils(12-359)save_metadata_file(18-42)dsperse/src/slice/utils/converter.py (3)
Converter(17-382)detect_type(93-122)convert(20-90)dsperse/src/cli/convert.py (1)
convert(44-98)
dsperse/src/compile/utils/compiler_utils.py (2)
dsperse/src/backends/ezkl.py (2)
EZKL(107-693)get_version(137-158)dsperse/src/slice/utils/converter.py (1)
detect_type(93-122)
🪛 Ruff (0.14.6)
dsperse/src/compile/compiler.py
52-52: Avoid specifying long messages outside the exception class
(TRY003)
54-54: f-string without any placeholders
Remove extraneous f prefix
(F541)
55-55: Avoid specifying long messages outside the exception class
(TRY003)
55-55: f-string without any placeholders
Remove extraneous f prefix
(F541)
62-62: f-string without any placeholders
Remove extraneous f prefix
(F541)
85-85: Avoid specifying long messages outside the exception class
(TRY003)
155-155: Avoid specifying long messages outside the exception class
(TRY003)
181-181: Avoid specifying long messages outside the exception class
(TRY003)
203-203: f-string without any placeholders
Remove extraneous f prefix
(F541)
dsperse/src/run/runner.py
24-24: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
24-24: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
39-39: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
46-46: Create your own exception
(TRY002)
46-46: Avoid specifying long messages outside the exception class
(TRY003)
67-67: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
102-102: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
135-135: Do not catch blind exception: Exception
(BLE001)
180-180: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
247-247: Create your own exception
(TRY002)
247-247: Avoid specifying long messages outside the exception class
(TRY003)
dsperse/src/slice/utils/converter.py
20-20: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
35-35: Avoid specifying long messages outside the exception class
(TRY003)
37-37: Avoid specifying long messages outside the exception class
(TRY003)
76-76: Do not catch blind exception: Exception
(BLE001)
88-88: Do not catch blind exception: Exception
(BLE001)
105-105: Avoid specifying long messages outside the exception class
(TRY003)
120-120: Avoid specifying long messages outside the exception class
(TRY003)
122-122: Avoid specifying long messages outside the exception class
(TRY003)
135-135: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
165-165: Do not catch blind exception: Exception
(BLE001)
170-170: Avoid specifying long messages outside the exception class
(TRY003)
195-195: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
211-211: Avoid specifying long messages outside the exception class
(TRY003)
264-264: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
287-287: Avoid specifying long messages outside the exception class
(TRY003)
306-306: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
313-313: Avoid specifying long messages outside the exception class
(TRY003)
341-341: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
346-346: Loop control variable dirs not used within loop body
Rename unused dirs to _dirs
(B007)
437-437: Do not catch blind exception: Exception
(BLE001)
dsperse/src/run/utils/runner_utils.py
37-37: Do not catch blind exception: Exception
(BLE001)
51-51: Consider moving this statement to an else block
(TRY300)
52-52: Do not catch blind exception: Exception
(BLE001)
102-102: Unused static method argument: model_directory
(ARG004)
102-102: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
102-102: Unused static method argument: save_reshape
(ARG004)
125-125: Avoid specifying long messages outside the exception class
(TRY003)
136-136: Prefer TypeError exception for invalid type
(TRY004)
136-136: Avoid specifying long messages outside the exception class
(TRY003)
dsperse/src/slice/slicer.py
70-70: Avoid specifying long messages outside the exception class
(TRY003)
dsperse/src/analyzers/runner_analyzer.py
49-49: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
73-73: Unused static method argument: slices_dir
(ARG004)
112-112: Function definition does not bind loop variable slice_key
(B023)
140-140: Unused static method argument: slices_dir
(ARG004)
154-154: Do not catch blind exception: Exception
(BLE001)
314-314: Do not catch blind exception: Exception
(BLE001)
323-324: try-except-pass detected, consider logging the exception
(S110)
323-323: Do not catch blind exception: Exception
(BLE001)
334-334: Do not catch blind exception: Exception
(BLE001)
349-349: Local variable model_root is assigned to but never used
Remove assignment to unused variable model_root
(F841)
349-349: Useless if-else condition
(RUF034)
376-377: try-except-pass detected, consider logging the exception
(S110)
376-376: Do not catch blind exception: Exception
(BLE001)
413-413: Create your own exception
(TRY002)
413-413: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (5)
dsperse/src/run/utils/runner_utils.py (1)
1-335: LGTM overall; past review issues have been addressed.

The empty-list handling (lines 124-125), directory guard in `save_to_file_shaped` (lines 179-181), filter_tensor cleanup (lines 251-255), and `__main__` fix (line 335) from previous reviews have all been properly implemented.

dsperse/src/slice/slicer.py (1)
save_to_file_shaped(Lines 179-181), filter_tensor cleanup (Lines 251-255), and__main__fix (Line 335) from previous reviews have all been properly implemented.dsperse/src/slice/slicer.py (1)
27-55: Clean dependency injection pattern.

The constructor accepting `slicer_impl` and the factory method `create()` provide a clean separation. This will make it easier to add support for other model types in the future.

dsperse/src/run/runner.py (1)
39-64: Good format detection and cleanup pattern.

The `run` method properly detects the input format, converts to `dirs` for processing, and converts back to the original format afterward. This ensures the user's original packaging is preserved.

dsperse/src/compile/compiler.py (1)
143-182: Solid compile orchestration with format normalization.

The `compile` method properly detects sliced models, converts to `dirs` format for processing, compiles, and converts back to the original format. The layer filtering via `parse_layers` is a nice feature for selective compilation.

dsperse/src/slice/utils/converter.py (1)
378-382: I'll help you verify the security concern about `zipfile.extractall` in the converter.py file. Let me start by examining the code context and understanding how this method is being used.

```shell
#!/bin/bash
# First, let's read the file to see the full context around the _unzip_file method
wc -l dsperse/src/slice/utils/converter.py

# Read the specific section around lines 378-382
sed -n '360,400p' dsperse/src/slice/utils/converter.py

# Search for callers of _unzip_file to understand usage patterns
rg -n "_unzip_file" dsperse/src/slice/utils/converter.py -A 5 -B 5

# Search for any callers of _unzip_file across the entire codebase
rg -n "_unzip_file" --type python

# Search for extractall usage across the codebase to see if there are any other implementations
rg -n "extractall" --type python -B 3 -A 3
```

Now let me research current Python best practices and vulnerability information:

- Python zipfile.extractall path traversal vulnerability zip slip security
- Python zipfile extractall safe extraction validation CVE
```python
def generate_run_metadata(slice_path: Path, save_path=None, original_format=None):
    """
    Build run-metadata from a slices source (dirs/.dslice/.dsperse) and save it.
    - Normalizes inputs to dirs temporarily when needed (no cleanup).
    - Prefers model-level slices/metadata.json when present; otherwise scans per-slice dirs.
    - Emits paths normalized as 'slice_#/payload/...'.
    - Saves to save_path or default '<parent_of_slice_path>/run/metadata.json'.
    - Converts back to original packaging when original_format != 'dirs'.
    Returns the run-metadata dict.
    """
    if RunnerAnalyzer._has_model_metadata(slice_path):
        run_meta = RunnerAnalyzer._build_from_model_metadata(slice_path)
    else:
        run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slice_path)

    # Attach packaging metadata
    run_meta["packaging_type"] = original_format
    run_meta["source_path"] = str(slice_path)

    # Save
    if save_path is None:
        base = Path(slice_path).resolve()
        # If slice_path is a model root or slices dir, put run/metadata.json next to it
        base_dir = base if base.is_dir() else base.parent
        save_path = base_dir / "run" / "metadata.json"
    else:
        save_path = Path(save_path).resolve()

    save_path.parent.mkdir(parents=True, exist_ok=True)
    run_meta["run_directory"] = str(save_path.parent)
    Utils.save_metadata_file(run_meta, save_path)

    return run_meta
```
`generate_run_metadata` currently crashes for string paths (including the `__main__` demo)
generate_run_metadata is annotated to take a Path, but it’s often convenient (and natural) to pass a string. The __main__ block does exactly that:
model_path = Path(model_dir).resolve()
...
out = RunnerAnalyzer.generate_run_metadata(str(model_path))

Inside `generate_run_metadata`, you immediately call:
if RunnerAnalyzer._has_model_metadata(slice_path):
...and _has_model_metadata uses the / operator:
return ((path / "metadata.json").exists() or (path / "slices" / "metadata.json").exists()) and not (
path / "payload").exists()When slice_path is a str, path / "metadata.json" raises TypeError: unsupported operand type(s) for /: 'str' and 'str'. So the demo block — and any other caller passing a string — will currently crash.
You can make this API robust (and keep it ergonomic) by normalizing slice_path to a Path at the top and broadening the type annotation:
```diff
-@staticmethod
-def generate_run_metadata(slice_path: Path, save_path=None, original_format=None):
+@staticmethod
+def generate_run_metadata(slice_path: str | Path, save_path=None, original_format=None):
@@
-    if RunnerAnalyzer._has_model_metadata(slice_path):
-        run_meta = RunnerAnalyzer._build_from_model_metadata(slice_path)
-    else:
-        run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slice_path)
+    # Normalize to Path for downstream helpers
+    slice_path = Path(slice_path)
+
+    if RunnerAnalyzer._has_model_metadata(slice_path):
+        run_meta = RunnerAnalyzer._build_from_model_metadata(slice_path)
+    else:
+        run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slice_path)
@@
-    run_meta["packaging_type"] = original_format
-    run_meta["source_path"] = str(slice_path)
+    run_meta["packaging_type"] = original_format
+    run_meta["source_path"] = str(slice_path.resolve())
@@
-    if save_path is None:
-        base = Path(slice_path).resolve()
+    if save_path is None:
+        base = slice_path.resolve()
@@
-    model_dir = base_paths[model_choice]
+    model_dir = base_paths[model_choice]
@@
-    out = RunnerAnalyzer.generate_run_metadata(str(model_path))
+    out = RunnerAnalyzer.generate_run_metadata(model_path)
```

This fixes the immediate crash and makes the public API tolerant of both str and Path inputs, which is consistent with how other utilities in this codebase behave.
Also applies to: 462-473
```python
def get_relative_paths(compilation_data: Dict[str, Any], calibration_input: Optional[str]) -> dict[str, str | None]:
    """
    Compute payload-relative paths for compiled artifacts and the calibration file.
    Returns a tuple of (payload_rel_dict, calibration_rel_path).
    """
    calibration_rel = CompilerUtils._rel_from_payload(calibration_input) if calibration_input and os.path.exists(calibration_input) else None
    relative_paths = {
        'settings': CompilerUtils._rel_from_payload(compilation_data.get('settings')),
        'compiled': CompilerUtils._rel_from_payload(compilation_data.get('compiled')),
        'vk_key': CompilerUtils._rel_from_payload(compilation_data.get('vk_key')),
        'pk_key': CompilerUtils._rel_from_payload(compilation_data.get('pk_key')),
        'calibration': calibration_rel
    }
    return relative_paths
```
Fix `get_relative_paths` docstring to match return type
The docstring still claims this returns a tuple (payload_rel_dict, calibration_rel_path), but the implementation returns a single dict with a "calibration" key. This can confuse callers and future maintainers.
I suggest updating the docstring to describe the actual dict shape:
- """
- Compute payload-relative paths for compiled artifacts and the calibration file.
- Returns a tuple of (payload_rel_dict, calibration_rel_path).
- """
+ """
+ Compute payload-relative paths for compiled artifacts and the optional calibration file.
+
+ Returns:
+ A dict mapping keys ('settings', 'compiled', 'vk_key', 'pk_key', 'calibration')
+ to payload-relative paths or None.
+ """🤖 Prompt for AI Agents
In dsperse/src/compile/utils/compiler_utils.py around lines 116 to 129, the
docstring incorrectly states the function returns a tuple; update it to describe
the actual return value (a single dict) and its shape: state that the function
computes payload-relative paths and returns a dict with keys 'settings',
'compiled', 'vk_key', 'pk_key' mapping to str or None, and 'calibration' mapping
to str or None (depending on existence), matching the function signature
dict[str, str | None].
```python
def run_onnx_inference_chain(slices_data: list, base_path: str, input_file_path: Optional[str] = None):
    """
    Phase 1: Run ONNX inference chain to generate calibration files.
    Args:
        slices_data: List of slice metadata
        base_path: Base path for relative file paths
        input_file_path: Path to the initial input file
    """
    current_input = input_file_path
    if current_input and os.path.exists(current_input):
        logger.info("Running ONNX inference chain to generate calibration files")
        for idx, slice_data in enumerate(slices_data):
            slice_path = slice_data.get('path')
            # First try full path
            if slice_path and os.path.exists(slice_path):
                pass  # Use the full path
            # Then try relative path
            elif slice_data.get('relative_path'):
                slice_path = os.path.join(base_path, slice_data.get('relative_path'))
                if not os.path.exists(slice_path):
                    logger.warning(f"Slice file not found for index {idx}: {slice_path}")
                    continue
            else:
                logger.error(f"No valid path found for slice index {idx}")
                continue

            slice_output_path = os.path.join(os.path.dirname(slice_path), "ezkl")
            os.makedirs(slice_output_path, exist_ok=True)

            # Run ONNX inference to generate calibration data
            output_tensor_path = os.path.join(slice_output_path, "calibration.json")
            logger.info(f"Running ONNX inference for slice {idx} with input file {current_input}")
            success, _tensor, exec_info = Runner.run_onnx_slice(
                slice_info={"path": slice_path},
                input_tensor_path=Path(current_input),
                output_tensor_path=Path(output_tensor_path)
            )

            if not success:
                logger.error(f"ONNX inference failed for slice {idx}: {exec_info.get('error', 'Unknown error')}")
                return

            current_input = output_tensor_path
            logger.info(f"Generated calibration file: {output_tensor_path}")
    else:
        logger.warning("No input file provided, skipping ONNX inference chain")
```
Differentiate “no input provided” vs “input path missing” in ONNX chain
run_onnx_inference_chain currently treats a provided-but-missing input_file_path the same as no input at all, logging “No input file provided” and silently skipping the chain. That makes configuration errors harder to diagnose.
You can make the behavior clearer (and fail fast) by splitting the two cases:
```diff
-current_input = input_file_path
-if current_input and os.path.exists(current_input):
-    logger.info("Running ONNX inference chain to generate calibration files")
-    for idx, slice_data in enumerate(slices_data):
+current_input = input_file_path
+if not current_input:
+    logger.warning("No input file provided, skipping ONNX inference chain")
+    return
+
+if not os.path.exists(current_input):
+    logger.error(f"Input file for ONNX inference chain not found: {current_input}")
+    return
+
+logger.info("Running ONNX inference chain to generate calibration files")
+for idx, slice_data in enumerate(slices_data):
@@
-else:
-    logger.warning("No input file provided, skipping ONNX inference chain")
```

This preserves existing success-path behavior while giving a more accurate message (and early exit) for a bad path.
🤖 Prompt for AI Agents
In dsperse/src/compile/utils/compiler_utils.py around lines 239 to 285, the
function treats a provided-but-missing input_file_path the same as no input and
logs "No input file provided"; change the branching to (1) if input_file_path is
None/empty: log a warning and skip the ONNX chain (preserve current behavior),
(2) else if input_file_path is provided but os.path.exists(input_file_path) is
False: log an error stating the input path is missing and return/fail fast, and
(3) otherwise proceed with the existing inference loop using the validated path;
keep messages descriptive and do not swallow configuration errors.
```diff
 if not output_path:
     raise ValueError("output_path must be provided for slicing")

 logger.info(f"Slicing model to output path: {output_path}")
-return self.slicer_impl.slice_model(output_path=output_path)
+result = self.slicer_impl.slice_model(output_path=output_path)
+
+if output_type != "dirs":
+    Converter.convert(output_path, output_type)
+
+return result
```
`Converter.convert` return value is discarded.

When `output_type != "dirs"`, the converter is called but its return value (the path to the converted output) is not captured or returned. If the conversion moves/renames files, the caller won't know the new location. Consider returning the converted path or documenting that the output is modified in-place at `output_path`.
if output_type != "dirs":
- Converter.convert(output_path, output_type)
+ converted_path = Converter.convert(output_path, output_type)
+ logger.info(f"Converted output to {output_type} at: {converted_path}")
return resultCommittable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.14.6)
70-70: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In dsperse/src/slice/slicer.py around lines 69 to 78, the return value from
Converter.convert is discarded when output_type != "dirs"; capture the
converter's returned path and return it (falling back to the original slicing
result if convert returns None). Concretely, call converted =
Converter.convert(output_path, output_type) and then return converted if
converted is truthy, otherwise return the existing result; ensure any
docstring/comments reflect that the method may return the converted path.
```python
elif current_type == "dslice":
    if output_type == "dirs":
        return Converter._dslice_to_dirs(path_obj, output_path)
    elif output_type == "dsperse":
        # Convert dslice -> dirs -> dsperse
        temp_dirs = Converter._dslice_to_dirs(path_obj, None)
        result = Converter._dirs_to_dsperse(Path(temp_dirs), output_path)
        # Optionally clean up temp dirs
        return result

elif current_type == "dsperse":
    if output_type == "dirs":
        # Extract and expand contained .dslice files into slice_* directories
        result = Converter._dsperse_to_dirs(path_obj, output_path, expand_slices=True)
        if cleanup:
            try:
                path_obj.unlink()
                logger.info(f"Removed source dsperse file: {path_obj}")
            except Exception as e:
                logger.warning(f"Could not remove source dsperse file {path_obj}: {e}")
        return result
    elif output_type == "dslice":
        # Convert dsperse -> dirs (without expanding slices)
        temp_dirs = Converter._dsperse_to_dirs(path_obj, None, expand_slices=False)
        # The dsperse already contains dslices, so just return the directory
        logger.info(f"Extracted dsperse to directory with dslice files: {temp_dirs}")
        if cleanup:
            try:
                path_obj.unlink()
                logger.info(f"Removed source dsperse file: {path_obj}")
            except Exception as e:
                logger.warning(f"Could not remove source dsperse file {path_obj}: {e}")
        return temp_dirs
```
**`convert()` lacks explicit return for some code paths.**

When `current_type == "dslice"` and `output_type == "dsperse"`, the function returns the result of `_dirs_to_dsperse`. However, if `current_type` doesn't match any branch, the function returns `None` implicitly. Consider adding a fallback or raising an error for unhandled type combinations.

```diff
 elif current_type == "dsperse":
     if output_type == "dirs":
         # ...
         return result
     elif output_type == "dslice":
         # ...
         return temp_dirs
+
+    # Should not reach here if types are valid
+    raise ValueError(f"Unsupported conversion: {current_type} -> {output_type}")
```

🧰 Tools
🪛 Ruff (0.14.6)
76-76: Do not catch blind exception: Exception
(BLE001)
88-88: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In dsperse/src/slice/utils/converter.py around lines 58 to 91, the convert()
branching for current_type/output_type combinations can fall through and return
None for unhandled combinations; add an explicit fallback that raises a clear
exception (e.g., ValueError or NotImplementedError) including the current_type
and output_type values so callers get a deterministic error instead of None, and
ensure any temporary-path branches consistently return the same type (Path or
str) as the function contract expects.
Actionable comments posted: 1
♻️ Duplicate comments (3)
dsperse/src/analyzers/runner_analyzer.py (3)
**53-70**: `process_slices` dispatch is brittle and will crash on empty input.

The current dispatch:

```python
# Normalize input to a list
if len(slices_data[0]) == 3:
    if isinstance(slices_data, dict):
        slices_data = [slices_data]
    return RunnerAnalyzer._process_slices_per_slice(slices_dir, slices_data)
else:
    return RunnerAnalyzer._process_slices_model(slices_dir, slices_data)
```

has a few problems:
- `len(slices_data[0])` will raise `IndexError` when `slices_data` is empty.
- The `isinstance(slices_data, dict)` branch is effectively dead, since you index `slices_data[0]` before that check.
- Using `len(...) == 3` as a structural discriminator is fragile; adding a field to those dicts will silently flip the dispatch.

A more robust approach is to normalize dict→list, guard emptiness, and dispatch based on a distinguishing key such as `"slice_metadata"`:

```diff
 @staticmethod
 def process_slices(slices_dir, slices_data):
 @@
-    # Normalize input to a list
-    if len(slices_data[0]) == 3:
-        if isinstance(slices_data, dict):
-            slices_data = [slices_data]
-        return RunnerAnalyzer._process_slices_per_slice(slices_dir, slices_data)
-    else:
-        return RunnerAnalyzer._process_slices_model(slices_dir, slices_data)
+    # Normalize input and dispatch based on structure
+    if isinstance(slices_data, dict):
+        slices_data = [slices_data]
+
+    if not slices_data:
+        return {}
+
+    first = slices_data[0]
+    # Per-slice entries built in _build_from_per_slice_dirs carry 'slice_metadata'
+    if isinstance(first, dict) and "slice_metadata" in first:
+        return RunnerAnalyzer._process_slices_per_slice(slices_dir, slices_data)
+
+    # Otherwise treat as model-level metadata entries from slices/metadata.json.
+    return RunnerAnalyzer._process_slices_model(slices_dir, slices_data)
```

This avoids crashes on empty input and makes the routing logic resilient to future metadata changes.
**139-205**: Guard `_process_slices_per_slice` against missing metadata to avoid `TypeError` joins.

In the per-slice path, several `os.path.join` calls assume all metadata fields are present:

```python
parent_dir = os.path.dirname(entry.get("path")).split(os.sep)[0]
...
onnx_path = os.path.join(parent_dir, slice.get("relative_path"))
...
if files:
    circuit_path = os.path.join(parent_dir, files.get("compiled_circuit") or files.get("compiled"))
    settings_path = os.path.join(parent_dir, files.get("settings"))
    pk_path = os.path.join(parent_dir, files.get("pk_key"))
    vk_path = os.path.join(parent_dir, files.get("vk_key"))
```

If any of `slice.get("relative_path")` or the `files[...]` values are missing/`None` (e.g. pre-compilation, partial metadata, or failed compilation), `os.path.join(parent_dir, None)` will raise `TypeError`, which contradicts the earlier goal that the analyzer "just analyzes what is there" and tolerates incomplete data.

You can mirror the more defensive pattern from `_process_slices_model` and gate joins on presence:

```diff
- meta_path = entry.get("slice_metadata")
- parent_dir = os.path.dirname(entry.get("path")).split(os.sep)[0]
+ meta_path = entry.get("slice_metadata")
+ # Entry path is already normalized to "slice_X/payload/..."; keep only the slice dir.
+ parent_dir = os.path.dirname(entry.get("path") or "").split(os.sep)[0]
@@
- # Normalize ONNX path
- onnx_path = os.path.join(parent_dir, slice.get("relative_path"))
+ # Normalize ONNX path; tolerate missing fields
+ rel_from_meta = slice.get("relative_path") or slice.get("path")
+ onnx_path = os.path.join(parent_dir, rel_from_meta) if rel_from_meta else None
@@
- comp = ((slice.get("compilation") or {}).get("ezkl") or {})
- files = (comp.get("files") or {})
- compiled_flag = bool(comp.get("compiled", False))
-
- if files:
-     circuit_path = os.path.join(parent_dir, files.get("compiled_circuit") or files.get("compiled"))
-     settings_path = os.path.join(parent_dir, files.get("settings"))
-     pk_path = os.path.join(parent_dir, files.get("pk_key"))
-     vk_path = os.path.join(parent_dir, files.get("vk_key"))
- else:
-     circuit_path = None
-     settings_path = None
-     pk_path = None
-     vk_path = None
+ comp = ((slice.get("compilation") or {}).get("ezkl") or {})
+ files = (comp.get("files") or {})
+ compiled_flag = bool(comp.get("compiled", False))
+
+ circuit_path = None
+ settings_path = None
+ pk_path = None
+ vk_path = None
+ if files:
+     compiled_rel = files.get("compiled_circuit") or files.get("compiled")
+
+     def _join_optional(key: str) -> Optional[str]:
+         rel = files.get(key)
+         return os.path.join(parent_dir, rel) if rel else None
+
+     circuit_path = os.path.join(parent_dir, compiled_rel) if compiled_rel else None
+     settings_path = _join_optional("settings")
+     pk_path = _join_optional("pk_key")
+     vk_path = _join_optional("vk_key")
```

The rest of the pipeline (execution_chain and security) already tolerates `None` for these paths, so this makes the analyzer robust to partially-populated per-slice metadata.
**371-384**: `generate_run_metadata` path handling: crash on string paths and model-root confusion.

There are two related issues here:

1. **String paths crash via `_has_model_metadata`.** `generate_run_metadata` is annotated to take a `Path`, but both the `__main__` demo and likely some callers pass a string:

```python
out = RunnerAnalyzer.generate_run_metadata(str(model_path))
```

Inside, you immediately call `if RunnerAnalyzer._has_model_metadata(slice_path):`, and `_has_model_metadata` does:

```python
return ((path / "metadata.json").exists() or (path / "slices" / "metadata.json").exists()) and not (
    path / "payload").exists()
```

When `slice_path` is a `str`, `path / "metadata.json"` raises `TypeError: unsupported operand type(s) for /: 'str' and 'str'`. So the `__main__` block (and any other string-based caller) will currently crash.

2. **Model-root vs slices-dir mismatch.** `_has_model_metadata` returns `True` when either `path/metadata.json` or `path/slices/metadata.json` exists. For a model root that only has `slices/metadata.json`, `_has_model_metadata(model_root)` returns `True`, but `_build_from_model_metadata` then calls `load_slices_metadata(slices_dir)` on that same `model_root`, trying to open `model_root/metadata.json`, which will fail. The demo currently passes the model root (`../models/doom`) rather than the `slices` dir, so even after normalizing the type, this would still be incorrect.

You can fix both by:
- Normalizing `slice_path` to a `Path` immediately.
- Resolving whether you were given a model root or a slices directory, and passing the actual slices dir into `_has_model_metadata`/`_build_from_model_metadata`.
- Making `_has_model_metadata` itself tolerant of `str | Path`.

For example:

```diff
-    @staticmethod
-    def _has_model_metadata(path: Path) -> bool:
-        return ((path / "metadata.json").exists() or (path / "slices" / "metadata.json").exists()) and not (
-            path / "payload").exists()
+    @staticmethod
+    def _has_model_metadata(path: Path | str) -> bool:
+        path = Path(path)
+        return (
+            (path / "metadata.json").exists()
+            or (path / "slices" / "metadata.json").exists()
+        ) and not (path / "payload").exists()
@@
-    @staticmethod
-    def generate_run_metadata(slice_path: Path, save_path=None, original_format=None):
+    @staticmethod
+    def generate_run_metadata(slice_path: str | Path, save_path=None, original_format=None):
@@
-        if RunnerAnalyzer._has_model_metadata(slice_path):
-            run_meta = RunnerAnalyzer._build_from_model_metadata(slice_path)
-        else:
-            run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slice_path)
+        # Normalize the input and distinguish model root vs slices directory
+        slice_path = Path(slice_path)
+        if (slice_path / "slices" / "metadata.json").exists():
+            slices_dir = slice_path / "slices"
+        else:
+            slices_dir = slice_path
+
+        if RunnerAnalyzer._has_model_metadata(slices_dir):
+            run_meta = RunnerAnalyzer._build_from_model_metadata(slices_dir)
+        else:
+            run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slices_dir)
@@
-        run_meta["packaging_type"] = original_format
-        run_meta["source_path"] = str(slice_path)
+        run_meta["packaging_type"] = original_format
+        run_meta["source_path"] = str(slice_path.resolve())
@@
-        if save_path is None:
-            base = Path(slice_path).resolve()
+        if save_path is None:
+            base = slice_path.resolve()
@@
-    out = RunnerAnalyzer.generate_run_metadata(str(model_path))
+    out = RunnerAnalyzer.generate_run_metadata(model_path)
```

This keeps the API ergonomic (accepting both `str` and `Path`), allows callers to pass either a model root or a slices directory, and makes the `__main__` smoke test actually executable.

Also applies to: 433-466, 468-479
🧹 Nitpick comments (5)
dsperse/src/analyzers/runner_analyzer.py (5)
**16-40**: Stateless design and path helpers look good (minor nit on `__init__`).
The stateless, static-method design plus `rel_from_payload`/`with_slice_prefix` helpers reads clean and matches the new per-slice payload layout. The explicit `__init__` that just `pass`es is technically unnecessary though; you could drop it entirely to emphasize that this class is not meant to be instantiated.

**42-50**: Broader exception handling in `load_slices_metadata` could be tightened.
Catching a bare `Exception` and logging via `logger.error` before re-raising loses stack context and may hide the specific failure mode (e.g., file not found vs JSON decode). If you care about diagnosability, consider narrowing to `OSError | json.JSONDecodeError` and using `logger.exception` so the traceback is preserved in logs (see the sketch after this list).

**72-137**: Model-metadata pathing logic is sound; `slices_dir` arg is currently unused.
The normalization to `slice_#/payload/...` via `rel_from_payload` + `with_slice_prefix` and reuse of the `files` dict for EZKL artifacts looks consistent with the new directory layout, and it's good that this path is now metadata-driven instead of hitting the filesystem.
Right now `slices_dir` is not used in `_process_slices_model`, so either:
- Drop the parameter if you don't plan to use it, or
- Start threading it through where you eventually need model-root context (e.g., for future size calculations or extra metadata).

This is purely a cleanliness concern; behavior as written looks correct.
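Picking up the 42-50 nitpick above, here is a minimal sketch of the narrowed handler. It assumes the method shape shown in this PR and is illustrative rather than the actual dsperse implementation:

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def load_slices_metadata(slices_dir):
    """Load slices/metadata.json, preserving the traceback on failure."""
    try:
        with open(Path(slices_dir) / "metadata.json", "r") as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError):
        # logger.exception logs at ERROR level and appends the stack trace
        logger.exception("Failed to load slices metadata from %s", slices_dir)
        raise
```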
**225-293**: Execution chain is consistent, but security metric may overstate actual circuit usage.
The execution chain construction:
- Orders `slice_*` keys numerically.
- Sets `use_circuit` only when `ezkl` is truthy and the circuit and both keys are present.
- Fills `fallback_map` from circuit→onnx (or slice→onnx).

This looks coherent and filesystem-independent.
However, `_calculate_security` currently counts a slice as "secured" purely based on `slice_data.get("ezkl", False)`:

```python
circuit_slices = sum(1 for slice_data in slices.values() if slice_data.get("ezkl", False))
return round((circuit_slices / total_slices) * 100, 1)
```

whereas the actual runtime decision (`use_circuit`) also requires `has_circuit` and `has_keys`. If metadata ever has `compiled=True` but is missing artifacts/keys, `overall_security` will be higher than the fraction of slices actually using circuits.
If you intend `overall_security` to reflect effective circuit usage, consider computing it from the same predicate used in `_build_execution_chain` (or from `circuit_slices` built from that predicate) instead of raw `ezkl`; see the sketch after this section.

**295-369**: `_normalize_to_dirs` behavior and docs are slightly out of sync.
This helper is valuable for dslice/dsperse integration, but a couple of nits:
- The docstring says it normalizes to `(model_root, slices_dir, original_format)`, but all return sites only return `(slices_dir, original_format)`. If callers only unpack two values today this is just misleading documentation; if any still expect three, they'll break.
- The `model_root` variable in the `detected == 'dirs'` branch is assigned but never used; the `if`/`else` is effectively a no-op.

A minimal clean-up would be:

```diff
-    def _normalize_to_dirs(slice_path: str):
-        """Normalize input to (model_root, slices_dir, original_format).
+    def _normalize_to_dirs(slice_path: str):
+        """Normalize an input path into a slices directory plus its original packaging format.
 @@
-        if detected == 'dirs':
-            sdir = path_obj
-            # If this looks like a slices folder (has metadata.json), parent is model root
-            model_root = sdir.parent if (sdir / 'metadata.json').exists() else sdir.parent
-            return sdir.resolve(), 'dirs'
+        if detected == 'dirs':
+            sdir = path_obj
+            return sdir.resolve(), 'dirs'
```

If `_normalize_to_dirs` is meant to be the primary entry for dslice/dsperse inputs, it would also be worth double-checking its callers to ensure they're using the updated 2-tuple API consistently.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `dsperse/src/analyzers/runner_analyzer.py` (3 hunks)
- `requirements.txt` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/analyzers/runner_analyzer.py (2)
dsperse/src/utils/utils.py (1)
- `save_metadata_file` (18-42)

dsperse/src/slice/utils/converter.py (3)
- `Converter` (17-382)
- `detect_type` (93-122)
- `convert` (20-90)
🪛 Ruff (0.14.6)
dsperse/src/analyzers/runner_analyzer.py
49-49: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
73-73: Unused static method argument: slices_dir
(ARG004)
112-112: Function definition does not bind loop variable slice_key
(B023)
140-140: Unused static method argument: slices_dir
(ARG004)
154-154: Do not catch blind exception: Exception
(BLE001)
320-320: Do not catch blind exception: Exception
(BLE001)
329-330: try-except-pass detected, consider logging the exception
(S110)
329-329: Do not catch blind exception: Exception
(BLE001)
340-340: Do not catch blind exception: Exception
(BLE001)
355-355: Local variable model_root is assigned to but never used
Remove assignment to unused variable model_root
(F841)
355-355: Useless if-else condition
(RUF034)
382-383: try-except-pass detected, consider logging the exception
(S110)
382-382: Do not catch blind exception: Exception
(BLE001)
419-419: Create your own exception
(TRY002)
419-419: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (1)
dsperse/src/analyzers/runner_analyzer.py (1)
**208-223**: `build_run_metadata` composition looks correct.
Assembling `slices` → `execution_chain` → `circuit_slices` → `overall_security` in one place keeps the flow clear, and reusing `process_slices` for both model- and per-slice cases is a good unification. No correctness issues stand out here.
requirements.txt
Outdated
```diff
@@ -1,4 +1,4 @@
-torch==2.6.0
+torch==2.7.1
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Step 1: Check torch usage patterns that might be affected by version changes
# torch.load usage (2.7 changed default weights_only parameter)
echo "=== Checking torch.load patterns ==="
rg -n "torch\.load\(" --type=py 2>/dev/null | head -20 || echo "No torch.load found"
echo ""
echo "=== Checking torch version conditionals ==="
# Check for any hardcoded version checks or conditionals related to torch
rg -n "torch.*version" -i --type=py 2>/dev/null | head -10 || echo "No version checks found"
echo ""
echo "=== Counting torch API usage ==="
# Count torch API patterns
rg -n "torch\.(nn\.|functional\.|cuda\.|backends\.)" --type=py 2>/dev/null | wc -l
echo ""
echo "=== Checking for deprecated torch patterns ==="
# Look for patterns that changed significantly between 2.6 and 2.7
rg -n "torch\.(utils\.checkpoint|cuda\.device_count|version\.)" --type=py 2>/dev/null | head -10 || echo "No patterns found"
```

Repository: inference-labs-inc/dsperse
Length of output: 708
🏁 Script executed:
```bash
#!/bin/bash
# Step 2: Explore repository structure to understand the project better
echo "=== Repository structure ==="
ls -la | head -20
echo -e "\n=== Python files in repo ==="
fd -t f -e py | head -20
echo ""
echo "=== Check requirements.txt content ==="
cat requirements.txt
```

Repository: inference-labs-inc/dsperse
cat requirements.txtRepository: inference-labs-inc/dsperse
Length of output: 1656
🏁 Script executed:
```bash
#!/bin/bash
# Step 3: Check if there are any integration tests or test files that test torch compatibility
echo "=== Looking for test files ==="
fd -t f -e py -path '*test*' 2>/dev/null | head -20
echo ""
echo "=== Check for any CI configuration ==="
fd -t f \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null | head -10
```

Repository: inference-labs-inc/dsperse
Length of output: 140
🏁 Script executed:
```bash
#!/bin/bash
# Step 4: Check GitHub workflows in .github directory
echo "=== Files in .github ==="
find .github -type f 2>/dev/null | head -20
echo ""
echo "=== Check for pytest or test configuration ==="
cat pyproject.toml
echo ""
echo "=== Check for any test commands in install.sh ==="
grep -i "test\|pytest" install.sh || echo "No test commands found"
```

Repository: inference-labs-inc/dsperse
Length of output: 1049
🏁 Script executed:
```bash
#!/bin/bash
# Step 5: Examine torch.load usage more closely - checking for weights_only parameter
echo "=== Examining torch.load calls in detail ==="
echo ""
echo "--- torch_utils.py ---"
sed -n '30,40p' dsperse/src/utils/torch_utils.py
echo ""
echo "--- pth_models.py (lines 40-50) ---"
sed -n '40,50p' dsperse/src/backends/pth_models.py
echo ""
echo "--- pth_models.py (lines 95-105) ---"
sed -n '95,105p' dsperse/src/backends/pth_models.py
echo ""
echo "--- model_analyzer.py (lines 50-60) ---"
sed -n '50,60p' dsperse/src/analyzers/model_analyzer.py
```

Repository: inference-labs-inc/dsperse
Length of output: 2316
🏁 Script executed:
```bash
#!/bin/bash
# Step 6: Check if torch.load calls might have issues with 2.7.1
# In PyTorch 2.7.1, weights_only became True by default (breaking change in 2.13+, but affects loading)
echo "=== Check torch version in actual code or documentation ==="
rg -n "2\.6\|2\.7\|version" README.md --type=md | head -20
echo ""
echo "=== Check if there are requirements-dev or requirements-test files ==="
fd -t f -e txt | grep -i req
echo ""
echo "=== Full torch_utils.py to understand context ==="
wc -l dsperse/src/utils/torch_utils.py
```

Repository: inference-labs-inc/dsperse
Length of output: 306
🏁 Script executed:
```bash
#!/bin/bash
# Step 7: Check git diff to see what changed in pyproject.toml
echo "=== Git diff for pyproject.toml ==="
git diff HEAD~1 pyproject.toml 2>/dev/null || git show HEAD:pyproject.toml 2>/dev/null
echo ""
echo "=== Git diff for requirements.txt ==="
git diff HEAD~1 requirements.txt 2>/dev/null || git show HEAD:requirements.txt 2>/dev/null
echo ""
echo "=== Check git log for recent changes ==="
git log --oneline -10 2>/dev/null
```

Repository: inference-labs-inc/dsperse
Length of output: 1225
🏁 Script executed:
```bash
#!/bin/bash
# Step 8: Check if weights_only parameter is mentioned anywhere in the codebase
echo "=== Searching for weights_only ==="
rg -n "weights_only" --type=py
echo ""
echo "=== Check torch release notes or documentation references ==="
rg -n "torch" dsperse/README.md 2>/dev/null || rg -n "torch" README.md 2>/dev/null | head -20
```

Repository: inference-labs-inc/dsperse
Length of output: 167
Critical dependency version mismatch: pyproject.toml still specifies torch==2.6.0 while requirements.txt has been updated to torch==2.7.1.
The PR updated only requirements.txt but did not update pyproject.toml. When the package is installed via pip install ., the pyproject.toml version takes precedence, meaning users will receive torch==2.6.0 instead of the intended 2.7.1. Both files must be updated consistently:
```toml
# pyproject.toml line ~19
dependencies = [
    "torch==2.7.1",  # Update from 2.6.0 to match requirements.txt
]
```
🤖 Prompt for AI Agents
In requirements.txt around lines 1-1 and pyproject.toml around line ~19, the
torch version was updated to 2.7.1 in requirements.txt but pyproject.toml still
pins torch==2.6.0; update pyproject.toml's dependencies entry to "torch==2.7.1"
to keep installer behavior consistent. Also, in the codebase update the four
torch.load() calls for explicit backward-compatibility:
dsperse/src/utils/torch_utils.py line 33, dsperse/src/backends/pth_models.py
lines 46 and 99, and dsperse/src/analyzers/model_analyzer.py line 56 — modify
each call to include the argument weights_only=False to ensure consistent
semantics across PyTorch versions.
* Update torch and python version, plus some docs for deps management * Update py.test version and python==3.12 * 'Runner' class - pass slices_path to 'run', not 'init'
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dsperse/src/cli/run.py (1)
**109-117**: Retry path should be normalized before existence check.
When the user provides an alternative file path via `retry_option`, the path is checked for existence without normalization. This could cause issues with paths containing `~` or environment variables.

```diff
 elif retry_option:
+    retry_option = normalize_path(retry_option)
     if os.path.exists(retry_option):
         args.input_file = retry_option
```
♻️ Duplicate comments (1)
requirements.txt (1)
**1-3**: LGTM - Generated lockfile with hash verification.
The switch to a `uv`-generated lockfile with hash pins and environment markers improves reproducibility and supply-chain security. The editable install (`-e .`) properly references the project.
Note: The torch version mismatch with `pyproject.toml` was already flagged in a previous review and should be addressed there.
🧹 Nitpick comments (5)
dsperse/src/cli/run.py (1)
**68-81**: Metadata detection may silently default to no slices path.
When neither `metadata.json` exists directly in the target directory nor in a `slices` subdirectory, `slices_dir_effective` is set to `None` (line 80), relying on Runner to auto-detect. Consider logging a debug message to indicate this fallback path is being taken, which would aid troubleshooting.

```diff
 else:
     # No immediate metadata found; Runner will attempt to detect dsperse inside the directory
+    logger.debug(f"No metadata.json found in {target_path} or {target_path}/slices; Runner will attempt auto-detection")
     slices_dir_effective = None
     model_dir = target_path
```

dsperse/src/run/runner.py (4)

**24-34**: Use explicit `Optional` type hints per PEP 484.
The constructor parameters have implicit `Optional` types (`str = None`). PEP 484 recommends explicit `Optional[str]` or `str | None` for clarity.

```diff
+from typing import Optional
+
 class Runner:
-    def __init__(self, run_metadata_path: str = None, save_metadata_path: str = None):
+    def __init__(self, run_metadata_path: str | None = None, save_metadata_path: str | None = None):
         """Initialize the Runner.
```

Apply similar changes to the other method signatures at lines 36, 64, and 99.

**41-44**: Use a specific exception type for validation errors.
Raising a bare `Exception` makes error handling harder for callers. Consider defining a custom exception or using `ValueError`.

```diff
 # Ensure slices path is available and valid
 if slice_path is None or not Path(slice_path).exists():
-    raise Exception("A valid path must be provided for slices")
+    raise ValueError(f"A valid path must be provided for slices: {slice_path}")
 self.slices_path = Path(slice_path)
```

**71-82**: Extract duplicated path resolution logic.
The `_resolve_rel_path` helper is defined twice with slightly different implementations (lines 71-82 in `run_onnx_slice` and lines 104-108 in `_run_ezkl_slice`). Consider extracting to a shared utility method for consistency and maintainability.

```python
# Add as a static method on Runner or in RunnerUtils
@staticmethod
def _resolve_rel_path(path: str, base_dir: Path) -> str | None:
    """Resolve a possibly relative path against base_dir."""
    if not path:
        return None
    if os.path.isabs(path):
        return path
    resolved = (base_dir / path).resolve()
    if not resolved.exists():
        # Try parent as fallback
        resolved = (base_dir.parent / path).resolve()
    return str(resolved)
```

Also applies to: 104-108

**265-299**: Demo block should be guarded or removed for production.
The `if __name__ == "__main__"` block with hardcoded model paths is useful for development but may cause confusion if inadvertently executed. Consider moving it to a separate test/demo script or adding a comment clarifying its purpose.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (5)
- `docs/uv_packaging.md` (1 hunks)
- `dsperse/src/cli/run.py` (4 hunks)
- `dsperse/src/run/runner.py` (1 hunks)
- `pyproject.toml` (1 hunks)
- `requirements.txt` (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/uv_packaging.md
🧰 Additional context used
🧬 Code graph analysis (2)
dsperse/src/run/runner.py (4)
dsperse/src/analyzers/runner_analyzer.py (2)
- `RunnerAnalyzer` (16-466)
- `generate_run_metadata` (433-466)

dsperse/src/backends/ezkl.py (2)
- `EZKL` (107-693)
- `generate_witness` (160-242)

dsperse/src/backends/onnx_models.py (2)
- `OnnxModels` (13-222)
- `run_inference` (18-60)

dsperse/src/run/utils/runner_utils.py (1)
- `RunnerUtils` (16-330)

dsperse/src/cli/run.py (4)

dsperse/src/run/runner.py (2)
- `run` (36-61)
- `Runner` (23-262)

dsperse/src/backends/pth_models.py (1)
- `run_inference` (37-75)

dsperse/src/backends/onnx_models.py (1)
- `run_inference` (18-60)

dsperse/src/cli/base.py (3)
- `prompt_for_value` (242-302)
- `normalize_path` (21-32)
- `check_model_dir` (157-174)
🪛 Ruff (0.14.6)
dsperse/src/run/runner.py
24-24: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
24-24: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
36-36: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
43-43: Create your own exception
(TRY002)
43-43: Avoid specifying long messages outside the exception class
(TRY003)
64-64: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
99-99: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
132-132: Do not catch blind exception: Exception
(BLE001)
179-179: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
246-246: Create your own exception
(TRY002)
246-246: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (7)
pyproject.toml (2)
**14-16**: Torch version now consistent with requirements.txt.
The torch version (`2.7.1`) now matches `requirements.txt`, resolving the previously flagged mismatch. The mix of exact pins (`==`) and flexible constraints (`>=`, `~=`) is acceptable for distinguishing runtime dependencies from dev tooling.

**26-27**: CLI entry point is properly configured.
The module `dsperse/main.py` exists and the `main()` function is defined at line 40. The entry point definition is correct.

dsperse/src/cli/run.py (3)

**30-36**: Argument consolidation with backward-compatible aliases.
The refactoring to use `--path` as the primary argument while retaining `--slices-dir` and other aliases maintains backward compatibility. The updated help text correctly reflects the new flexible input semantics (directory or `.dsperse`/`.dslice` file).

**53-61**: Path resolution handles multiple input formats.
The logic correctly resolves the target path from either the new `path` argument or the legacy `slices_dir`, with appropriate prompting and normalization. The `check_model_dir` validation ensures the path exists before proceeding.

**133-135**: Runner instantiation and invocation align with new API.
The Runner is correctly instantiated with only `run_metadata_path`, and the `run` method receives both `input_file` and `slice_path`. The fallback to `model_dir` when `slices_dir_effective` is `None` ensures a valid path is always passed.

dsperse/src/run/runner.py (2)

**122-134**: Exception handling for EZKL fallback is appropriate.
The broad `except Exception` catch (line 132) is acceptable here since it enables graceful fallback behavior. The error is captured in `exec_info` for debugging. The static analysis hint can be safely ignored in this context.

**229-248**: Core execution loop is well-structured.
The `_run` method cleanly iterates through the execution chain, managing IO between slices and accumulating results. Error handling at line 246 appropriately terminates on slice failure with context. However, line 246 should use a more specific exception type for consistency with the earlier suggestion.

```diff
 if not ok:
-    raise Exception(f"Inference failed for {current_slice_id}: {exec_info.get('error', 'unknown')}")
+    raise RuntimeError(f"Inference failed for {current_slice_id}: {exec_info.get('error', 'unknown')}")
```
| description = "Distributed zkML Toolkit" | ||
| readme = "README.md" | ||
| requires-python = ">=3.8" | ||
| requires-python = "==3.12.*" |
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check for Python 3.12+ specific syntax/features
echo "=== Checking for Python 3.12+ specific features ==="
# Type parameters (PEP 695 - 3.12+)
echo "1. Type parameters syntax (type X = ...):"
rg "^\s*type\s+\w+\s*=" --type=py 2>/dev/null | head -5
# override decorator (3.12+)
echo "2. Override decorator:"
rg "from typing import.*override" --type=py 2>/dev/null
# PEP 688 - Buffer protocol (3.12+)
echo "3. TypedDict NotRequired/Required:"
rg "from typing.*\(NotRequired|Required\)" --type=py 2>/dev/null
# Check Python version in imports or config
echo "4. Explicit Python version mentions:"
rg "sys.version_info" --type=py 2>/dev/null | head -5
# Check if any 3.11+ features are present
echo "5. Match statement (3.10+):"
rg "^\s*match\s+" --type=py 2>/dev/null | head -3
# List main Python files to review
echo "6. Python source files in repo:"
fd --type=py --max-depth=2 'src/|dsperse/' 2>/dev/null | head -20
```

Repository: inference-labs-inc/dsperse
Length of output: 415
🏁 Script executed:
```bash
#!/bin/bash
# Get repository structure
echo "=== Repository structure ==="
ls -la | head -20
echo -e "\n=== Checking pyproject.toml ==="
cat pyproject.toml
echo -e "\n=== Checking setup.py or setup.cfg if present ==="
ls -la setup.py setup.cfg 2>/dev/null
echo -e "\n=== Full directory tree (limited depth) ==="
find . -maxdepth 3 -type f -name "*.py" | head -30
```

Repository: inference-labs-inc/dsperse
Length of output: 2163
🏁 Script executed:
```bash
#!/bin/bash
# Check torch 2.7.1 Python support
echo "=== Torch 2.7.1 Python version requirements ==="
rg "torch" requirements.txt | head -5
echo -e "\n=== Checking main.py for version-specific code ==="
cat dsperse/main.py
echo -e "\n=== Checking constants.py ==="
head -50 dsperse/src/constants.py
echo -e "\n=== Searching for any version guards or conditionals ==="
rg "version_info|sys.version|python_requires|3\.12|3\.11|3\.10" --type=py 2>/dev/null
```

Repository: inference-labs-inc/dsperse
Length of output: 4857
**Overly restrictive Python version constraint.**
The strict pin to `==3.12.*` breaks compatibility with Python 3.10, 3.11, and future versions (3.13+) without justification. The codebase contains no 3.12-specific syntax or features. If Python 3.12 is truly required for production support, use `>=3.12` instead to allow forward compatibility; otherwise, consider relaxing to `>=3.10` or `>=3.11` based on your actual minimum supported version.
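A minimal sketch of the relaxed pin, assuming `>=3.12` is the intended floor:

```toml
# pyproject.toml (sketch)
[project]
requires-python = ">=3.12"   # or ">=3.10" if older interpreters are supported
```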
🤖 Prompt for AI Agents
In pyproject.toml around line 10, the requires-python entry is overly
restrictive ("==3.12.*"); update it to a permissive constraint that matches your
supported minimums (for example ">=3.10" or ">=3.11") or, if you truly require
3.12, change it to ">=3.12" to allow future patch/minor versions; pick the
correct lower bound for your supported Python versions and replace the
exact-equals pin with the chosen >= range.
shirin-shahabi
left a comment
Review: DSperse/DSlice File Type Integration
Tested full pipeline with doom and net models through: slice -> convert (dirs/dslice/dsperse) -> compile -> run
Works
- Slicing to dirs (with absolute paths)
- All conversion directions (dirs <-> dslice <-> dsperse)
- Compile with direct .dsperse file path
- Run with direct .dsperse file path
Issues Found
-
CRITICAL:
`onnx_utils.py:167` - Slicing fails with relative paths due to an absolute vs relative path mismatch in the `relative_to()` call. Fix: use `root = Path(slices_root).resolve()` at line 73.
CRITICAL:
converter.py:107-122-detect_type()doesn't scan directories
for.dspersefiles, causing run/prove/verify to fail when given a model
directory containing a.dspersefile.CompilerUtils.is_sliced_model()
handles this case correctly but the logic isn't shared. -
MEDIUM - Inconsistent detection logic between
CompilerUtils,Converter,
andRunner. Consider centralizing inConverter.detect_type(). -
LOW - Python version constraint too restrictive (
==3.12.*).
Recommendation
Address issues #1 and #2 before merge as they break common use cases.
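To make issues #1 and #2 concrete, a rough sketch of the suggested direction. This is not the actual dsperse implementation; the helper names are illustrative:

```python
from pathlib import Path

# Issue 1: resolve before relative_to() so both sides are absolute.
def rel_to_slices_root(file_path: str, slices_root: str) -> Path:
    root = Path(slices_root).resolve()
    return Path(file_path).resolve().relative_to(root)

# Issue 2: let detection also look inside a model directory for a packaged file.
def detect_packaging(path: str) -> str:
    p = Path(path)
    if p.suffix in (".dsperse", ".dslice"):
        return p.suffix.lstrip(".")
    if p.is_dir():
        for child in p.iterdir():
            if child.suffix == ".dsperse":
                return "dsperse"
    return "dirs"
```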
…e and multi-slice modes, streamline logic, enhance metadata handling, and remove deprecated options.
Actionable comments posted: 0
♻️ Duplicate comments (4)
dsperse/src/verifier.py (1)
251-251: Remove extraneous f-string prefix.This was flagged in a past review and is also caught by static analysis (F541). The string has no placeholders.
- print(f"\nVerification completed!") + print("\nVerification completed!")dsperse/src/slice/utils/converter.py (1)
51-91: Missing explicit return/raise for unhandled type combinations.The
convert()function can implicitly returnNoneif an unexpected type combination is encountered. A past review suggested adding a fallback. This should be addressed to prevent silent failures.elif output_type == "dslice": # Convert dsperse -> dirs (without expanding slices) temp_dirs = Converter._dsperse_to_dirs(path_obj, None, expand_slices=False) # The dsperse already contains dslices, so just return the directory logger.info(f"Extracted dsperse to directory with dslice files: {temp_dirs}") if cleanup: try: path_obj.unlink() logger.info(f"Removed source dsperse file: {path_obj}") except Exception as e: logger.warning(f"Could not remove source dsperse file {path_obj}: {e}") return temp_dirs + + # Fallback for unhandled combinations + raise ValueError(f"Unsupported conversion: {current_type} -> {output_type}")dsperse/src/analyzers/runner_analyzer.py (2)
150-215: Guard_process_slices_per_sliceagainst missing paths (avoidos.path.join(..., None)and fragileparent_dir)This block can still raise exceptions on partially-populated metadata:
parent_dir = os.path.dirname(entry.get("path")).split(os.sep)[0]will break ifentry["path"]is missing or absolute, and it’s a brittle way to derive the slice directory name.os.path.join(parent_dir, files.get(...))will raiseTypeErrorwhen any of the EZKL file keys are missing orNone, which is plausible for pre-/failed compilation states.onnx_path = os.path.join(parent_dir, slice.get("relative_path"))assumesrelative_pathis always present and already normalized; it can double-prefix the slice dir or crash onNone.To make this resilient (and match the robustness of
_process_slices_model), please normalize ONNX and EZKL paths conditionally, only when the underlying metadata fields are present. For example:- for entry in slices_data_list: - meta_path = entry.get("slice_metadata") - parent_dir = os.path.dirname(entry.get("path")).split(os.sep)[0] - + for entry in slices_data_list: + meta_path = entry.get("slice_metadata") + # Derive the slice directory name defensively from the entry path. + entry_path = entry.get("path") or "" + parent_dir = os.path.normpath(entry_path).split(os.sep)[0] if entry_path else "" + if not meta_path: + logger.warning("Missing slice_metadata path for entry %s; skipping", entry) + continue + try: with open(meta_path, "r") as f: meta = json.load(f) except Exception as e: logger.warning(f"Failed to load slice metadata at {meta_path}: {e}") continue @@ - # Normalize ONNX path - onnx_path = os.path.join(parent_dir, slice.get("relative_path")) + # Normalize ONNX path + rel_from_meta = slice.get("relative_path") or slice.get("path") + onnx_path = os.path.join(parent_dir, rel_from_meta) if (parent_dir and rel_from_meta) else rel_from_meta @@ - comp = ((slice.get("compilation") or {}).get("ezkl") or {}) - files = (comp.get("files") or {}) - compiled_flag = bool(comp.get("compiled", False)) - - if files: - circuit_path = os.path.join(parent_dir, files.get("compiled_circuit") or files.get("compiled")) - settings_path = os.path.join(parent_dir, files.get("settings")) - pk_path = os.path.join(parent_dir, files.get("pk_key")) - vk_path = os.path.join(parent_dir, files.get("vk_key")) - else: - circuit_path = None - settings_path = None - pk_path = None - vk_path = None + comp = ((slice.get("compilation") or {}).get("ezkl") or {}) + files = (comp.get("files") or {}) + compiled_flag = bool(comp.get("compiled", False)) + + compiled_rel = files.get("compiled_circuit") or files.get("compiled") + + def _join_optional(key: str) -> Optional[str]: + rel = files.get(key) + return os.path.join(parent_dir, rel) if (parent_dir and rel) else None + + circuit_path = os.path.join(parent_dir, compiled_rel) if (parent_dir and compiled_rel) else None + settings_path = _join_optional("settings") + pk_path = _join_optional("pk_key") + vk_path = _join_optional("vk_key")This keeps the analyzer tolerant of missing EZKL artifacts and incomplete slice metadata instead of crashing.
382-452: Normalizeslice_pathtoPath, use_normalize_to_dirs, and fix string-path crash & model‑root handling ingenerate_run_metadata/__main__There are two coupled correctness issues here:
- String path crash:
__main__currently calls:out = RunnerAnalyzer.generate_run_metadata(str(model_path))
generate_run_metadatathen passes this string directly into_has_model_metadata, which does(path / "metadata.json"). With astr, this raisesTypeError: unsupported operand type(s) for /: 'str' and 'str'.
- Model-root vs slices-dir mismatch:
_has_model_metadatareturnsTruewhen either<path>/metadata.jsonor<path>/slices/metadata.jsonexists. For a model root (.../doom) withslices/metadata.json, this isTrue, but_build_from_model_metadatais then called withslices_dir=pathandload_slices_metadatatries to open<model_root>/metadata.json, which fails even though<model_root>/slices/metadata.jsonexists. The unused_normalize_to_dirshelper is clearly intended to resolve this by normalizing to the slices directory first.To fix both issues and actually support
.dslice/.dsperseinputs as per the docstring, I recommend:
- Accepting
str | Pathforslice_path.- Normalizing to a
Pathimmediately.- Using
_normalize_to_dirsto compute the effectiveslices_dirand detected packaging format.- Using the normalized
slices_dirfor_has_model_metadata/_build_from_*.- Recording
source_pathas the resolved original input.- Updating
__main__to pass aPath(though with the normalization in place, strings are fine too).Example patch:
- @staticmethod - def generate_run_metadata(slice_path: Path, save_path=None, original_format=None): + @staticmethod + def generate_run_metadata(slice_path: str | Path, save_path=None, original_format=None): @@ - if RunnerAnalyzer._has_model_metadata(slice_path): - run_meta = RunnerAnalyzer._build_from_model_metadata(slice_path) - else: - run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slice_path) + # Normalize the input into a slices directory and detected format + path_obj = Path(slice_path) + slices_dir, detected_format = RunnerAnalyzer._normalize_to_dirs(path_obj) + + # If caller didn't specify original_format, use what we detected + if original_format is None: + original_format = detected_format + + # Build run metadata from either model-level or per-slice metadata + if RunnerAnalyzer._has_model_metadata(slices_dir): + run_meta = RunnerAnalyzer._build_from_model_metadata(slices_dir) + else: + run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slices_dir) @@ - # Attach packaging metadata - run_meta["packaging_type"] = original_format - run_meta["source_path"] = str(slice_path) + # Attach packaging metadata + run_meta["packaging_type"] = original_format + run_meta["source_path"] = str(path_obj.resolve()) @@ - # Save - if save_path is None: - base = Path(slice_path).resolve() - # If slice_path is a model root or slices dir, put run/metadata.json next to it - base_dir = base if base.is_dir() else base.parent - save_path = base_dir / "run" / "metadata.json" - else: - save_path = Path(save_path).resolve() + # Save + if save_path is None: + base = path_obj.resolve() + # If slice_path is a model root, slices dir, or file (dslice/dsperse), put run/metadata.json next to it + base_dir = base if base.is_dir() else base.parent + save_path = base_dir / "run" / "metadata.json" + else: + save_path = Path(save_path).resolve() @@ - model_dir = base_paths[model_choice] + model_dir = base_paths[model_choice] @@ - out = RunnerAnalyzer.generate_run_metadata(str(model_path)) + out = RunnerAnalyzer.generate_run_metadata(model_path)This:
- Eliminates the string-path crash.
- Ensures model roots with
slices/metadata.jsonare normalized to the actual slices directory before reading metadata.- Makes
generate_run_metadataconsistent with the PR’s dslice/dsperse integration story by actually using_normalize_to_dirs.Also applies to: 454-487, 489-500
🧹 Nitpick comments (16)
dsperse/src/cli/prove.py (2)
14-14: Unused import:Utils
Utilsis imported but never used in this file.-from dsperse.src.utils.utils import Utils
134-135: Remove emptyfinallyblock.The
finally: passblock serves no purpose and can be removed.except Exception as e: print(f"{Fore.RED}Error proving run: {e}{Style.RESET_ALL}") traceback.print_exc() - finally: - passdsperse/src/cli/verify.py (3)
14-14: Unused import:Utils
Utilsis imported but never used in this file.-from dsperse.src.utils.utils import Utils
136-137: Remove emptyfinallyblock.Same issue as in
prove.py— thefinally: passblock is unnecessary.except Exception as e: print(f"{Fore.RED}Error verifying run: {e}{Style.RESET_ALL}") traceback.print_exc() - finally: - pass
36-58: Duplicatedget_all_runsfunction.This function is identical to the one in
prove.py. As noted in a past comment, there's significant code sharing between these CLI modules. Consider extracting shared utilities to a common module to reduce duplication.dsperse/src/verifier.py (2)
42-49: Silent exception swallowing can hide bugs.The
try-except-passblock silently ignores any errors when building theproof_paths_by_slicemap. Consider logging the exception to aid debugging.try: for entry in run_results.get("execution_chain", {}).get("execution_results", []): sid = entry.get("slice_id") or entry.get("segment_id") pe = entry.get("proof_execution", {}) if isinstance(entry, dict) else {} if sid and pe and pe.get("proof_file"): proof_paths_by_slice[sid] = pe.get("proof_file") except Exception: - pass + logger.debug("Failed to extract proof paths from run_results", exc_info=True)
156-160: Remove redundant import.
Utilsis already imported at the module level (line 12). This local import is unnecessary.# Convert back to the original packaging if needed if detected != "dirs": - from dsperse.src.utils.utils import Utils as _Utils - root = _Utils.dirs_root_from(Path(dirs_model_path)) + root = Utils.dirs_root_from(Path(dirs_model_path)) Converter.convert(str(root), output_type=detected, cleanup=True)dsperse/src/cli/slice.py (1)
78-89: Minor: Remove redundant import.
Pathis already imported at the module level (line 7). The local import as_Pis unnecessary.if not getattr(args, 'to_type', None): valid = {'dirs', 'dslice', 'dsperse'} - from pathlib import Path as _P while True: chosen_raw = prompt_for_value('to', 'Enter desired output type (dirs, dslice, dsperse)') # Sanitize in case prompt_for_value normalized it like a path - chosen = _P(str(chosen_raw)).name.strip().lower() + chosen = Path(str(chosen_raw)).name.strip().lower() if chosen in valid: args.to_type = chosen break print(f"{Fore.YELLOW}Invalid choice: '{chosen}'. Please enter one of: dirs, dslice, dsperse{Style.RESET_ALL}")dsperse/src/run/runner.py (2)
73-84: Duplicate_resolve_rel_pathfunctions with divergent logic.There are two nested
_resolve_rel_pathfunctions:
- Lines 73-84 in
run_onnx_slice: strips matching basename prefix and resolves- Lines 106-110 in
_run_ezkl_slice: tries base_dir first, then parentThese have different behavior and could cause inconsistent path resolution. Consider extracting a single utility function with unified logic.
Also applies to: 106-110
24-24: Use explicit| Nonefor optional parameters.Static analysis flags implicit
Optionaltypes. For clarity and PEP 484 compliance:- def __init__(self, run_metadata_path: str = None, save_metadata_path: str = None): + def __init__(self, run_metadata_path: str | None = None, save_metadata_path: str | None = None):- def run(self, input_json_path, slice_path: str, output_path: str = None) -> dict: + def run(self, input_json_path, slice_path: str, output_path: str | None = None) -> dict:Also applies to: 38-38
dsperse/src/slice/utils/converter.py (1)
349-349: Rename unused loop variable.The
dirsvariable fromos.walkis not used. Prefix with underscore to indicate intentional non-use.- for root, dirs, files in os.walk(source_dir): + for root, _dirs, files in os.walk(source_dir):dsperse/src/prover.py (2)
39-40: Redundant EZKL instantiation.
prove_dirscreates a newEZKL()instance, but the class already hasself.ezkl_runnerinitialized in__init__. Use the existing instance for consistency.- ezkl = EZKL() proofs: dict[str, dict] = {}And update the prove call at line 75:
- success, result = ezkl.prove( + success, result = self.ezkl_runner.prove(
290-290: Remove extraneous f-string prefix.Static analysis (F541) correctly flags this string has no placeholders.
- print(f"\nProving completed!") + print("\nProving completed!")dsperse/src/analyzers/runner_analyzer.py (3)
42-50: Preferlogging.exceptionand narrower exception types inload_slices_metadataCatching a blanket
Exceptionand logging withlogger.errormakes debugging harder and can hide the original traceback.Consider switching to
logger.exception(which preserves the stack) and, if feasible, narrowing theexcepttoOSError | json.JSONDecodeError:- try: - with open(Path(slices_dir) / 'metadata.json', 'r') as f: - return json.load(f) - except Exception as e: - logger.error(f"Failed to load slices metadata from {slices_dir}: {e}") - raise + try: + with open(Path(slices_dir) / "metadata.json", "r") as f: + return json.load(f) + except Exception: + logger.exception("Failed to load slices metadata from %s", slices_dir) + raise
53-81:process_slicesdispatch looks robust; consider trimming unused parameters downstreamThe new dispatcher correctly:
- Normalizes
slices_datato a list.- Uses explicit keys (
slice_metadata/slice_metadata_path) instead of structural length checks.- Routes cleanly between
_process_slices_per_sliceand_process_slices_model.However, both
_process_slices_model(slices_dir, ...)and_process_slices_per_slice(slices_dir, ...)no longer useslices_dir. If this argument truly isn’t needed, you can drop it from their signatures and the calls here to quiet static-analysis noise and simplify the API.
304-379:_normalize_to_dirsis useful but slightly misleading as written (docstring + unused local + wiring)This helper looks like the right central place for dslice/dsperse detection and normalization, but there are a few issues:
- The docstring advertises
(model_root, slices_dir, original_format), while the function actually returns(slices_dir, original_format).- The
model_rootvariable at Line 365 is assigned but never used, and theif ... elsethere is redundant.- Within this file,
_normalize_to_dirsis not (yet) used bygenerate_run_metadata, so detection logic is still split between this helper and_has_model_metadata/_build_from_*.To tighten this up:
- Align the signature and docstring with actual behavior and accept
Pathinputs:- @staticmethod - def _normalize_to_dirs(slice_path: str): - """Normalize input to (model_root, slices_dir, original_format). + @staticmethod + def _normalize_to_dirs(slice_path: str | Path): + """Normalize input to (slices_dir, original_format). @@ - path_obj = Path(slice_path) + path_obj = Path(slice_path)
- Remove the unused
model_rootand redundant conditional:- if detected == 'dirs': - sdir = path_obj - # If this looks like a slices folder (has metadata.json), parent is model root - model_root = sdir.parent if (sdir / 'metadata.json').exists() else sdir.parent - return sdir.resolve(), 'dirs' + if detected == "dirs": + sdir = path_obj + return sdir.resolve(), "dirs"
- Wire
_normalize_to_dirsintogenerate_run_metadata(see separate comment) so this normalization is actually used for.dslice/.dsperseand “model root vs slices dir” inputs, aligning with the PR goal of consistent detection logic.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- `dsperse/src/analyzers/runner_analyzer.py` (3 hunks)
- `dsperse/src/cli/prove.py` (5 hunks)
- `dsperse/src/cli/run.py` (4 hunks)
- `dsperse/src/cli/slice.py` (4 hunks)
- `dsperse/src/cli/verify.py` (4 hunks)
- `dsperse/src/prover.py` (3 hunks)
- `dsperse/src/run/runner.py` (1 hunks)
- `dsperse/src/slice/utils/converter.py` (1 hunks)
- `dsperse/src/verifier.py` (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
dsperse/src/verifier.py (3)
dsperse/src/slice/utils/converter.py (3)
Converter(17-385)convert(20-90)detect_type(93-122)dsperse/src/analyzers/runner_analyzer.py (2)
RunnerAnalyzer(16-487)generate_run_metadata(454-487)dsperse/src/utils/utils.py (6)
Utils(12-359)load_run_metadata(174-180)load_run_results(241-247)merge_execution_into_run_results(257-313)save_run_results(250-254)update_metadata_after_execution(316-359)
dsperse/src/cli/run.py (2)
dsperse/src/run/runner.py (1)
run(38-63)dsperse/src/cli/base.py (3)
prompt_for_value(242-302)normalize_path(21-32)check_model_dir(157-174)
dsperse/src/run/runner.py (3)
dsperse/src/backends/ezkl.py (2)
EZKL(107-693)generate_witness(160-242)dsperse/src/backends/onnx_models.py (2)
OnnxModels(13-222)run_inference(18-60)dsperse/src/run/utils/runner_utils.py (1)
RunnerUtils(16-330)
dsperse/src/slice/utils/converter.py (2)
dsperse/src/cli/convert.py (1)
convert(44-98)install.sh (1)
info(57-57)
🪛 Ruff (0.14.7)
dsperse/src/verifier.py
48-49: try-except-pass detected, consider logging the exception
(S110)
48-48: Do not catch blind exception: Exception
(BLE001)
146-146: Avoid specifying long messages outside the exception class
(TRY003)
164-164: Avoid specifying long messages outside the exception class
(TRY003)
177-177: Avoid specifying long messages outside the exception class
(TRY003)
187-187: Avoid specifying long messages outside the exception class
(TRY003)
189-189: Avoid specifying long messages outside the exception class
(TRY003)
193-193: Avoid specifying long messages outside the exception class
(TRY003)
251-251: f-string without any placeholders
Remove extraneous f prefix
(F541)
dsperse/src/cli/slice.py
102-102: Do not catch blind exception: Exception
(BLE001)
dsperse/src/analyzers/runner_analyzer.py
49-49: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
83-83: Unused static method argument: slices_dir
(ARG004)
122-122: Function definition does not bind loop variable slice_key
(B023)
150-150: Unused static method argument: slices_dir
(ARG004)
164-164: Do not catch blind exception: Exception
(BLE001)
330-330: Do not catch blind exception: Exception
(BLE001)
339-340: try-except-pass detected, consider logging the exception
(S110)
339-339: Do not catch blind exception: Exception
(BLE001)
350-350: Do not catch blind exception: Exception
(BLE001)
365-365: Local variable model_root is assigned to but never used
Remove assignment to unused variable model_root
(F841)
365-365: Useless if-else condition
(RUF034)
392-393: try-except-pass detected, consider logging the exception
(S110)
392-392: Do not catch blind exception: Exception
(BLE001)
424-424: Do not catch blind exception: Exception
(BLE001)
440-440: Create your own exception
(TRY002)
440-440: Avoid specifying long messages outside the exception class
(TRY003)
dsperse/src/cli/verify.py
133-133: Do not catch blind exception: Exception
(BLE001)
dsperse/src/run/runner.py
24-24: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
24-24: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
38-38: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
45-45: Create your own exception
(TRY002)
45-45: Avoid specifying long messages outside the exception class
(TRY003)
66-66: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
101-101: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
134-134: Do not catch blind exception: Exception
(BLE001)
181-181: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
248-248: Create your own exception
(TRY002)
248-248: Avoid specifying long messages outside the exception class
(TRY003)
dsperse/src/prover.py
155-155: Avoid specifying long messages outside the exception class
(TRY003)
174-174: Avoid specifying long messages outside the exception class
(TRY003)
191-191: Avoid specifying long messages outside the exception class
(TRY003)
205-205: Avoid specifying long messages outside the exception class
(TRY003)
207-207: Avoid specifying long messages outside the exception class
(TRY003)
214-214: Avoid specifying long messages outside the exception class
(TRY003)
290-290: f-string without any placeholders
Remove extraneous f prefix
(F541)
dsperse/src/slice/utils/converter.py
20-20: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
35-35: Avoid specifying long messages outside the exception class
(TRY003)
37-37: Avoid specifying long messages outside the exception class
(TRY003)
76-76: Do not catch blind exception: Exception
(BLE001)
88-88: Do not catch blind exception: Exception
(BLE001)
105-105: Avoid specifying long messages outside the exception class
(TRY003)
120-120: Avoid specifying long messages outside the exception class
(TRY003)
122-122: Avoid specifying long messages outside the exception class
(TRY003)
135-135: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
165-165: Do not catch blind exception: Exception
(BLE001)
170-170: Avoid specifying long messages outside the exception class
(TRY003)
195-195: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
211-211: Avoid specifying long messages outside the exception class
(TRY003)
267-267: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
290-290: Avoid specifying long messages outside the exception class
(TRY003)
309-309: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
316-316: Avoid specifying long messages outside the exception class
(TRY003)
344-344: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
349-349: Loop control variable dirs not used within loop body
Rename unused dirs to _dirs
(B007)
440-440: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (11)
dsperse/src/cli/prove.py (1)
35-57: LGTM: Path normalization in `get_all_runs`. The function correctly normalizes the run root directory and ensures all returned paths are absolute, which addresses the relative vs absolute path issues mentioned in the PR comments.
dsperse/src/verifier.py (1)
73-78: LGTM: Proper validation before EZKL verification. The `settings_path` and `vk_path` are now properly validated before calling `ezkl_runner.verify()`, addressing the critical bug identified in past reviews where `None` would be converted to the string `"None"` and cause `FileNotFoundError`.
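A minimal sketch of the guard pattern this comment describes; the standalone function and the keyword shape of the `ezkl_runner.verify` call are assumptions for illustration, not the actual verifier code:

```python
from pathlib import Path


def verify_slice(ezkl_runner, proof_path, settings_path, vk_path):
    # Fail fast with a clear error instead of letting None be stringified
    # into a bogus "None" path inside the backend call.
    for label, candidate in (("settings", settings_path), ("vk", vk_path)):
        if candidate is None or not Path(candidate).is_file():
            raise FileNotFoundError(f"Missing {label} file for verification: {candidate!r}")
    return ezkl_runner.verify(proof_path=proof_path,
                              settings_path=settings_path,
                              vk_path=vk_path)
```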
dsperse/src/cli/run.py (2)
62-81: Path resolution logic is complex but handles multiple input scenarios correctly. The conditional logic correctly distinguishes between:
- File inputs (dsperse/dslice)
- Directories with metadata.json at root
- Directories with metadata.json in a `slices` subdirectory
- Directories where Runner will auto-detect
This flexibility is necessary for the various input modes, but consider adding a brief inline comment summarizing the priority order for future maintainability.
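To make the priority order explicit, here is a hedged sketch of how the resolution could be factored out; the helper name and the returned labels are illustrative, not the actual CLI code:

```python
from pathlib import Path


def resolve_slices_source(path: Path) -> tuple[str, Path]:
    """Classify the input following the priority order described above."""
    if path.is_file() and path.suffix in {".dsperse", ".dslice"}:
        return "file", path                      # packaged file input
    if (path / "metadata.json").is_file():
        return "dirs", path                      # metadata.json at the root
    if (path / "slices" / "metadata.json").is_file():
        return "dirs", path / "slices"           # metadata.json under slices/
    return "auto", path                          # defer to Runner auto-detection
```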
133-135: LGTM: Runner instantiation and invocation. The Runner is correctly instantiated with optional `run_metadata_path` and invoked with `input_file` and `slice_path`. The fallback `slices_dir_effective or model_dir` ensures a valid path is always passed.
dsperse/src/cli/slice.py (1)
59-105: LGTM: `slice_convert` sub-command implementation. The function properly handles interactive prompting, input validation, and delegates to `Converter.convert()`. The error handling is appropriate for a CLI context.
231-264: LGTM: Execution chain traversal and result aggregation. The `_run` method correctly:
- Traverses the execution chain via the linked-list pattern (see the sketch after this list)
- Prepares per-slice I/O files
- Accumulates slice results
- Computes final probabilities and prediction
- Persists results to `run_results.json`
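A minimal sketch of that linked-list traversal; the metadata keys (`execution_chain`, `next`) follow the review's description and may not match the real schema exactly:

```python
def iter_execution_chain(run_metadata: dict):
    """Yield (slice_key, node) pairs by following `next` pointers."""
    chain = run_metadata["execution_chain"]
    key = next(iter(chain), None)  # assumes the first entry is the chain head
    while key is not None:
        node = chain[key]
        yield key, node
        key = node.get("next")     # a missing/None `next` terminates the chain
```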
92-122: LGTM: `detect_type` implementation. The function correctly identifies format types by:
- Checking file suffixes for `.dsperse` and `.dslice`
- Detecting `slice_*` directories for the `dirs` format
- Checking for single slice directories
- Checking for `.dslice` files in a directory

The error handling for unknown types is appropriate.
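For illustration, a standalone reimplementation of the detection order just described; this is a sketch, not the real `Converter.detect_type`, and the returned labels are assumptions:

```python
from pathlib import Path


def detect_type(path: str) -> str:
    p = Path(path)
    if p.suffix == ".dsperse":
        return "dsperse"
    if p.suffix == ".dslice":
        return "dslice"
    if p.is_dir():
        entries = list(p.iterdir())
        if any(e.is_dir() and e.name.startswith("slice_") for e in entries):
            return "dirs"                    # sliced model directory
        if (p / "metadata.json").is_file():
            return "dirs"                    # single slice directory
        if any(e.suffix == ".dslice" for e in entries):
            return "dslice"                  # directory containing packed slices
    raise ValueError(f"Cannot determine packaging type for {path!r}")
```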
dsperse/src/prover.py (2)
56-64: LGTM: Proper artifact validation before proving. The code correctly validates required files (circuit, pk) and gracefully handles missing optional files (settings). This prevents cryptic errors from the EZKL backend.
132-174: LGTM: Unified `prove` entry point. The method correctly (see the routing sketch below):
- Detects run mode (run-root vs single-slice)
- Detects packaging format via `Converter.detect_type`
- Routes to the appropriate proving method
- Handles format conversion for single-slice mode
- Provides clear error messages for invalid paths
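A toy router mirroring the flow summarized above; the private method names are stand-ins for the real `Prover` internals, and only the `Converter.detect_type` / `Converter.convert` steps they represent correspond to the actual API:

```python
from pathlib import Path


class ProverSketch:
    """Illustrative routing only; real artifact handling lives in Prover."""

    def prove(self, path: str):
        p = Path(path)
        if (p / "run" / "metadata.json").is_file():
            return self._prove_run_root(p)     # run-root mode: prove every slice in the run
        packaging = self._detect_type(p)       # stands in for Converter.detect_type
        if packaging != "dirs":
            p = self._convert_to_dirs(p)       # stands in for Converter.convert(..., output_type="dirs")
        return self._prove_single_slice(p)     # single-slice mode

    def _detect_type(self, p: Path) -> str: ...
    def _convert_to_dirs(self, p: Path) -> Path: ...
    def _prove_run_root(self, p: Path): ...
    def _prove_single_slice(self, p: Path): ...
```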
dsperse/src/analyzers/runner_analyzer.py (2)
83-147: Model-level slice processing (`_process_slices_model`) is consistent and metadata-driven. This implementation cleanly:
- Normalizes ONNX and EZKL paths via `rel_from_payload` + `with_slice_prefix`.
- Avoids filesystem checks and trusts metadata, which matches the analyzer's role.
- Handles missing shapes/dependencies/compilation info gracefully via `.get(..., default)` and `Optional` normalization.

No blocking issues spotted here.
218-233: Execution-chain and security computation are coherent and metadata-driven. The combination of `build_run_metadata`, `_build_execution_chain`, `_build_circuit_slices`, `get_execution_chain`, and `_calculate_security` is consistent:
- Slices are ordered by numeric index extracted from `slice_*` keys.
- `use_circuit` is derived from the precomputed `ezkl` flag plus presence of circuit + keys; no filesystem checks.
- `fallback_map` correctly maps either `circuit_path -> onnx_path` or `slice_key -> onnx_path` when only ONNX is available.
- Security is defined as the percentage of slices with `ezkl=True`, which is simple and transparent.

No correctness problems spotted here.
Also applies to: 236-277, 279-288, 290-295, 297-302
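That security definition reduces to one line; a sketch under the assumption that each slice entry carries an `ezkl` boolean as described:

```python
def overall_security(slices: dict) -> float:
    """Percentage of slices that run as EZKL circuits rather than plain ONNX."""
    flags = [bool(s.get("ezkl", False)) for s in slices.values()]
    return 100.0 * sum(flags) / len(flags) if flags else 0.0
```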
```python
dirs_path = Utils.dirs_root_from(Path(dirs_path))
metadata = Utils.load_run_metadata(run_path)

ezkl = EZKL()
```
Couldn't we use self.ezkl_runner?
```python
model_path = Utils.resolve_under_slice(slice_dir, model_path)
pk_path = Utils.resolve_under_slice(slice_dir, pk_path)
settings_path = Utils.resolve_under_slice(slice_dir, settings_path)
```
Somewhere here I'd call self._prove_single_slice instead of engaging ezkl directly. Feels like we could avoid repeating code and call ezkl.prove in one place; that would make code maintenance a bit easier...
```python
# Convert back to the original packaging if needed
if detected != "dirs":
    Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
```
Do we really need to convert back?
…slice directory references and updating slice processing logic. Rename process_slices method and streamline related functions.
Actionable comments posted: 0
♻️ Duplicate comments (2)
dsperse/src/analyzers/runner_analyzer.py (2)
154-219: Guard EZKL path joins against missing keys to avoid `TypeError`. In `_process_slices_per_slice`, the `files` dict can be present but missing one or more of `"compiled_circuit"`/`"compiled"`, `"settings"`, `"pk_key"`, or `"vk_key"` (e.g., pre-compilation, failed compilation, or partial metadata). In that case, calls like:

```python
settings_path = os.path.join(parent_dir, files.get("settings"))
```

will do `os.path.join(..., None)` and raise `TypeError`, contradicting the "analyzer should just analyze what is there" goal.

You can mirror the safer pattern from `_process_slices_model`:

```diff
-comp = ((slice.get("compilation") or {}).get("ezkl") or {})
-files = (comp.get("files") or {})
-compiled_flag = bool(comp.get("compiled", False))
-
-if files:
-    circuit_path = os.path.join(parent_dir, files.get("compiled_circuit") or files.get("compiled"))
-    settings_path = os.path.join(parent_dir, files.get("settings"))
-    pk_path = os.path.join(parent_dir, files.get("pk_key"))
-    vk_path = os.path.join(parent_dir, files.get("vk_key"))
-else:
-    circuit_path = None
-    settings_path = None
-    pk_path = None
-    vk_path = None
+comp = ((slice.get("compilation") or {}).get("ezkl") or {})
+files = (comp.get("files") or {})
+compiled_flag = bool(comp.get("compiled", False))
+
+compiled_rel = files.get("compiled_circuit") or files.get("compiled")
+
+def _join_optional(key: str) -> Optional[str]:
+    rel = files.get(key)
+    return os.path.join(parent_dir, rel) if rel else None
+
+circuit_path = os.path.join(parent_dir, compiled_rel) if compiled_rel else None
+settings_path = _join_optional("settings")
+pk_path = _join_optional("pk_key")
+vk_path = _join_optional("vk_key")
```

This keeps behavior identical when all paths are present, but makes the analyzer resilient to incomplete compilation metadata.
458-492: `generate_run_metadata` doesn't normalize `slice_path` and will crash for `str` inputs / model roots / dsperse paths. Right now:
- The signature annotates `slice_path: Path`, but the `__main__` block passes `str(model_path)`. `generate_run_metadata` immediately calls `_has_model_metadata(slice_path)`, and `_has_model_metadata` uses `path / "metadata.json"`; with a `str`, this raises `TypeError`.
- Even with a `Path` for a model root that only has `slices/metadata.json`, `_has_model_metadata` will be `True` (due to the `slices/metadata.json` check) but `_build_from_model_metadata(slice_path)` will then try to load `slice_path / "metadata.json"`, which likely doesn't exist.

Also, the docstring promises handling of `.dslice`/`.dsperse`, but `_normalize_to_dirs` is never invoked here, so those inputs aren't actually supported.

You can fix all of this by:
- Accepting `str | Path`.
- Normalizing via `_normalize_to_dirs` to a concrete slices directory and detected packaging format.
- Using the normalized slices dir for `_has_model_metadata` / builders.
- Recording the original path in `source_path`.

For example:

```diff
-@staticmethod
-def generate_run_metadata(slice_path: Path, save_path=None, original_format=None):
+@staticmethod
+def generate_run_metadata(slice_path: str | Path, save_path=None, original_format=None):
@@
-    if RunnerAnalyzer._has_model_metadata(slice_path):
-        run_meta = RunnerAnalyzer._build_from_model_metadata(slice_path)
-    else:
-        run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slice_path)
+    # Normalize input to a slices directory and detect original packaging
+    path_obj = Path(slice_path)
+    slices_dir, detected_format = RunnerAnalyzer._normalize_to_dirs(str(path_obj))
+    slices_dir = Path(slices_dir)
+
+    if original_format is None:
+        original_format = detected_format
+
+    if RunnerAnalyzer._has_model_metadata(slices_dir):
+        run_meta = RunnerAnalyzer._build_from_model_metadata(slices_dir)
+    else:
+        run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slices_dir)
@@
-    run_meta["packaging_type"] = original_format
-    run_meta["source_path"] = str(slice_path)
+    run_meta["packaging_type"] = original_format
+    run_meta["source_path"] = str(path_obj.resolve())
@@
-    if save_path is None:
-        base = Path(slice_path).resolve()
+    if save_path is None:
+        base = path_obj.resolve()
         # If slice_path is a model root or slices dir, put run/metadata.json next to it
         base_dir = base if base.is_dir() else base.parent
         save_path = base_dir / "run" / "metadata.json"
```

This makes the API robust for both `str` and `Path` inputs, correctly handles model roots vs slices dirs, and actually enables the promised `.dslice`/`.dsperse` support via `_normalize_to_dirs`.
🧹 Nitpick comments (8)
dsperse/src/analyzers/runner_analyzer.py (8)
9-21: Unused `SIZE_LIMIT` and no-op `__init__` can be simplified. `SIZE_LIMIT` is currently unused in this file, and `RunnerAnalyzer` is effectively a static utility with a no-op `__init__`. Consider either wiring `SIZE_LIMIT` into circuit selection/security logic or removing it, and dropping `__init__` entirely to make the stateless pattern clearer.
42-50: Prefer `logger.exception` for metadata load failures. The error path in `load_slices_metadata` logs with `logger.error` then re-raises. Using `logger.exception` would preserve the traceback in logs and aligns with the Ruff hint, while keeping the same control flow.

```diff
-except Exception as e:
-    logger.error(f"Failed to load slices metadata from {slices_dir}: {e}")
-    raise
+except Exception:
+    logger.exception(f"Failed to load slices metadata from {slices_dir}")
+    raise
```
53-81: `process_slices` dispatch looks robust; minor note on unused `slices_dir` downstream. The new dispatcher avoids the old `len(...) == 3` heuristic and correctly handles empty input and both per-slice/model-level entries. That's a good hardening.

Minor: `_process_slices_model` doesn't use its `slices_dir` argument at all; if that's intentional, you could either remove the parameter or add a brief comment to avoid future confusion.
83-151: Model-level slice processing is reasonable; confirm `ezkl`/security semantics. The metadata-driven approach here (no filesystem probing, relying on `compilation.ezkl.files`) looks good.

However, `ezkl` is just `bool(compiled_flag)`, while `_build_execution_chain` additionally requires `has_circuit` and `has_keys` for `use_circuit`, and `_build_circuit_slices` later treats `ezkl` as the indicator of which slices are "circuit slices" for security. Please confirm that `compiled` is only set when all required files/keys exist and any size limit has been enforced; otherwise `overall_security` may over-report coverage relative to what `use_circuit` actually does.
223-238: `build_run_metadata` currently bypasses the `process_slices` dispatcher. `build_run_metadata` unconditionally calls `_process_slices_model`, even though you introduced `process_slices` as a more general dispatcher. If `build_run_metadata` is only ever invoked on model-level `slices/metadata.json`, this is fine, but consider either:
- Switching to `process_slices` for consistency, or
- Documenting that `build_run_metadata` expects model-level metadata only.

Right now the `#TODO` suggests some intent to revisit.
283-300: Align `_build_circuit_slices` with actual `use_circuit` semantics. `_build_circuit_slices` currently uses `bool(slice_data.get("ezkl", False))`, while `_build_execution_chain` further gates on `has_circuit` and `has_keys`. If `ezkl` is ever `True` while keys/artifacts are missing, `circuit_slices` and `overall_security` will diverge from what the runner actually uses.

Consider either:
- Computing a single "ready_for_ezkl" flag in the slice map that already incorporates keys/size checks and using it in both places (a sketch follows below), or
- Deriving `circuit_slices` directly from the execution_chain nodes' `use_circuit`.
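A sketch of that single-flag approach; the metadata field names are taken from the discussion above and may not match the real schema exactly:

```python
def is_ready_for_ezkl(slice_data: dict) -> bool:
    """True only when the slice is compiled AND all proving artifacts are listed."""
    comp = (slice_data.get("compilation") or {}).get("ezkl") or {}
    files = comp.get("files") or {}
    return (
        bool(comp.get("compiled"))
        and bool(files.get("compiled_circuit") or files.get("compiled"))
        and bool(files.get("pk_key"))
        and bool(files.get("vk_key"))
    )
```

Both `use_circuit` and the security percentage could then derive from this one predicate, so they cannot drift apart.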
309-384: `_normalize_to_dirs` behavior is useful; tighten docstring and clean up minor issues. This helper nicely centralizes detection/normalization of dirs vs dslice vs dsperse and reuses `Converter.detect_type` / `convert`. A couple of small points:
- The docstring says it returns `(model_root, slices_dir, original_format)`, but the implementation returns only `(slices_dir, original_format)`. Either update the docstring or extend the return value so callers aren't misled.
- The `model_root` variable at Line 370 is assigned but never used, and the `if ... else` is effectively identical. You can drop that whole block or just compute `sdir.parent` once.
- Several broad `except Exception` blocks silently swallow errors (e.g., Lines 335, 344-345, 355). For diagnosability, consider at least logging those before continuing.

These are non-blocking but would improve clarity and debuggability.
386-456: Metadata builders are generally solid; consider tightening exception handling. `_build_from_model_metadata` and `_build_from_per_slice_dirs` do the right structural thing: read metadata, construct slice entries, then delegate to the existing chain/security builders.

Two minor notes:
- The `try/except Exception: pass` around `run_meta["model_path"]` is very broad for a low-risk operation; you could just assign without a `try`, or at least log on failure.
- Raising a bare `Exception` with a long message at Line 445 works but makes it harder to distinguish this failure from others programmatically. A small custom exception type (e.g., `NoSlicesFoundError`) or at least a more specific built-in would make upstream handling clearer.

None of this blocks functionality but would aid debugging and error classification.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dsperse/src/analyzers/runner_analyzer.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/analyzers/runner_analyzer.py (2)
dsperse/src/utils/utils.py (1)
save_metadata_file (18-42)
dsperse/src/slice/utils/converter.py (3)
Converter (17-385)
detect_type (93-122)
convert (20-90)
🪛 Ruff (0.14.7)
dsperse/src/analyzers/runner_analyzer.py
49-49: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
83-83: Unused static method argument: slices_dir
(ARG004)
127-127: Function definition does not bind loop variable slice_dirname
(B023)
154-154: Unused static method argument: slices_dir
(ARG004)
168-168: Do not catch blind exception: Exception
(BLE001)
335-335: Do not catch blind exception: Exception
(BLE001)
344-345: try-except-pass detected, consider logging the exception
(S110)
344-344: Do not catch blind exception: Exception
(BLE001)
355-355: Do not catch blind exception: Exception
(BLE001)
370-370: Local variable model_root is assigned to but never used
Remove assignment to unused variable model_root
(F841)
370-370: Useless if-else condition
(RUF034)
397-398: try-except-pass detected, consider logging the exception
(S110)
397-397: Do not catch blind exception: Exception
(BLE001)
429-429: Do not catch blind exception: Exception
(BLE001)
445-445: Create your own exception
(TRY002)
445-445: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (2)
dsperse/src/analyzers/runner_analyzer.py (2)
241-281: Execution-chain construction looks correct; check behavior for non-numeric slice IDs. The chain logic (ordered by numeric suffix, `next` pointers, and `fallback_map` keyed by circuit path or slice ID) is coherent and avoids the earlier off-by-one concerns.

One edge case: `int(str(k).split('_')[-1])` will raise if any slice keys don't end with an integer. If that can't happen in your metadata, no change is needed; otherwise, you may want a defensive fallback (e.g., skip such slices or log and continue).
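A hedged sketch of such a fallback: sort by the numeric suffix where one parses, and log-and-skip keys that don't. The helper name is illustrative:

```python
import logging

logger = logging.getLogger(__name__)


def ordered_slice_keys(slices: dict) -> list[str]:
    """Return slice keys sorted by numeric suffix, skipping malformed keys."""
    indexed = []
    for key in slices:
        try:
            indexed.append((int(str(key).rsplit("_", 1)[-1]), key))
        except ValueError:
            logger.warning("Skipping slice with non-numeric key: %s", key)
    return [key for _, key in sorted(indexed)]
```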
494-505: Update the `__main__` demo to pass a `Path` object directly to `generate_run_metadata()`. The `__main__` block currently converts the `Path` to a string before passing it to `generate_run_metadata()`. If the method signature has been updated to accept `str | Path` with internal normalization, remove the `str()` conversion:

```diff
-out = RunnerAnalyzer.generate_run_metadata(str(model_path))
+out = RunnerAnalyzer.generate_run_metadata(model_path)
```

This ensures the demo exercises the same code path that real callers will use.
Add dsperse and dslice files.
Summary by CodeRabbit