
Conversation

@0pendansor (Collaborator) commented Oct 21, 2025

Add dsperse and dslice files.

Summary by CodeRabbit

  • New Features
    • Add convert CLI for switching between archive and directory slice layouts; new compile orchestrator and unified compile/run/prove/verify flows.
  • Improvements
    • Branding standardized to "DSperse" and terminology changed from “segment” to “slice”.
    • Slice outputs now use a slice_*/payload/... layout with richer slice-level metadata, clearer run/proof/compile reports, and improved path normalization.
  • Chores
    • Expanded .gitignore for common model inputs/artifacts; added a dsperse console entry point.


…ection; Added exclude_patterns to Converter for pk exclusion during packing
@HudsonGraeme (Member) left a comment


Looks awesome! Left some small comments, questions and thoughts here 🙏

The main points:

  • In the subnet we can't share DSperse / DSlice files back and forth at runtime, as that would be very inefficient and incur a huge amount of data transfer. Ideally we'd transmit only minimal witness inputs and proof data.
  • Overall, if we could improve the pathing logic in the future so that users indicate what kinds of files they've provided and where those are located, then fail early if any are missing, it'd save us tons of effort in file searching and path verification.
  • Since we're supporting JSTprove (probably out of scope for this PR), we should be cognizant of areas where we're hardcoding ezkl-related paths.

slices_dir = self._converted_dir
model_dir = slices_dir.parent

# Detect single-slice mode: a directory that itself is a slice (has metadata.json + payload) and no slice_* children
Inline comment (Member):
If we can have the user specify what they're inputting, or structure the subcommands in a way that inherently specifies the incoming file type, we could remove all the logic that attempts to detect the format or search paths.
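
A minimal sketch of what that could look like, with a hypothetical --input-type flag and an illustrative required-file map (none of these names are the current dsperse CLI surface):

import argparse
from pathlib import Path

# Hypothetical: what each declared format is expected to provide on disk.
REQUIRED_BY_TYPE = {
    "dirs": ["metadata.json"],   # assumed marker for an unpacked slices dir
    "dslice": [],                # archives validated by extension below
    "dsperse": [],
}

def parse_and_validate():
    parser = argparse.ArgumentParser()
    parser.add_argument("path", type=Path)
    parser.add_argument("--input-type", choices=sorted(REQUIRED_BY_TYPE), required=True)
    args = parser.parse_args()

    # Fail early instead of searching or auto-detecting later.
    if not args.path.exists():
        raise SystemExit(f"Input not found: {args.path}")
    if args.input_type in ("dslice", "dsperse") and args.path.suffix != f".{args.input_type}":
        raise SystemExit(f"Expected a .{args.input_type} file, got: {args.path}")
    for name in REQUIRED_BY_TYPE[args.input_type]:
        if not (args.path / name).exists():
            raise SystemExit(f"Missing {name} under {args.path} for --input-type={args.input_type}")
    return args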

@coderabbitai (bot) commented Nov 19, 2025

Walkthrough

Refactors "segment" → "slice", changes slice payload layout to per-slice payload directories, adds Converter for packaging conversions, introduces Compiler + CompilerUtils and a new Runner + RunnerUtils for compile/run orchestration, updates analyzers/metadata and many CLI commands, and removes legacy monolithic modules. (50 words)

Changes

Cohort / File(s) — Summary

• Repo & Branding — /.gitignore, README.md, docs/arc42.md, docs/overview.md, dsperse/src/cli/__init__.py, dsperse/src/cli/base.py
  Added model-related ignore patterns; normalized branding to "DSperse"; documentation and README updated to reflect slice naming, new payload layout, output conventions, CLI examples, and new flags.
• CLI Surface — dsperse/src/cli/compile.py, dsperse/src/cli/convert.py, dsperse/src/cli/run.py, dsperse/src/cli/prove.py, dsperse/src/cli/verify.py, dsperse/src/cli/slice.py, dsperse/main.py
  Added convert command and slice convert subcommand; moved many parsers to path/positional semantics; added lazy dispatch in main slice command; adjusted prompts, validation and runtime flows to accept dirs/.dslice/.dsperse and explicit --slices/--path flags.
• Slice tooling & layout — dsperse/src/slice/slicer.py, dsperse/src/slice/onnx_slicer.py, dsperse/src/slice/utils/onnx_utils.py, dsperse/src/slice/utils/converter.py
  Slicer now emits slice_{i}/payload/slice_{i}.onnx layout; OnnxUtils.write_slice_dirs_metadata normalizes slice dirs and writes per-slice metadata; Converter added for conversions between dirs, .dslice, and .dsperse.
• Analyzers & metadata — dsperse/src/analyzers/onnx_analyzer.py, dsperse/src/analyzers/runner_analyzer.py
  Analyzer keys renamed (segment → slice, segments → slices); OnnxAnalyzer writes slice dirs metadata; RunnerAnalyzer refactored into a stateless API (generate_run_metadata, process_slices) with multi-format normalization and new execution_chain/circuit_slices structures.
• Compile subsystem — dsperse/src/compile/compiler.py, dsperse/src/compile/utils/compiler_utils.py, dsperse/src/backends/ezkl.py
  New Compiler class and CompilerUtils for per-slice and whole-model compilation orchestration; EZKL backend pipeline renamed to compilation_pipeline, get_version() added, artifact keys standardized, and error fields renamed to the new schema.
• Run & runtime utilities — dsperse/src/run/runner.py, dsperse/src/run/utils/runner_utils.py, dsperse/src/utils/utils.py
  New Runner class for hybrid EZKL/ONNX execution and RunnerUtils runtime helpers (normalization, run dir creation, per-slice IO, execution routing); Utils extended with metadata discovery, run results load/save, path helpers and execution-merge helpers.
• Prover & Verifier — dsperse/src/prover.py, dsperse/src/verifier.py
  Prover/Verifier reworked to operate over dirs/dslice/dsperse inputs; added per-slice prove_/verify_ methods, Converter and Utils integration, and run_results updates.
• Backend import adjustments — dsperse/src/backends/onnx_models.py, dsperse/src/backends/pth_models.py
  Updated internal import paths to dsperse.src.run.utils.runner_utils.
• Removed / Replaced modules — dsperse/src/compiler.py, dsperse/src/runner.py, dsperse/src/utils/runner_utils/runner_utils.py
  Legacy monolithic Compiler, Runner, and old RunnerUtils removed; functionality split into new dsperse/src/compile/*, dsperse/src/run/*, and dsperse/src/slice/* modules.
• Packaging & deps — requirements.txt, pyproject.toml, docs/uv_packaging.md
  Requirements pinned/expanded (torch bumped to 2.7.1), pyproject updated (Python → 3.12.*, new dsperse script entry), and uv packaging guide added.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as CLI (convert / slice / compile / run / prove / verify)
    participant Conv as Converter
    participant RA as RunnerAnalyzer
    participant Compiler as Compiler
    participant Runner as Runner
    participant Prover as Prover
    participant EZKL as EZKL Backend
    participant ONNX as ONNX Models

    User->>CLI: invoke command (convert / slice / compile / run / prove / verify)
    CLI->>Conv: detect/convert (dsperse/.dslice ↔ dirs)
    Conv-->>CLI: slices_dir

    CLI->>RA: generate_run_metadata(slices_dir)
    RA-->>CLI: run_metadata

    alt compile flow
        CLI->>Compiler: compile(slice_path, input, layers)
        Compiler->>EZKL: compilation_pipeline (per-slice)
        EZKL-->>Compiler: compilation_data
        Compiler->>RA: update slice metadata
        Compiler-->>CLI: compilation_dir
    end

    alt run / prove flow
        CLI->>Runner: run(run_metadata, input)
        Runner->>RA: get_execution_chain(run_metadata)
        loop per-slice
            Runner->>EZKL: generate_witness / prove
            EZKL-->>Runner: witness/proof or error
            alt ezkl fails
                Runner->>ONNX: run_onnx_slice (fallback)
                ONNX-->>Runner: output
            end
        end
        Runner-->>CLI: run_results.json
        CLI->>Prover: (optional) prove using run metadata
        Prover->>EZKL: verify/prove per-slice proofs
        Prover-->>CLI: proof results
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~150 minutes

  • Areas needing careful review:
    • RunnerAnalyzer redesign and cross-format normalization.
    • Compiler and CompilerUtils: per-slice compilation, calibration inference chain, and metadata persistence.
    • Runner and RunnerUtils: execution chain, EZKL ↔ ONNX fallback behavior, relative payload resolution, and run-results merging.
    • Converter: archive packing/unpacking, expand/cleanup semantics, and edge cases (nested archives, name collisions).
    • Cross-module import moves and removals: ensure no unresolved imports or circular references and that the CLI entrypoint works.
    • Consistency of metadata key renames (segments → slices) across all read/write code paths.

Possibly related PRs

  • Subnet-2 integration fixes #39 — closely related refactor touching Runner API and packaging/docs; likely overlaps with the Runner/packaging changes in this PR.

Poem

🐰 I hop through slices, tidy payloads in tow,
DSperse packs and unpacks each file in a row.
Compiler hums, Runner races, proofs march on cue,
Converter bundles carrots — all labeled anew!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 72.79%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'dslice and dsperse file type integration' clearly describes the main objective of the pull request, which is to add and integrate dsperse and dslice file format support throughout the codebase.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta):
    • Create PR with unit tests
    • Post copyable unit tests in a comment
    • Commit unit tests in branch dan/dslice


@coderabbitai (bot) left a comment

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
dsperse/src/analyzers/onnx_analyzer.py (1)

337-390: Fix slices structure and guard output_dir to prevent runtime failures in prover/verifier and persistence flows

generate_slices_metadata introduces two breaking issues:

  1. slices must be a dict, not a list
    iter_circuit_slices (utils.py:197–198) calls slices.items(), and prover.py:43 / verifier.py:53 rely on it to iterate (slice_id, meta) pairs. The current code assigns a list to model_overview["slices"], which will crash both prover and verifier with AttributeError: 'list' object has no attribute 'items'. Instead, build a dict keyed by slice_id (e.g., {"slice_0": {...}, "slice_1": {...}}).

  2. output_dir cannot be None at lines 387–388
    The function signature allows output_dir=None, but Utils.save_metadata_file(output_path=output_dir) calls Path(output_path) at line 27, which fails with TypeError on None. Similarly, OnnxUtils.write_slice_dirs_metadata(output_dir) calls Path(slices_root) at line 69, also failing on None. Add a guard: either require output_dir (raise if missing) or default it to os.path.join(os.path.dirname(self.onnx_path), "slices") before persisting.
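
A minimal sketch of both fixes (the function and variable names are illustrative, not the actual generate_slices_metadata internals):

import os

def build_slices_overview(slice_infos, onnx_path, output_dir=None):
    # Fix 1: key slices by slice_id so iter_circuit_slices can call .items().
    slices = {f"slice_{idx}": dict(info) for idx, info in enumerate(slice_infos)}
    model_overview = {"slices": slices}

    # Fix 2: never hand output_dir=None to Path(...); default it first.
    if output_dir is None:
        output_dir = os.path.join(os.path.dirname(onnx_path), "slices")
    return model_overview, output_dir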

dsperse/src/prover.py (1)

168-177: main: missing early exit when run directory is absent

In the demo __main__ block, if run_dir does not exist you print a warning but still proceed to:

run_dirs = sorted([d for d in os.listdir(run_dir) if d.startswith("run_")])

This will raise FileNotFoundError on os.listdir(run_dir) instead of cleanly exiting.

If this script is used for local testing, it’s better to return/exit early:

    run_dir = os.path.join(model_dir, "run")
    if not os.path.exists(run_dir):
-        print(f"Run directory not found at {run_dir}, assuming input file provided.")
+        print(f"Run directory not found at {run_dir}")
+        exit(1)
♻️ Duplicate comments (6)
dsperse/src/cli/convert.py (2)

64-73: Avoid re-importing Path alias in the prompt loop

You already import Path at the top; using Path directly here avoids the extra _P alias and keeps things simpler.

-        from pathlib import Path as _P
         while True:
             chosen_raw = prompt_for_value('to', 'Enter desired output type (dirs, dslice, dsperse)')
-            chosen = _P(str(chosen_raw)).name.strip().lower()
+            chosen = Path(str(chosen_raw)).name.strip().lower()

[ suggest_optional_refactor ]


75-85: Unify dsperse→dirs handling with Converter.convert and avoid duplicated cleanup

This branch re-implements the dsperse -> dirs path (including manual cleanup) instead of delegating to Converter.convert, which already owns that behavior. It also means cleanup semantics live in two places.

Consider extending Converter.convert to accept an optional expand_slices flag and delegate from here, so all dsperse→dirs logic and cleanup live in one place:

-        if p.is_file() and p.suffix == '.dsperse' and getattr(args, 'to_type', None) == 'dirs':
-            result = Converter._dsperse_to_dirs(p, output_path, expand_slices=bool(getattr(args, 'expand_slices', False)))
-            if bool(getattr(args, 'cleanup', True)):
-                try:
-                    p.unlink()
-                except Exception:
-                    pass
-        else:
-            result = Converter.convert(input_path, output_type=getattr(args, 'to_type', None), output_path=output_path, cleanup=bool(getattr(args, 'cleanup', True)))
+        if p.is_file() and p.suffix == '.dsperse' and getattr(args, 'to_type', None) == 'dirs':
+            result = Converter.convert(
+                str(p),
+                output_type='dirs',
+                output_path=output_path,
+                cleanup=bool(getattr(args, 'cleanup', True)),
+                # new optional param:
+                expand_slices=bool(getattr(args, 'expand_slices', False)),
+            )
+        else:
+            result = Converter.convert(
+                input_path,
+                output_type=getattr(args, 'to_type', None),
+                output_path=output_path,
+                cleanup=bool(getattr(args, 'cleanup', True)),
+            )

[ suggest_recommended_refactor ]

dsperse/src/cli/verify.py (1)

31-144: Nice fast-path API; consider factoring shared run/metadata logic with run CLI

The new positional-arg fast path (run_path / data_path) is a good improvement, and the metadata‑driven fallback from source_path / model_path is clear.

There is, however, quite a bit of overlap with run’s CLI around:

  • Path normalization and prompting,
  • Resolving model/slices roots,
  • Handling output file saving and result summaries.

Given this was already flagged before, it may be worth extracting common helpers in cli.base (or a small shared module) to avoid divergence between run and verify over time (e.g., default paths, summary formatting).

[ suggest_recommended_refactor ]
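
A sketch of what such a shared helper in cli.base could look like (the name and defaults are illustrative, not existing API):

from pathlib import Path

def resolve_run_and_model(run_path=None, model_path=None, default_run_name="run"):
    """Normalize the run/model path pair once so run and verify stay in sync."""
    model = Path(model_path).resolve() if model_path else None
    run = Path(run_path).resolve() if run_path else None
    if run is None and model is not None:
        run = model / default_run_name  # one shared default instead of per-command copies
    return run, model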

dsperse/src/cli/prove.py (2)

169-347: Archive unpacking + run-dir discovery is more complex than needed and duplicates Converter work

This block:

  • Unpacks .dsperse/.dslice just to discover a run directory,
  • Walks the unpacked tree (os.walk) and manages multiple run_candidates,
  • Then later passes the original archive path into Prover.prove, which will convert/unpack it again via Converter.

Given dsperse bundles do not contain run folders themselves, you can usually infer the run root from the archive's parent (e.g., for slices.dsperse, look under <parent>/run) and let Prover/Converter own archive handling. Consider simplifying to:

  • Derive parent_run = Path(candidate).with_suffix('').parent / "run" (or similar),
  • Check existence once instead of walking,
  • Avoid the extra Converter.convert call and unpacked staging dir for run discovery.

You can also drop the inner from dsperse.src.slice.utils.converter import Converter (top-level import already present) and the unused cleanup_unpacked flag, and rename the unused files loop variable to _files to keep Ruff happy.


431-463: Repacking proved .dsperse bundles may be heavy; consider making it opt-in

This block always repacks an archive into <stem>_proved.dsperse after proving, even though:

  • The verifier only needs proofs and a vk/settings, not the entire model bundle.
  • Converting dirs → dsperse is already handled generically by Converter, and doing it here adds another large artifact and I/O step.

Given earlier feedback about dsperse repacking cost and filename-driven behavior, consider:

  • Guarding repack behind a CLI flag (e.g., --repack-dsperse),
  • Or moving bundle creation into a separate explicit command.

That keeps prove focused on proofs and avoids surprising large bundle writes for users who only care about local verification.
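
A sketch of gating repack behind the suggested flag (the parser/args wiring is illustrative; the Converter.convert keywords mirror calls shown elsewhere in this review):

from dsperse.src.slice.utils.converter import Converter

def add_prove_args(parser):
    parser.add_argument(
        "--repack-dsperse",
        action="store_true",
        help="After proving, repack slice dirs into a *_proved.dsperse bundle",
    )

def maybe_repack(args, slices_dir, proved_bundle):
    if not args.repack_dsperse:
        return None  # default: prove only, no large bundle writes
    return Converter.convert(
        str(slices_dir),
        output_type="dsperse",
        output_path=str(proved_bundle),
        cleanup=False,
    )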

dsperse/src/cli/slice.py (1)

42-57: slice convert subcommand duplicates top-level convert behavior

The new slice convert subparser largely re-implements the logic already present in dsperse/src/cli/convert.py (prompting for input, to_type, handling dsperse→dirs, etc.). This works, but increases the risk of the two diverging over time. It might be cleaner to route both entry points through a single shared helper (e.g., call convert.convert(args) from here) and layer only the extra --exclude-pk semantics on top.

🧹 Nitpick comments (32)
dsperse/src/slice/utils/onnx_utils.py (1)

104-188: Confirm that resetting slice index to enumeration order is intended

In write_slice_dirs_metadata, you derive idx from enumerate(slices) and then:

  • Rename directories to slice_{idx}.
  • Set single_slice_entry["index"] = idx.

If upstream metadata already carried a meaningful index (e.g., graph node index or original slice id), this will overwrite it and may break consumers that expect stable IDs.

If the intent is to normalize everything to 0‑based sequential slice indices, consider:

  • Either documenting this explicitly in a docstring/comment.
  • Or preserving the original index in a separate field (e.g., original_index) before overwriting:
-            single_slice_entry = dict(seg)
-            single_slice_entry["index"] = idx
+            single_slice_entry = dict(seg)
+            if "index" in single_slice_entry and single_slice_entry["index"] != idx:
+                single_slice_entry["original_index"] = single_slice_entry["index"]
+            single_slice_entry["index"] = idx

[ request_verification ]

README.md (2)

14-18: Terminology: “segments” vs “slices”

The feature bullets still talk about “segments” and “segment-by-segment” execution, but the rest of the codebase and README have moved to “slice” terminology. For consistency and to reduce confusion for new users, consider updating these remaining references to “slice(s)” where appropriate.


91-96: Minor markdown list indentation nit

The nested bullet list under “The script will:” has extra indentation relative to the - Unix/macOS: parent, which triggers MD007 (ul-indent). Reducing the child bullets to the expected indentation width would keep markdownlint happy without changing the rendered content.

dsperse/src/verifier.py (2)

39-48: Swallowed exceptions when reading execution_results make debugging harder

The try/except Exception: pass around building proof_paths_by_slice silently hides any structural issues in run_results. If the schema ever changes, you’ll just get “Skipping slice: proof path not recorded” with no clue why.

Consider at least logging at debug or warning level in the except block (e.g., logger.exception("Failed to build proof path map from run_results")) so unexpected formats are visible during troubleshooting.


138-182: Minor: __main__ harness f-strings and structure

The test harness at the bottom is helpful, but:

  • print(f"\nVerification completed!") doesn’t interpolate anything and triggers Ruff F541; it can be a regular string.
  • Similar for other constant f-strings here.

These are minor, but trimming the unnecessary f prefixes will keep tooling quiet without changing behavior.

dsperse/src/compile/compiler.py (4)

41-56: Slice path resolution and error signaling look correct (minor style nits only)

The _compile_slice logic correctly prefers an absolute path and falls back to a relative_path rooted at base_path, raising FileNotFoundError when neither resolves. That’s the right behavior for missing slice artifacts.

If you care about Ruff’s TRY003/F541, you could drop the f prefixes for constant messages and potentially factor the repeated “No valid path found for slice” string into the exception type, but behavior-wise this is fine.


95-141: Sliced-model compilation flow is coherent; consider slight counter/metadata tweaks

The _compile_slices method:

  • Loads slices metadata via Utils.find_metadata_path.
  • Optionally runs CompilerUtils.run_onnx_inference_chain for calibration.
  • Iterates slices_data with optional layer_indices filtering.
  • Calls _compile_slice, then attaches a compilation.ezkl block per slice and persists the updated metadata.

This matches the new EZKL circuitization pipeline and metadata shape used elsewhere. Two small refinements you might consider:

  • Only increment compiled_count when success is truthy, so the log line reflects successfully compiled slices.
  • If Utils.iter_circuit_slices (used by Prover) expects a particular structure for metadata["slices"] (e.g., mapping keyed by slice_id vs list-of-dicts), double-check that the updated metadata format here stays in sync with that contract to avoid surprises later in the proving pipeline.

Otherwise the orchestration looks solid.


143-182: Top-level compile routing and Converter usage make sense

The public compile() correctly:

  • Validates model_path exists,
  • Parses layers via CompilerUtils.parse_layers,
  • Uses CompilerUtils.is_sliced_model to branch between sliced and raw .onnx flows,
  • Converts non-dirs sliced inputs to dirs, compiles, and converts back to the original type.

This provides a clean, single entry point for both sliced and unsliced models. Only minor nits:

  • The variable name type shadows the builtin; renaming (e.g., to slice_type) would improve readability.
  • When converting to/from dirs with cleanup=True, be aware of which artifacts are removed vs preserved; ensure this matches user expectations for dsperse/dslice lifetimes.

Behaviorally, the routing looks correct.


184-202: Minor: __main__ harness f-string and reuse

The test harness is useful, but:

  • print(f"Compilation finished.") is a constant and triggers Ruff F541; it can be a plain string.
  • If you plan to keep this as a developer-only entry point, consider a small helper function (e.g., _demo()) and a guard to avoid accidental use in production contexts.

Purely cosmetic; no impact on library usage.

dsperse/src/cli/compile.py (3)

74-85: Consider normalizing slice indices when validating layers

available_indices is taken directly from metadata['slices'][*]['index'] and compared against parsed int indices. If metadata ever stores indices as strings or omits them for some slices, valid layers could be incorrectly treated as missing. You could defensively normalize and filter, e.g., only keep non-None indices and cast numeric strings to int before comparison.
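
A minimal sketch of that normalization (the helper name is illustrative):

def normalized_slice_indices(slices_meta):
    # Keep only usable indices, casting numeric strings to int.
    available = set()
    for entry in slices_meta:
        idx = entry.get("index")
        if idx is None:
            continue
        try:
            available.add(int(idx))
        except (TypeError, ValueError):
            continue  # skip malformed values rather than rejecting valid layers
    return available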


132-167: Path heuristics are reasonable; minor opportunity to simplify path handling

The heuristic around output directories and adjacent metadata.json/slices paths is sensible and should reduce user confusion. Right now you maintain both target_path_obj (pre-heuristic) and tp_obj (post-heuristic) and only use the former for the directory-existence gate. You could slightly simplify and avoid potential drift by recreating Path(target_path) after the heuristics and using that consistently for the is_dir check and check_model_dir gate.


173-177: Remove unnecessary f-string in logger call

logger.info(f"Compiler initialized successfully") doesn’t interpolate anything, which Ruff flags (F541). You can drop the f prefix:

-        logger.info(f"Compiler initialized successfully")
+        logger.info("Compiler initialized successfully")
dsperse/src/backends/ezkl.py (1)

578-613: Clean up unused f-strings and stale docstring in compilation_pipeline

Within compilation_pipeline, several f-strings don’t interpolate anything, and the docstring still refers to “circuitization”:

settings_path = os.path.join(output_path, f"settings.json")
compiled_path = os.path.join(output_path, f"model.compiled")
vk_path = os.path.join(output_path, f"vk.key")
pk_path = os.path.join(output_path, f"pk.key")
...
"""Run the full EZKL circuitization pipeline ..."""

You can:

  • Drop the f prefixes:
-        settings_path = os.path.join(output_path, f"settings.json")
-        compiled_path = os.path.join(output_path, f"model.compiled")
-        vk_path = os.path.join(output_path, f"vk.key")
-        pk_path = os.path.join(output_path, f"pk.key")
+        settings_path = os.path.join(output_path, "settings.json")
+        compiled_path = os.path.join(output_path, "model.compiled")
+        vk_path = os.path.join(output_path, "vk.key")
+        pk_path = os.path.join(output_path, "pk.key")
  • Optionally update the docstring and error message around line 667 to say “compilation pipeline” instead of “circuitization” for consistency.
dsperse/src/run/runner.py (2)

22-45: Initialization flow is good; consider more specific exception type and message

The constructor correctly normalizes slice_path, generates run metadata when needed, and initializes EZKL. The only nit is:

if not slice_path or not Path(slice_path).exists():
    raise Exception("A path of a dslice file or dsperse file is required.")

Given you also accept a slices directory here, a more accurate and conventional choice would be a ValueError (or a small custom RunnerInitializationError) with wording that includes directories as valid inputs.


123-188: Log EZKL exceptions in _run_ezkl_slice for easier debugging

Catching all exceptions around generate_witness lets you cleanly fall back to ONNX, but right now the exception is only turned into a string in exec_info['error']:

try:
    success, output_tensor = self.ezkl_runner.generate_witness(...)
except Exception as e:
    success = False
    output_tensor = str(e)
...
if not success:
    exec_info['error'] = output_tensor if isinstance(output_tensor, str) else "Unknown EZKL error"

Consider also logging the exception (e.g., logger.exception("EZKL generate_witness failed")) so that stack traces are available during debugging, while still returning structured error info for the caller.
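
For example, keeping the structured error while retaining the traceback (a sketch against the snippet quoted above; the ... placeholder is carried over from it):

try:
    success, output_tensor = self.ezkl_runner.generate_witness(...)
except Exception as e:
    logger.exception("EZKL generate_witness failed; falling back to ONNX")  # keeps the traceback
    success = False
    output_tensor = str(e)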

dsperse/src/slice/utils/converter.py (1)

341-377: Make _zip_directory’s exclude handling work with glob patterns like *pk*

Right now _zip_directory treats patterns in a bespoke way (e.g., slice_* skipped via startswith, *ext treated as “endswith(ext)”), which doesn’t align with the CLI passing '*pk*' to exclude proving key files—'*pk*' will never match under the current endswith(pattern[1:]) logic.

Consider switching to standard glob matching using fnmatch.fnmatch on relative paths and filenames, e.g.:

+from fnmatch import fnmatch
@@
-        with zipfile.ZipFile(output_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
-            for root, dirs, files in os.walk(source_dir):
-                root_path = Path(root)
-
-                # Check if we should skip this directory
-                if exclude_patterns:
-                    rel_root = root_path.relative_to(source_dir)
-                    skip = False
-                    for pattern in exclude_patterns:
-                        if pattern.startswith('*'):
-                            # Match file extension patterns
-                            pass  # We'll check files individually
-                        elif str(rel_root).startswith(pattern.replace('*', '')):
-                            skip = True
-                            break
-                    if skip:
-                        continue
+        with zipfile.ZipFile(output_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
+            for root, _dirs, files in os.walk(source_dir):
+                root_path = Path(root)
+                rel_root = root_path.relative_to(source_dir)
+
+                # Skip entire directories that match any pattern
+                if exclude_patterns and any(fnmatch(str(rel_root), pat) for pat in exclude_patterns):
+                    continue
@@
-                    # Skip files matching exclude patterns
-                    if exclude_patterns:
-                        skip_file = False
-                        for pattern in exclude_patterns:
-                            if pattern.startswith('*') and file.endswith(pattern[1:]):
-                                skip_file = True
-                                break
-                        if skip_file:
-                            continue
+                    # Skip files matching exclude patterns
+                    if exclude_patterns:
+                        rel_file = rel_root / file
+                        if any(fnmatch(str(rel_file), pat) or fnmatch(file, pat) for pat in exclude_patterns):
+                            continue

This keeps support for slice_* and also makes *pk* (and similar patterns) behave as expected.

dsperse/src/prover.py (3)

26-106: prove_dirs slice orchestration and run_results wiring look consistent

The per-slice proving flow (metadata-driven slice iteration, path resolution via Utils.resolve_under_slice, witness lookup under run_path/slice_id/output.json, and proof output via Utils.proof_output_path) is coherent and lines up with the new Utils.merge_execution_into_run_results / update_metadata_after_execution APIs. Skipped-slice logging for missing circuit/PK/witness is also sensible.

One small improvement: local proved is never used since you rely on proved_count from merge_execution_into_run_results; it can be dropped to avoid confusion.


108-129: dslice/dsperse conversions leave extracted dirs lying around

prove_dslice and prove_dsperse convert into dirs with cleanup=False, run proofs, then convert back to the original packaging again with cleanup=False. This keeps both the original archive and the extracted slice directories around, which can accumulate clutter over repeated runs.

If the intent is to leave the original packaging intact but not necessarily keep the extracted dirs, consider:

  • cleanup=False for the initial archive → dirs conversion, so the archive is preserved.
  • cleanup=True for the dirs → archive conversion, so temporary slice directories are removed once proofs are generated.

That will avoid silently growing temporary directories on repeated invocations.
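
A sketch of that cleanup split (assumes Converter.convert returns the path it produced, as its usage elsewhere in this review suggests, and that prove_dirs is the per-slice proving entry point):

def prove_archive(archive_path, original_type):
    slices_dir = Converter.convert(archive_path, output_type="dirs", cleanup=False)  # preserve the archive
    try:
        prove_dirs(slices_dir)
    finally:
        # Repack and drop the extracted dirs so repeated runs don't accumulate clutter.
        Converter.convert(str(slices_dir), output_type=original_type, cleanup=True)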


131-149: prove() dispatcher behavior is sound, but docstring is slightly stale

The prove dispatcher correctly validates run_path via Utils.load_run_metadata and then routes based on Converter.detect_type(model_dir). That matches the supported packaging types (dslice, dsperse, dirs) and will fail fast on unknown ones.

Minor nit: the docstring still mentions "data_path packaging" while the parameter is model_dir, which might cause small confusion for CLI users.

You could update the docstring summary line to:

"""Route to the appropriate prove path based on `model_dir` packaging."""
dsperse/src/run/utils/runner_utils.py (5)

22-59: normalize_for_runtime behavior is reasonable, but broad exception swallowing hampers debugging

The normalization logic for packaging_type (dsperse / dslice / default dirs) makes sense and returns (slices_root, original_format, model_path) in a way that lines up with subsequent re-packaging.

However, both the dsperse and dslice branches swallow all exceptions:

except Exception:
    return (model_path / "slices").resolve(), None, model_path

If Converter.convert fails due to a real packaging bug, quietly falling back to model_path / "slices" can mask the underlying problem and lead to confusing downstream errors.

Consider at least logging the exception before falling back:

-            except Exception:
-                return (model_path / "slices").resolve(), None, model_path
+            except Exception as e:
+                logger.warning(
+                    "Failed to normalize %s packaging at %s, falling back to %s/slices: %s",
+                    packaging_type, source_path, model_path, e,
+                )
+                return (model_path / "slices").resolve(), None, model_path

Same pattern can be applied to the dslice branch.


95-102: repackage_if_needed: correct behavior but suppressed exceptions hide root cause

repackage_if_needed correctly re-emits slices in their original format (dslice/dsperse) via Converter.convert(..., cleanup=True), but it also catches a bare Exception and only logs a warning.

Given this step can materially affect what remains on disk, it may be worth:

  • Restricting the exception to ValueError/OSError or
  • Including more detail in the log (which you already partially do by logging the exception).

If you expect this to be rare and non-fatal, current behavior is acceptable, but please ensure callers are aware that a failed repackage does not raise.


151-167: process_final_output: behavior is fine; tensors in result may need serialization later

The softmax + argmax post-processing correctly normalizes logits to probabilities and extracts a scalar predicted_action. Note that storing raw tensors in the returned dict ("logits", "probabilities") is fine for in-memory chaining (e.g., into filter_tensor) but won’t be JSON-serializable as-is if you later write this object to disk.

If you ever persist this result, consider converting those tensors to .tolist() or .cpu().numpy().tolist() at the serialization boundary.
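
A small sketch of such a serialization-boundary helper (the name is illustrative):

def to_jsonable(result):
    # Convert tensor fields to plain lists before writing to disk.
    out = dict(result)
    for key in ("logits", "probabilities"):
        value = out.get(key)
        if hasattr(value, "tolist"):  # torch tensors and numpy arrays both expose .tolist()
            out[key] = value.tolist()
    return out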


217-250: _is_sliced_model duplicates logic and may be worth centralizing

RunnerUtils._is_sliced_model reimplements slice detection in a way very similar to Converter.detect_type and CompilerUtils.is_sliced_model. That’s not wrong, but it does increase the risk of subtle divergence over time as formats evolve.

Longer term, consider routing this helper through a single canonical detection function (Converter.detect_type) and wrapping it with the (is_sliced, slice_path) tuple that callers expect, to keep detection logic in one place.
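
A sketch of that wrapper (the detect_type return values assumed here, 'dirs'/'dslice'/'dsperse', are inferred from this review rather than verified against the implementation):

from pathlib import Path
from dsperse.src.slice.utils.converter import Converter

def is_sliced_model(path):
    try:
        kind = Converter.detect_type(path)
    except ValueError:  # detect_type may reject unrecognized inputs
        return False, None
    if kind in ("dirs", "dslice", "dsperse"):
        return True, Path(path)
    return False, None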


266-336: check_expected_shape: logic is reasonable, but defaults may be surprising

The shape validation logic (placeholder replacement, element-count equality, flattened-vs-multidim handling) is sound and will catch many mismatches while returning True when it can’t reason about the shape.

One subtle issue: when expected_shape_data is a flat list (e.g., ["batch_size", "unknown"]), treating shape_values = expected_shape_data[0] yields a string and iterates character-by-character. That’s only a fallback path right now, but you may want to standardize on nested lists ([[...]]) at the metadata source to avoid confusing log messages.

No change strictly required here; just something to keep in mind when setting defaults upstream.

dsperse/src/utils/utils.py (1)

316-359: update_metadata_after_execution: behavior is reasonable, but catches all Exceptions

The summary update logic (writing either proof_summary or verification_summary with counts and a timestamp) is appropriate and non-breaking.

You do catch a bare Exception both on read and write:

except Exception:
    logger.warning(...)
    return
...
except Exception as e:
    logger.warning(...)

Given metadata updates are non-critical, this is acceptable, but if debugging failures here ever becomes important, consider narrowing the excepts to OSError/json.JSONDecodeError or using logger.exception once to retain a traceback.

dsperse/src/compile/utils/compiler_utils.py (2)

75-99: _rel_from_payload / _with_slice_prefix duplicate logic from RunnerAnalyzer

These helpers mirror RunnerAnalyzer.rel_from_payload and RunnerAnalyzer.with_slice_prefix. That’s not inherently wrong, but maintaining multiple copies of the same path logic increases the chance of subtle drift as payload layouts evolve.

If you find yourself tweaking this logic later, consider centralizing it (e.g., under Utils or a dedicated path helper module) and relying on that from both compiler and analyzer utilities.


101-125: is_ezkl_compilation_successful and get_relative_paths are consistent, minor doc mismatch

The success check (compiled, vk_key, pk_key, settings all exist on disk under payload) is a solid gate for deciding whether compilation “really” succeeded.

get_relative_paths does the right thing by:

  • Converting each artifact path to a payload-relative path using _rel_from_payload.
  • Only including calibration when calibration_input exists.

Docstring says “Returns a tuple of (payload_rel_dict, calibration_rel_path)”, but the function actually returns a single dict with 'calibration' included.

You might want to update the docstring to:

"""Compute payload-relative paths for compiled artifacts and the calibration file.

Returns:
    dict[str, str | None]: mapping of keys ('settings', 'compiled', 'vk_key', 'pk_key', 'calibration')
    to payload-relative paths (or None when unavailable).
"""
dsperse/src/analyzers/runner_analyzer.py (5)

23-40: rel_from_payload / with_slice_prefix: helpers are fine, but duplicated elsewhere

These helpers correctly:

  • Extract the payload/... suffix from a path (or return None when absent).
  • Prefix a relative payload path with slice_#.

They’re used consistently within RunnerAnalyzer. Note that similar helpers exist in CompilerUtils; at some point you may want to centralize them in a shared utility to avoid drift.


42-51: load_slices_metadata: prefer logging.exception over re-raising

The function correctly loads <slices_dir>/metadata.json, but on failure:

except Exception as e:
    logger.error(f"Failed to load slices metadata from {slices_dir}: {e}")
    raise

Since you immediately re-raise, using logger.exception(...) would capture the traceback without extra boilerplate:

-        except Exception as e:
-            logger.error(f"Failed to load slices metadata from {slices_dir}: {e}")
+        except Exception:
+            logger.exception(f"Failed to load slices metadata from {slices_dir}")
            raise

Not critical, but improves debuggability.


54-227: process_slices: metadata fusion and circuit eligibility look solid, with a couple of subtle edge cases

The slice processing pipeline is well thought out:

  • It merges per-slice metadata (compilation.ezkl.files, io, deps) with model-level fields on slice_data, with sensible fallbacks.
  • compiled_flag reflects either explicit compiled flags or presence of a compiled artifact path.
  • circuit_exists, keys_exist, packaged_only, and use_circuit correctly distinguish between unpacked directories and packaged-only artifacts (e.g., dslice/dsperse).
  • ONNX paths are normalized to slice_#/payload/..., and EZKL paths are emitted the same way for downstream consumers.
  • Circuit size is bounded by RunnerAnalyzer.SIZE_LIMIT when deciding use_circuit.

Two minor observations:

  • When output_shape falls back to ["batch_size", "unknown"], this shape is not nested, which can lead to slightly odd diagnostics in RunnerUtils.check_expected_shape (it assumes nested lists). It still degrades gracefully, but you may want to update the fallback to something like [["batch_size", "unk__0"]] in the future.
  • _resolve_abs and _resolve_abs_for_onnx capture slice_key / slice_dir_abs from the loop; since they’re only used immediately, that’s safe despite static-analysis complaints.

Functionally, this looks correct.


316-385: _normalize_to_dirs: behavior is mostly reasonable, but docstring and fallbacks could be clearer

The normalization logic supports:

  • Model roots with slices/metadata.json.
  • Slices dirs with metadata.json (including dslice-at-root).
  • Raw slice_* dirs with metadata.json + payload.
  • Single dslice/dsperse files converted to dirs via Converter.convert.
  • Directories recognized by Converter.detect_type as dirs.

A couple of points:

  • Docstring says it returns (model_root, slices_dir, original_format), but the function actually returns only (slices_dir, original_format). Updating the docstring would avoid confusion.

  • The final fallback:

    return (path_obj.parent / 'slices').resolve(), 'dirs'

    silently guesses a slices directory even when nothing matches. Depending on where this is used, you may want to log a warning when falling back this way so callers can tell the input was not recognized explicitly.

Overall, though, this is a reasonable normalization strategy.


507-512: Conversion-back on generate_run_metadata: exception swallowing is acceptable but might deserve a log

Converting back to the original format:

if original_format != 'dirs':
    try:
        Converter.convert(str(slices_dir), output_type=original_format, cleanup=True)
    except Exception:
        logger.warning("Failed to convert slices back to original format; continuing.")

Given this is a post-processing convenience, swallowing the exception with a warning is reasonable. Just be aware that callers relying on the on-disk packaging being restored should pay attention to this log; you may eventually want to surface this condition more prominently if it’s considered important.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b40065 and fa29c09.

📒 Files selected for processing (32)
  • .gitignore (1 hunks)
  • README.md (9 hunks)
  • docs/arc42.md (3 hunks)
  • docs/overview.md (1 hunks)
  • dsperse/main.py (1 hunks)
  • dsperse/src/analyzers/onnx_analyzer.py (4 hunks)
  • dsperse/src/analyzers/runner_analyzer.py (3 hunks)
  • dsperse/src/backends/ezkl.py (10 hunks)
  • dsperse/src/backends/onnx_models.py (1 hunks)
  • dsperse/src/backends/pth_models.py (1 hunks)
  • dsperse/src/cli/__init__.py (1 hunks)
  • dsperse/src/cli/base.py (3 hunks)
  • dsperse/src/cli/compile.py (6 hunks)
  • dsperse/src/cli/convert.py (1 hunks)
  • dsperse/src/cli/prove.py (8 hunks)
  • dsperse/src/cli/run.py (4 hunks)
  • dsperse/src/cli/slice.py (4 hunks)
  • dsperse/src/cli/verify.py (5 hunks)
  • dsperse/src/compile/compiler.py (1 hunks)
  • dsperse/src/compile/utils/compiler_utils.py (1 hunks)
  • dsperse/src/compiler.py (0 hunks)
  • dsperse/src/prover.py (3 hunks)
  • dsperse/src/run/runner.py (1 hunks)
  • dsperse/src/run/utils/runner_utils.py (1 hunks)
  • dsperse/src/runner.py (0 hunks)
  • dsperse/src/slice/onnx_slicer.py (5 hunks)
  • dsperse/src/slice/slicer.py (3 hunks)
  • dsperse/src/slice/utils/converter.py (1 hunks)
  • dsperse/src/slice/utils/onnx_utils.py (1 hunks)
  • dsperse/src/utils/runner_utils/runner_utils.py (0 hunks)
  • dsperse/src/utils/utils.py (3 hunks)
  • dsperse/src/verifier.py (3 hunks)
💤 Files with no reviewable changes (3)
  • dsperse/src/runner.py
  • dsperse/src/compiler.py
  • dsperse/src/utils/runner_utils/runner_utils.py
🧰 Additional context used
🧬 Code graph analysis (22)
dsperse/src/backends/onnx_models.py (1)
dsperse/src/run/utils/runner_utils.py (1)
  • RunnerUtils (16-336)
dsperse/main.py (2)
dsperse/src/cli/slice.py (2)
  • slice_convert (62-114)
  • slice_model (117-211)
dsperse/src/slice/slicer.py (1)
  • slice_model (58-78)
dsperse/src/compile/utils/compiler_utils.py (3)
dsperse/src/backends/ezkl.py (2)
  • EZKL (107-693)
  • get_version (137-158)
dsperse/src/run/runner.py (2)
  • run (48-85)
  • run_onnx_slice (88-121)
dsperse/src/slice/utils/converter.py (2)
  • Converter (17-382)
  • detect_type (93-122)
dsperse/src/verifier.py (3)
dsperse/src/backends/ezkl.py (2)
  • EZKL (107-693)
  • verify (334-394)
dsperse/src/slice/utils/converter.py (3)
  • Converter (17-382)
  • convert (20-90)
  • detect_type (93-122)
dsperse/src/utils/utils.py (10)
  • Utils (12-359)
  • dirs_root_from (183-190)
  • load_run_metadata (174-180)
  • load_run_results (241-247)
  • iter_circuit_slices (193-203)
  • slice_dirs_path (221-222)
  • resolve_under_slice (206-218)
  • merge_execution_into_run_results (257-313)
  • save_run_results (250-254)
  • update_metadata_after_execution (316-359)
dsperse/src/compile/compiler.py (4)
dsperse/src/backends/ezkl.py (3)
  • EZKL (107-693)
  • compilation_pipeline (578-671)
  • get_version (137-158)
dsperse/src/compile/utils/compiler_utils.py (6)
  • is_ezkl_compilation_successful (101-109)
  • get_relative_paths (112-125)
  • update_slice_metadata (174-223)
  • run_onnx_inference_chain (227-273)
  • parse_layers (53-70)
  • is_sliced_model (15-50)
dsperse/src/slice/utils/converter.py (2)
  • Converter (17-382)
  • convert (20-90)
dsperse/src/utils/utils.py (3)
  • Utils (12-359)
  • find_metadata_path (45-65)
  • save_metadata_file (18-42)
dsperse/src/backends/pth_models.py (2)
dsperse/src/run/runner.py (1)
  • run (48-85)
dsperse/src/run/utils/runner_utils.py (1)
  • RunnerUtils (16-336)
dsperse/src/run/runner.py (5)
dsperse/src/analyzers/runner_analyzer.py (3)
  • RunnerAnalyzer (16-513)
  • generate_run_metadata (450-513)
  • get_execution_chain (302-305)
dsperse/src/backends/ezkl.py (2)
  • EZKL (107-693)
  • generate_witness (160-242)
dsperse/src/backends/onnx_models.py (2)
  • OnnxModels (13-222)
  • run_inference (18-60)
dsperse/src/run/utils/runner_utils.py (7)
  • RunnerUtils (16-336)
  • normalize_for_runtime (22-58)
  • make_run_dir (61-68)
  • prepare_slice_io (71-76)
  • execute_slice (79-93)
  • filter_tensor (253-263)
  • repackage_if_needed (96-101)
dsperse/src/utils/utils.py (3)
  • Utils (12-359)
  • read_input (134-171)
  • write_input (127-131)
dsperse/src/slice/utils/onnx_utils.py (1)
dsperse/src/utils/utils.py (1)
  • save_metadata_file (18-42)
dsperse/src/slice/slicer.py (2)
dsperse/src/slice/onnx_slicer.py (2)
  • slice (326-457)
  • slice_model (499-519)
dsperse/src/slice/utils/converter.py (2)
  • Converter (17-382)
  • convert (20-90)
dsperse/src/cli/slice.py (4)
dsperse/src/cli/base.py (3)
  • check_model_dir (157-174)
  • prompt_for_value (242-302)
  • normalize_path (21-32)
dsperse/src/slice/slicer.py (2)
  • Slicer (19-78)
  • slice_model (58-78)
dsperse/src/slice/utils/converter.py (3)
  • Converter (17-382)
  • _dsperse_to_dirs (306-337)
  • convert (20-90)
dsperse/src/cli/convert.py (1)
  • convert (44-91)
dsperse/src/cli/prove.py (4)
dsperse/src/prover.py (2)
  • Prover (14-149)
  • prove (131-149)
dsperse/src/cli/base.py (3)
  • save_result (218-240)
  • prompt_for_value (242-302)
  • normalize_path (21-32)
dsperse/src/slice/utils/converter.py (2)
  • Converter (17-382)
  • convert (20-90)
dsperse/src/utils/utils.py (3)
  • Utils (12-359)
  • load_run_metadata (174-180)
  • run_results_path (237-238)
dsperse/src/slice/onnx_slicer.py (1)
dsperse/src/utils/utils.py (1)
  • filter_inputs (69-91)
dsperse/src/run/utils/runner_utils.py (3)
dsperse/src/slice/utils/converter.py (3)
  • Converter (17-382)
  • convert (20-90)
  • detect_type (93-122)
dsperse/src/utils/torch_utils.py (2)
  • ModelUtils (20-538)
  • load_metadata (528-538)
dsperse/src/run/runner.py (2)
  • _run_ezkl_slice (123-188)
  • run_onnx_slice (88-121)
dsperse/src/cli/verify.py (4)
dsperse/src/verifier.py (2)
  • Verifier (15-135)
  • verify (123-135)
dsperse/src/cli/base.py (3)
  • save_result (218-240)
  • prompt_for_value (242-302)
  • normalize_path (21-32)
dsperse/src/slice/utils/converter.py (3)
  • Converter (17-382)
  • _dsperse_to_dirs (306-337)
  • convert (20-90)
dsperse/src/utils/utils.py (3)
  • Utils (12-359)
  • load_run_metadata (174-180)
  • run_results_path (237-238)
dsperse/src/analyzers/runner_analyzer.py (2)
dsperse/src/utils/utils.py (2)
  • Utils (12-359)
  • save_metadata_file (18-42)
dsperse/src/slice/utils/converter.py (3)
  • Converter (17-382)
  • detect_type (93-122)
  • convert (20-90)
dsperse/src/cli/convert.py (2)
dsperse/src/cli/base.py (2)
  • normalize_path (21-32)
  • prompt_for_value (242-302)
dsperse/src/slice/utils/converter.py (3)
  • Converter (17-382)
  • convert (20-90)
  • _dsperse_to_dirs (306-337)
dsperse/src/slice/utils/converter.py (1)
dsperse/src/cli/convert.py (1)
  • convert (44-91)
dsperse/src/prover.py (3)
dsperse/src/backends/ezkl.py (1)
  • prove (244-332)
dsperse/src/slice/utils/converter.py (2)
  • convert (20-90)
  • detect_type (93-122)
dsperse/src/utils/utils.py (12)
  • Utils (12-359)
  • dirs_root_from (183-190)
  • load_run_metadata (174-180)
  • iter_circuit_slices (193-203)
  • slice_dirs_path (221-222)
  • resolve_under_slice (206-218)
  • witness_path_for (225-226)
  • proof_output_path (229-234)
  • load_run_results (241-247)
  • merge_execution_into_run_results (257-313)
  • save_run_results (250-254)
  • update_metadata_after_execution (316-359)
dsperse/src/analyzers/onnx_analyzer.py (3)
dsperse/src/slice/onnx_slicer.py (1)
  • slice (326-457)
dsperse/src/slice/utils/onnx_utils.py (2)
  • OnnxUtils (14-214)
  • write_slice_dirs_metadata (67-214)
dsperse/src/utils/utils.py (2)
  • Utils (12-359)
  • save_metadata_file (18-42)
dsperse/src/cli/run.py (2)
dsperse/src/run/runner.py (2)
  • run (48-85)
  • Runner (22-247)
dsperse/src/cli/base.py (3)
  • prompt_for_value (242-302)
  • normalize_path (21-32)
  • check_model_dir (157-174)
dsperse/src/cli/compile.py (2)
dsperse/src/compile/compiler.py (2)
  • compile (143-181)
  • Compiler (23-181)
dsperse/src/cli/base.py (3)
  • check_model_dir (157-174)
  • prompt_for_value (242-302)
  • normalize_path (21-32)
dsperse/src/backends/ezkl.py (3)
dsperse/src/run/runner.py (1)
  • run (48-85)
dsperse/src/run/utils/runner_utils.py (1)
  • RunnerUtils (16-336)
dsperse/src/compile/compiler.py (1)
  • compile (143-181)
🪛 markdownlint-cli2 (0.18.1)
README.md

92-92: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

🪛 Ruff (0.14.5)
dsperse/src/compile/utils/compiler_utils.py

258-258: f-string without any placeholders

Remove extraneous f prefix

(F541)


260-260: Unpacked variable tensor is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

dsperse/src/verifier.py

47-48: try-except-pass detected, consider logging the exception

(S110)


47-47: Do not catch blind exception: Exception

(BLE001)


135-135: Avoid specifying long messages outside the exception class

(TRY003)


171-171: f-string without any placeholders

Remove extraneous f prefix

(F541)

dsperse/src/compile/compiler.py

52-52: Avoid specifying long messages outside the exception class

(TRY003)


54-54: f-string without any placeholders

Remove extraneous f prefix

(F541)


55-55: Avoid specifying long messages outside the exception class

(TRY003)


55-55: f-string without any placeholders

Remove extraneous f prefix

(F541)


62-62: f-string without any placeholders

Remove extraneous f prefix

(F541)


85-85: Avoid specifying long messages outside the exception class

(TRY003)


155-155: Avoid specifying long messages outside the exception class

(TRY003)


181-181: Avoid specifying long messages outside the exception class

(TRY003)


202-202: f-string without any placeholders

Remove extraneous f prefix

(F541)

dsperse/src/run/runner.py

23-23: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


23-23: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


27-27: Create your own exception

(TRY002)


27-27: Avoid specifying long messages outside the exception class

(TRY003)


48-48: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


69-69: Create your own exception

(TRY002)


69-69: Avoid specifying long messages outside the exception class

(TRY003)


88-88: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


123-123: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


136-136: Do not catch blind exception: Exception

(BLE001)


168-168: Do not catch blind exception: Exception

(BLE001)


213-213: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)

dsperse/src/slice/utils/onnx_utils.py

47-48: try-except-pass detected, consider logging the exception

(S110)


47-47: Do not catch blind exception: Exception

(BLE001)


57-58: try-except-pass detected, consider logging the exception

(S110)


57-57: Do not catch blind exception: Exception

(BLE001)


59-60: try-except-continue detected, consider logging the exception

(S112)


59-59: Do not catch blind exception: Exception

(BLE001)


61-62: try-except-pass detected, consider logging the exception

(S110)


61-61: Do not catch blind exception: Exception

(BLE001)


80-80: Avoid specifying long messages outside the exception class

(TRY003)


101-101: Do not catch blind exception: Exception

(BLE001)


129-129: Do not catch blind exception: Exception

(BLE001)


151-151: Do not catch blind exception: Exception

(BLE001)


203-203: Do not catch blind exception: Exception

(BLE001)

dsperse/src/slice/slicer.py

70-70: Avoid specifying long messages outside the exception class

(TRY003)

dsperse/src/cli/slice.py

99-100: try-except-pass detected, consider logging the exception

(S110)


99-99: Do not catch blind exception: Exception

(BLE001)


112-112: Do not catch blind exception: Exception

(BLE001)

dsperse/src/cli/prove.py

106-106: Do not catch blind exception: Exception

(BLE001)


127-127: Do not catch blind exception: Exception

(BLE001)


138-138: Do not catch blind exception: Exception

(BLE001)


162-162: Do not catch blind exception: Exception

(BLE001)


279-279: Loop control variable files not used within loop body

Rename unused files to _files

(B007)


342-342: Do not catch blind exception: Exception

(BLE001)


406-406: Do not catch blind exception: Exception

(BLE001)


461-461: Do not catch blind exception: Exception

(BLE001)

dsperse/src/slice/onnx_slicer.py

420-420: Do not catch blind exception: Exception

(BLE001)


426-426: Do not catch blind exception: Exception

(BLE001)

dsperse/src/run/utils/runner_utils.py

36-36: Do not catch blind exception: Exception

(BLE001)


50-50: Consider moving this statement to an else block

(TRY300)


51-51: Do not catch blind exception: Exception

(BLE001)


100-100: Do not catch blind exception: Exception

(BLE001)


109-109: Unused static method argument: model_directory

(ARG004)


109-109: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


109-109: Unused static method argument: save_reshape

(ARG004)


140-140: Prefer TypeError exception for invalid type

(TRY004)


140-140: Avoid specifying long messages outside the exception class

(TRY003)


256-256: Local variable probabilities is assigned to but never used

Remove assignment to unused variable probabilities

(F841)


257-257: Local variable predicted_action is assigned to but never used

Remove assignment to unused variable predicted_action

(F841)

dsperse/src/cli/verify.py

103-103: Do not catch blind exception: Exception

(BLE001)


123-123: Do not catch blind exception: Exception

(BLE001)


134-134: Do not catch blind exception: Exception

(BLE001)


157-157: Do not catch blind exception: Exception

(BLE001)


233-233: Loop control variable files not used within loop body

Rename unused files to _files

(B007)


299-299: Local variable cleanup_unpacked is assigned to but never used

Remove assignment to unused variable cleanup_unpacked

(F841)


301-301: Do not catch blind exception: Exception

(BLE001)


364-364: Do not catch blind exception: Exception

(BLE001)


413-413: Do not catch blind exception: Exception

(BLE001)

dsperse/src/analyzers/runner_analyzer.py

49-49: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


80-80: Do not catch blind exception: Exception

(BLE001)


89-89: Do not catch blind exception: Exception

(BLE001)


126-126: Do not catch blind exception: Exception

(BLE001)


134-134: Function definition does not bind loop variable slice_dir_abs

(B023)


138-138: Function definition does not bind loop variable slice_key

(B023)


141-141: Function definition does not bind loop variable slice_dir_abs

(B023)


163-163: Do not catch blind exception: Exception

(BLE001)


182-182: Function definition does not bind loop variable slice_dir_abs

(B023)


185-185: Function definition does not bind loop variable slice_key

(B023)


188-188: Function definition does not bind loop variable slice_dir_abs

(B023)


341-341: Do not catch blind exception: Exception

(BLE001)


350-351: try-except-pass detected, consider logging the exception

(S110)


350-350: Do not catch blind exception: Exception

(BLE001)


361-361: Do not catch blind exception: Exception

(BLE001)


376-376: Local variable model_root is assigned to but never used

Remove assignment to unused variable model_root

(F841)


376-376: Useless if-else condition

(RUF034)


398-399: try-except-pass detected, consider logging the exception

(S110)


398-398: Do not catch blind exception: Exception

(BLE001)


410-410: Do not catch blind exception: Exception

(BLE001)


420-421: try-except-continue detected, consider logging the exception

(S112)


420-420: Do not catch blind exception: Exception

(BLE001)


430-431: try-except-pass detected, consider logging the exception

(S110)


430-430: Do not catch blind exception: Exception

(BLE001)


475-475: Do not catch blind exception: Exception

(BLE001)


493-494: try-except-pass detected, consider logging the exception

(S110)


493-493: Do not catch blind exception: Exception

(BLE001)


510-510: Do not catch blind exception: Exception

(BLE001)

dsperse/src/cli/convert.py

82-83: try-except-pass detected, consider logging the exception

(S110)


82-82: Do not catch blind exception: Exception

(BLE001)


89-89: Do not catch blind exception: Exception

(BLE001)

dsperse/src/utils/utils.py

64-64: Avoid specifying long messages outside the exception class

(TRY003)


178-178: Avoid specifying long messages outside the exception class

(TRY003)


294-294: Avoid specifying long messages outside the exception class

(TRY003)


329-329: Do not catch blind exception: Exception

(BLE001)


349-349: Avoid specifying long messages outside the exception class

(TRY003)


358-358: Do not catch blind exception: Exception

(BLE001)

dsperse/src/slice/utils/converter.py

20-20: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


35-35: Avoid specifying long messages outside the exception class

(TRY003)


37-37: Avoid specifying long messages outside the exception class

(TRY003)


76-76: Do not catch blind exception: Exception

(BLE001)


88-88: Do not catch blind exception: Exception

(BLE001)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


120-120: Avoid specifying long messages outside the exception class

(TRY003)


122-122: Avoid specifying long messages outside the exception class

(TRY003)


135-135: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


165-165: Do not catch blind exception: Exception

(BLE001)


170-170: Avoid specifying long messages outside the exception class

(TRY003)


195-195: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


211-211: Avoid specifying long messages outside the exception class

(TRY003)


264-264: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


287-287: Avoid specifying long messages outside the exception class

(TRY003)


306-306: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


313-313: Avoid specifying long messages outside the exception class

(TRY003)


341-341: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


346-346: Loop control variable dirs not used within loop body

Rename unused dirs to _dirs

(B007)


437-437: Do not catch blind exception: Exception

(BLE001)

dsperse/src/prover.py

149-149: Avoid specifying long messages outside the exception class

(TRY003)


190-190: f-string without any placeholders

Remove extraneous f prefix

(F541)

dsperse/src/cli/compile.py

176-176: f-string without any placeholders

Remove extraneous f prefix

(F541)

dsperse/src/backends/ezkl.py

145-145: subprocess call: check for execution of untrusted input

(S603)


155-155: Do not catch blind exception: Exception

(BLE001)


601-601: f-string without any placeholders

Remove extraneous f prefix

(F541)


602-602: f-string without any placeholders

Remove extraneous f prefix

(F541)


603-603: f-string without any placeholders

Remove extraneous f prefix

(F541)


604-604: f-string without any placeholders

Remove extraneous f prefix

(F541)

…imizing imports, and removing deprecated or unused options for improved clarity and maintainability.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (15)
dsperse/src/cli/prove.py (2)

97-147: Fast-path implementation duplicates the metadata/data_path logic of the legacy flow.

Lines 103-118 duplicate the metadata loading and data_path inference logic that also exists in lines 287-301. Consider extracting a helper like _resolve_data_path(run_path, explicit_data_path, metadata) to reduce duplication and ensure consistency.

def _resolve_data_path(run_path: str, explicit_data_path: str | None, metadata: dict) -> str | None:
    """Derive data_path from explicit arg, metadata source_path, or model_path/slices."""
    if explicit_data_path:
        return normalize_path(explicit_data_path)
    source = metadata.get('source_path')
    if source:
        return normalize_path(source)
    model_path = metadata.get('model_path')
    if model_path:
        return normalize_path(str(Path(model_path) / 'slices'))
    return None

346-347: Remove empty finally block.

The empty finally: pass block serves no purpose and adds noise. Remove it unless you plan to add cleanup logic.

     except Exception as e:
         print(f"{Fore.RED}Error proving run: {e}{Style.RESET_ALL}")
         traceback.print_exc()
-    finally:
-        pass
dsperse/src/run/utils/runner_utils.py (2)

108-150: Unused parameters and implicit Optional type hint.

The model_directory and save_reshape parameters are unused (intended for deferred reshape functionality per commented code). Update the type hint to use explicit Optional[str] for clarity:

+from typing import Optional
+
 @staticmethod
-def preprocess_input(input_path:str, model_directory: str = None, save_reshape: bool = False) -> torch.Tensor:
+def preprocess_input(
+    input_path: str,
+    model_directory: Optional[str] = None,
+    save_reshape: bool = False,
+) -> torch.Tensor:
     """
     Preprocess input data from JSON.
+    
+    Args:
+        input_path: Path to input JSON file
+        model_directory: (Reserved for future reshape feature)
+        save_reshape: (Reserved for future reshape feature)
     """

Note: Optional is already imported at line 5, but the type hint at line 109 still uses the implicit form.


36-37: Broad exception catches silently swallow errors.

Multiple places catch Exception and either pass or return fallback values without logging. This can mask real issues:

  • Lines 36-37: normalize_for_runtime dsperse conversion failure
  • Lines 51-52: dslice conversion failure
  • Lines 100-101: repackage_if_needed logs but other places don't

Consider logging at least a debug/warning message for silent fallbacks:

         if packaging_type == "dsperse":
             try:
                 converted = Converter.convert(source_path, output_type="dirs", cleanup=False)
                 return Path(converted), "dsperse", model_path
-            except Exception:
+            except Exception as e:
+                logger.debug(f"Failed to convert dsperse source, falling back to dirs: {e}")
                 return (model_path / "slices").resolve(), None, model_path

Also applies to: 51-52, 100-101

dsperse/src/cli/convert.py (1)

76-82: Calling private method _dsperse_to_dirs directly and silently swallowing unlink errors.

Two issues here:

  1. Converter._dsperse_to_dirs is a private method (underscore prefix). If expand_slices is needed at the public API level, consider exposing it through Converter.convert() as a parameter.
  2. The try: p.unlink() except Exception: pass silently ignores errors—permission issues or file-in-use scenarios will go unnoticed.
         if p.is_file() and p.suffix == '.dsperse' and getattr(args, 'to_type', None) == 'dirs':
-            result = Converter._dsperse_to_dirs(p, output_path, expand_slices=bool(getattr(args, 'expand_slices', False)))
+            # TODO: Consider adding expand_slices param to Converter.convert() public API
+            result = Converter._dsperse_to_dirs(p, output_path, expand_slices=bool(getattr(args, 'expand_slices', False)))
             if bool(getattr(args, 'cleanup', True)):
                 try:
                     p.unlink()
-                except Exception:
-                    pass
+                except Exception as e:
+                    logger.warning(f"Could not remove source file {p}: {e}")
dsperse/src/cli/verify.py (2)

294-295: Remove empty finally block.

Same as in prove.py—serves no purpose.

     except Exception as e:
         print(f"{Fore.RED}Error verifying run: {e}{Style.RESET_ALL}")
         traceback.print_exc()
-    finally:
-        pass

94-143: Fast-path mirrors prove.py's structure—consider shared extraction.

The fast-path logic (lines 94-143) closely mirrors prove.py lines 97-147: load metadata, resolve data_path, invoke backend, save results, print summary. While the past review comment noted this was deferred, extracting common patterns to base.py would reduce maintenance burden as CLI commands evolve.

README.md (1)

92-92: Minor markdown list indentation.

Static analysis flagged unordered list indentation at line 92 (expected 2 spaces, found 4). This is a minor formatting nit that won't affect rendering in most Markdown parsers.

   - The script will:
-    - Install the DSperse CLI in editable mode so the dsperse command is available
+  - Install the DSperse CLI in editable mode so the dsperse command is available
dsperse/src/verifier.py (1)

41-48: Silent exception swallows potential data issues.

The try-except-pass when building proof_paths_by_slice could hide malformed run_results.json data. Consider logging at debug level:

         try:
             for entry in run_results.get("execution_chain", {}).get("execution_results", []):
                 sid = entry.get("slice_id") or entry.get("segment_id")
                 pe = entry.get("proof_execution", {}) if isinstance(entry, dict) else {}
                 if sid and pe and pe.get("proof_file"):
                     proof_paths_by_slice[sid] = pe.get("proof_file")
-        except Exception:
-            pass
+        except Exception as e:
+            logger.debug(f"Could not parse proof paths from run_results: {e}")
dsperse/src/cli/slice.py (2)

90-101: Consider logging cleanup failures explicitly.

The cleanup logic silently swallows exceptions when deleting the source file. While this may be intentional to prevent cleanup failures from blocking the conversion, it would be helpful to log when cleanup fails so users are aware the source file remains.

Apply this diff to add logging:

             if bool(getattr(args, 'cleanup', True)):
                 try:
                     p.unlink()
-                except Exception:
-                    pass
+                except Exception as e:
+                    logger.warning(f"Failed to cleanup source file {p}: {e}")

92-93: Direct call to private method may break encapsulation.

The code directly calls Converter._dsperse_to_dirs, which is a private method (indicated by the leading underscore). This tight coupling could break if the Converter's internal implementation changes. Consider whether this special case should be part of Converter's public API instead.

If this pattern is used elsewhere or if _dsperse_to_dirs has specific behavior not available through the public convert method, please confirm this is intentional. Otherwise, consider routing through the public API:

result = Converter.convert(
    str(p),
    output_type='dirs',
    output_path=output_path,
    cleanup=False  # Handle cleanup separately after
)
dsperse/src/run/runner.py (1)

95-106: Extract duplicated path resolution logic.

The _resolve_rel_path helper function inside run_onnx_slice is duplicated in _run_ezkl_slice (lines 128-144). This duplication makes maintenance harder and increases the risk of inconsistent behavior.

Consider extracting this to a shared helper method:

+    @staticmethod
+    def _resolve_slice_path(path: str, base_dir: Path) -> str:
+        """Resolve a path that may be relative to base_dir, handling slice_# prefixes."""
+        if not path:
+            return None
+        p_str = str(path)
+        if os.path.isabs(p_str):
+            return p_str
+        try:
+            sd_name = os.path.basename(os.path.abspath(str(base_dir))) if base_dir else None
+        except Exception:
+            sd_name = None
+        parts = p_str.split(os.sep)
+        if sd_name and parts and parts[0] == sd_name:
+            parts = parts[1:]
+            p_str = os.path.join(*parts) if parts else ''
+        if base_dir:
+            return str((Path(base_dir) / p_str).resolve())
+        return os.path.abspath(p_str)
+
     @staticmethod
     def run_onnx_slice(slice_info: dict, input_tensor_path, output_tensor_path, slice_dir: Path = None):
         """Run ONNX inference for a slice."""
         onnx_path = slice_info.get("path")
-        
-        def _resolve_rel_path(p: str, base_dir: Path) -> str:
-            ...
-        
         if onnx_path and not os.path.isabs(str(onnx_path)):
-            onnx_path = _resolve_rel_path(onnx_path, slice_dir)
+            onnx_path = Runner._resolve_slice_path(onnx_path, slice_dir)

Then use the same helper in _run_ezkl_slice on lines 151-156.

dsperse/src/compile/compiler.py (1)

74-79: Consolidate duplicate metadata update logic.

Lines 74-79 have duplicated logic for updating slice metadata, differing only in how the path is constructed. This can be simplified to improve maintainability.

Apply this diff:

-        if slice_data.get('slice_metadata') and os.path.exists(slice_data.get('slice_metadata')):
-            path = Path(slice_data.get('slice_metadata'))
-            CompilerUtils.update_slice_metadata(path, success, file_paths)
-        elif slice_data.get('slice_metadata_relative_path') and os.path.exists(os.path.join(base_path, slice_data.get('slice_metadata_relative_path'))):
-            path = Path(os.path.join(base_path, slice_data.get('slice_metadata_relative_path')))
-            CompilerUtils.update_slice_metadata(path, success, file_paths)
+        metadata_path = slice_data.get('slice_metadata')
+        if not metadata_path and slice_data.get('slice_metadata_relative_path'):
+            metadata_path = os.path.join(base_path, slice_data.get('slice_metadata_relative_path'))
+        
+        if metadata_path and os.path.exists(metadata_path):
+            CompilerUtils.update_slice_metadata(Path(metadata_path), success, file_paths)
dsperse/src/analyzers/runner_analyzer.py (2)

128-141: Consolidate duplicate path resolution functions.

The method defines two very similar local functions for resolving absolute paths:

  • _resolve_abs (lines 128-141) for EZKL artifacts
  • _resolve_abs_for_onnx (lines 176-188) for ONNX paths

Both handle slice_dir_abs resolution and slice_key prefix stripping. Additionally, static analysis warns that these functions capture loop variables (slice_dir_abs, slice_key), which can lead to unexpected behavior.

Consider extracting a single static or class method that takes these values as explicit parameters:

@staticmethod
def _resolve_to_absolute(path: str, slice_dir: Path, slice_key: str) -> Optional[str]:
    """Resolve a possibly-relative path to absolute, handling slice_key prefixes."""
    if not path:
        return None
    p_str = str(path)
    if os.path.isabs(p_str):
        return p_str
    if slice_dir is None:
        return p_str
    # Strip slice_key prefix if present
    parts = p_str.split(os.sep)
    if parts and parts[0] == slice_key:
        parts = parts[1:]
        p_str = os.path.join(*parts) if parts else ''
    return str((slice_dir / p_str).resolve())

Then use this method for both EZKL and ONNX path resolution, which also eliminates the B023 warnings about loop variable binding.

Also applies to: 176-188


315-384: Consider refactoring complex normalization logic.

The _normalize_to_dirs method handles many different input cases but has grown quite complex with nested conditions and multiple try-except blocks. This makes it harder to understand and maintain.

Consider breaking this into smaller helper methods:

  • _handle_directory_input(path: Path) -> tuple[Path, str]
  • _handle_file_input(path: Path) -> tuple[Path, str]
  • _handle_model_root(path: Path) -> tuple[Path, str]

Also, line 376 assigns model_root but never uses it. This should be removed:

-            # If this looks like a slices folder (has metadata.json), parent is model root
-            model_root = sdir.parent if (sdir / 'metadata.json').exists() else sdir.parent
             return sdir.resolve(), 'dirs'
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa29c09 and e7f16c4.

📒 Files selected for processing (11)
  • README.md (9 hunks)
  • dsperse/src/analyzers/runner_analyzer.py (3 hunks)
  • dsperse/src/cli/convert.py (1 hunks)
  • dsperse/src/cli/prove.py (9 hunks)
  • dsperse/src/cli/slice.py (4 hunks)
  • dsperse/src/cli/verify.py (7 hunks)
  • dsperse/src/compile/compiler.py (1 hunks)
  • dsperse/src/compile/utils/compiler_utils.py (1 hunks)
  • dsperse/src/run/runner.py (1 hunks)
  • dsperse/src/run/utils/runner_utils.py (1 hunks)
  • dsperse/src/verifier.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
dsperse/src/cli/prove.py (4)
dsperse/src/cli/base.py (3)
  • save_result (218-240)
  • prompt_for_value (242-302)
  • normalize_path (21-32)
dsperse/src/utils/utils.py (3)
  • Utils (12-359)
  • load_run_metadata (174-180)
  • run_results_path (237-238)
dsperse/src/cli/verify.py (4)
  • is_run_id_dir (148-151)
  • is_run_root_dir (153-158)
  • get_latest_run (67-83)
  • get_all_runs (43-65)
install.sh (1)
  • info (57-57)
dsperse/src/run/utils/runner_utils.py (4)
dsperse/src/slice/onnx_slicer.py (1)
  • slice (326-457)
dsperse/src/slice/utils/converter.py (3)
  • Converter (17-382)
  • convert (20-90)
  • detect_type (93-122)
dsperse/src/utils/torch_utils.py (1)
  • ModelUtils (20-538)
dsperse/src/run/runner.py (2)
  • _run_ezkl_slice (123-188)
  • run_onnx_slice (88-121)
dsperse/src/compile/compiler.py (4)
dsperse/src/backends/ezkl.py (3)
  • EZKL (107-693)
  • compilation_pipeline (578-671)
  • get_version (137-158)
dsperse/src/compile/utils/compiler_utils.py (6)
  • is_ezkl_compilation_successful (105-113)
  • get_relative_paths (116-129)
  • update_slice_metadata (178-227)
  • run_onnx_inference_chain (231-277)
  • parse_layers (57-74)
  • is_sliced_model (15-54)
dsperse/src/slice/utils/converter.py (1)
  • convert (20-90)
dsperse/src/utils/utils.py (2)
  • find_metadata_path (45-65)
  • save_metadata_file (18-42)
dsperse/src/analyzers/runner_analyzer.py (2)
dsperse/src/utils/utils.py (1)
  • save_metadata_file (18-42)
dsperse/src/slice/utils/converter.py (3)
  • Converter (17-382)
  • detect_type (93-122)
  • convert (20-90)
🪛 markdownlint-cli2 (0.18.1)
README.md

92-92: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

🪛 Ruff (0.14.5)
dsperse/src/cli/convert.py

81-82: try-except-pass detected, consider logging the exception

(S110)


81-81: Do not catch blind exception: Exception

(BLE001)


96-96: Do not catch blind exception: Exception

(BLE001)

dsperse/src/verifier.py

47-48: try-except-pass detected, consider logging the exception

(S110)


47-47: Do not catch blind exception: Exception

(BLE001)


135-135: Avoid specifying long messages outside the exception class

(TRY003)


171-171: f-string without any placeholders

Remove extraneous f prefix

(F541)

dsperse/src/cli/prove.py

105-105: Do not catch blind exception: Exception

(BLE001)


126-126: Do not catch blind exception: Exception

(BLE001)


137-137: Do not catch blind exception: Exception

(BLE001)


162-162: Do not catch blind exception: Exception

(BLE001)


289-289: Do not catch blind exception: Exception

(BLE001)

dsperse/src/cli/verify.py

102-102: Do not catch blind exception: Exception

(BLE001)


122-122: Do not catch blind exception: Exception

(BLE001)


133-133: Do not catch blind exception: Exception

(BLE001)


157-157: Do not catch blind exception: Exception

(BLE001)


169-169: Local variable specified_run_dir is assigned to but never used

Remove assignment to unused variable specified_run_dir

(F841)


242-242: Do not catch blind exception: Exception

(BLE001)


291-291: Do not catch blind exception: Exception

(BLE001)

dsperse/src/run/runner.py

23-23: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


23-23: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


27-27: Create your own exception

(TRY002)


27-27: Avoid specifying long messages outside the exception class

(TRY003)


48-48: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


69-69: Create your own exception

(TRY002)


69-69: Avoid specifying long messages outside the exception class

(TRY003)


88-88: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


123-123: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


136-136: Do not catch blind exception: Exception

(BLE001)


168-168: Do not catch blind exception: Exception

(BLE001)


213-213: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)

dsperse/src/cli/slice.py

97-98: try-except-pass detected, consider logging the exception

(S110)


97-97: Do not catch blind exception: Exception

(BLE001)


112-112: Do not catch blind exception: Exception

(BLE001)

dsperse/src/run/utils/runner_utils.py

36-36: Do not catch blind exception: Exception

(BLE001)


50-50: Consider moving this statement to an else block

(TRY300)


51-51: Do not catch blind exception: Exception

(BLE001)


100-100: Do not catch blind exception: Exception

(BLE001)


109-109: Unused static method argument: model_directory

(ARG004)


109-109: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


109-109: Unused static method argument: save_reshape

(ARG004)


132-132: Avoid specifying long messages outside the exception class

(TRY003)


143-143: Prefer TypeError exception for invalid type

(TRY004)


143-143: Avoid specifying long messages outside the exception class

(TRY003)

dsperse/src/compile/compiler.py

52-52: Avoid specifying long messages outside the exception class

(TRY003)


54-54: f-string without any placeholders

Remove extraneous f prefix

(F541)


55-55: Avoid specifying long messages outside the exception class

(TRY003)


55-55: f-string without any placeholders

Remove extraneous f prefix

(F541)


62-62: f-string without any placeholders

Remove extraneous f prefix

(F541)


85-85: Avoid specifying long messages outside the exception class

(TRY003)


155-155: Avoid specifying long messages outside the exception class

(TRY003)


181-181: Avoid specifying long messages outside the exception class

(TRY003)


203-203: f-string without any placeholders

Remove extraneous f prefix

(F541)

dsperse/src/analyzers/runner_analyzer.py

49-49: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


80-80: Do not catch blind exception: Exception

(BLE001)


89-89: Do not catch blind exception: Exception

(BLE001)


126-126: Do not catch blind exception: Exception

(BLE001)


134-134: Function definition does not bind loop variable slice_dir_abs

(B023)


138-138: Function definition does not bind loop variable slice_key

(B023)


141-141: Function definition does not bind loop variable slice_dir_abs

(B023)


163-163: Do not catch blind exception: Exception

(BLE001)


182-182: Function definition does not bind loop variable slice_dir_abs

(B023)


185-185: Function definition does not bind loop variable slice_key

(B023)


188-188: Function definition does not bind loop variable slice_dir_abs

(B023)


341-341: Do not catch blind exception: Exception

(BLE001)


350-351: try-except-pass detected, consider logging the exception

(S110)


350-350: Do not catch blind exception: Exception

(BLE001)


361-361: Do not catch blind exception: Exception

(BLE001)


376-376: Local variable model_root is assigned to but never used

Remove assignment to unused variable model_root

(F841)


376-376: Useless if-else condition

(RUF034)


398-399: try-except-pass detected, consider logging the exception

(S110)


398-398: Do not catch blind exception: Exception

(BLE001)


410-410: Do not catch blind exception: Exception

(BLE001)


420-421: try-except-continue detected, consider logging the exception

(S112)


420-420: Do not catch blind exception: Exception

(BLE001)


430-431: try-except-pass detected, consider logging the exception

(S110)


430-430: Do not catch blind exception: Exception

(BLE001)


475-475: Do not catch blind exception: Exception

(BLE001)


493-494: try-except-pass detected, consider logging the exception

(S110)


493-493: Do not catch blind exception: Exception

(BLE001)


517-517: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (23)
dsperse/src/cli/prove.py (1)

26-42: Updated CLI interface with positional arguments looks good.

The new positional arguments (run_path, data_path) with optional --proof-output provide a cleaner interface while maintaining backward compatibility with deprecated flags. Clear help text distinguishes the preferred approach from legacy options.

dsperse/src/run/utils/runner_utils.py (2)

78-93: execute_slice provides clean fallback orchestration.

Good pattern: tries EZKL first when use_circuit is set, falls back to ONNX on failure, and properly propagates error context through attempted_ezkl and merged error info.


271-342: Shape validation logic is thorough but returns True on indeterminate cases.

check_expected_shape returns True when it cannot determine the expected shape (line 341). This is a reasonable "optimistic" default, but callers should be aware that a True return doesn't guarantee shape correctness—only that no mismatch was detected.

Consider documenting this behavior in the docstring:

         Returns:
-            bool: True if shapes match, False otherwise
+            bool: True if shapes match or could not be determined, False if a mismatch is detected
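
For callers, only a False return is actionable. A minimal caller sketch (the parameter list of check_expected_shape is assumed here for illustration, not taken verbatim from the PR):

# Illustrative only: True means "no mismatch detected", not "shape verified".
if not RunnerUtils.check_expected_shape(input_tensor, slice_info):
    raise ValueError(f"Input shape mismatch for {slice_info.get('slice_id')}")
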
README.md (2)

218-222: EZKL artifact paths updated to match new pipeline.

The documentation now correctly reflects the new artifact layout: slice_<i>/ezkl/ containing settings.json, model.compiled, vk.key, pk.key. This addresses the previous review comment about stale paths.


381-446: Comprehensive documentation for slicing outputs and format conversion.

The new sections clearly explain:

  • Output format options (dsperse, dslice, dirs) and their effects
  • Conversion workflows with practical examples
  • Cleanup behavior and bundle structure

This significantly improves user discoverability of the new packaging features.
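
For reference, the packaging round-trip these docs describe maps onto the Converter API in this PR; a minimal sketch (the import path is an assumption based on the repo layout, and the file names are placeholders):

from dsperse.src.slice.utils.converter import Converter

# Expand a packaged bundle into slice_* directories for local inspection...
slices_dir = Converter.convert("model.dsperse", output_type="dirs", cleanup=False)

# ...then repackage once finished.
bundle_path = Converter.convert(slices_dir, output_type="dsperse", cleanup=True)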

dsperse/src/verifier.py (2)

69-77: Critical bug from previous review is now fixed.

The code now properly checks for missing settings_path and vk_path before calling ezkl_runner.verify(), preventing the FileNotFoundError that would occur when None was passed and converted to the string "None". Good fix!


103-121: Conversion round-trip in verify_dslice and verify_dsperse may have unintended side effects.

Both methods convert to dirs, verify, then convert back to original format. The re-conversion at the end (lines 109, 119) might overwrite or modify files unexpectedly if verification added artifacts.

Verify that the round-trip conversion after verification is intentional and doesn't lose or corrupt any verification artifacts that might be written to the dirs during verify_dirs.

dsperse/src/cli/slice.py (1)

194-199: LGTM: Clear separation of slicing and packaging concerns.

The update to always emit directory-based output from the slicer is a good architectural decision. The inline comments clearly direct users to the convert sub-command for packaging, making the workflow explicit and maintainable.

dsperse/src/run/runner.py (2)

48-85: LGTM: Well-structured orchestration with clear delegation.

The run method follows a clean orchestration pattern:

  • Normalizes runtime inputs via RunnerUtils
  • Iterates through execution chain
  • Accumulates per-slice results
  • Computes final prediction with softmax
  • Persists results and handles repackaging

The error handling and delegation to utilities keep the method focused and maintainable.
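
A usage sketch against the signatures in this PR (paths are placeholders; the shape of the returned dict is not guaranteed here):

from dsperse.src.run.runner import Runner

runner = Runner()
results = runner.run(
    input_json_path="model/input.json",
    slice_path="model/slices",  # dirs, .dslice, or .dsperse source
    output_path="model/run",    # optional; omit to use the runner's default
)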


190-247: LGTM: Well-structured output formatting.

The _save_inference_output method creates a comprehensive and well-organized inference output structure including:

  • Model path and prediction results
  • Execution chain with per-slice details
  • Security percentage calculation
  • Performance comparison placeholder

The nested structure with witness_execution is clear and maintainable.

dsperse/src/compile/compiler.py (3)

83-92: LGTM: Clean delegation to EZKL backend.

The _compile_model method provides a straightforward interface for compiling non-sliced models with proper validation and directory setup.


95-140: LGTM: Well-organized two-phase compilation.

The _compile_slices method implements a clean two-phase approach:

  1. Optional ONNX inference chain for calibration
  2. Per-slice EZKL compilation with metadata updates

The per-slice metadata persistence (line 138) ensures incremental progress is saved, which is good for long-running compilations.


164-176: LGTM: Smart round-trip conversion handling.

The compile method intelligently handles different packaging formats by:

  1. Detecting if the model is sliced
  2. Converting to dirs format if needed
  3. Performing compilation
  4. Converting back to original format with cleanup

This approach maintains user expectations while working with a standardized internal format.

dsperse/src/compile/utils/compiler_utils.py (5)

41-47: LGTM: Robust error handling for type detection.

The try-except block around Converter.detect_type prevents unexpected exceptions when checking arbitrary directories. This makes is_sliced_model safe to call on any user-provided path, as confirmed in previous reviews.


56-74: LGTM: Robust layer parsing with clear error handling.

The parse_layers method handles both single indices and ranges, with appropriate warning logs for invalid inputs. The use of sorted(set(...)) ensures unique, ordered results.


78-174: LGTM: Well-focused utility functions.

The path manipulation helpers provide clear, single-purpose functionality:

  • Extracting payload-relative paths
  • Adding slice prefixes
  • Validating compilation artifacts
  • Building structured EZKL metadata

Each function has a clear responsibility and includes appropriate null-safety checks.


177-227: LGTM: Comprehensive metadata persistence.

The update_slice_metadata method properly:

  • Loads existing metadata or creates new
  • Captures EZKL version and timestamp
  • Nests compilation info under compilation.ezkl
  • Includes error information when present
  • Persists atomically to disk

This provides good traceability for compilation results.


230-277: LGTM: Clean ONNX inference chain for calibration.

The run_onnx_inference_chain method implements a clear pipeline:

  • Validates input file existence
  • Iterates through slices
  • Threads output of one slice as input to the next
  • Generates calibration.json files for EZKL

Previous review comments about f-string and unused tensor were addressed.
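
In sketch form, the chaining step looks like this (run_single_onnx_slice is a hypothetical stand-in for the per-slice inference call; only the threading pattern is taken from the PR):

current_input = input_file_path
for slice_dir in sorted_slice_dirs:
    calibration_path = slice_dir / "ezkl" / "calibration.json"
    run_single_onnx_slice(slice_dir, current_input, calibration_path)  # hypothetical helper
    current_input = calibration_path  # the next slice consumes this slice's output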

dsperse/src/analyzers/runner_analyzer.py (5)

19-39: LGTM: Clean refactor to stateless utilities.

The transition to a stateless analyzer with static methods is a good architectural choice. The path helper methods (rel_from_payload, with_slice_prefix) are simple, focused, and reusable across the module.


194-227: LGTM: Comprehensive slice metadata construction.

The slice metadata dictionary is well-structured with:

  • Input/output shapes
  • EZKL compatibility and usage flags
  • Circuit size and dependencies
  • Conditional inclusion of circuit paths based on availability

The logic properly normalizes paths to slice_#/payload/... format for consistency.


229-313: LGTM: Well-structured execution chain construction.

The execution chain building methods demonstrate good separation of concerns:

  • build_run_metadata orchestrates the overall structure
  • _build_execution_chain handles node graph construction with proper primary/fallback paths
  • _build_circuit_slices tracks circuit usage
  • _calculate_security computes the security percentage
  • get_execution_chain provides clean public access

The logic properly handles edge cases like empty slices and trusts the computed 'ezkl' flag without redundant file checks.


386-447: LGTM: Well-focused helper methods.

The helper methods provide clear, single-purpose functionality:

  • _has_model_metadata checks for metadata presence
  • _build_from_model_metadata delegates to existing utilities
  • _build_from_per_slice_dirs handles fallback scanning with defensive error handling

The per-slice directory scanning includes appropriate try-except blocks and handles edge cases like single-slice directories.


449-520: LGTM: Robust metadata generation with proper defaulting.

The generate_run_metadata method now properly handles all inputs:

  • Defaults save_path when None (lines 501-507) - addressing previous review
  • Detects model-level metadata to avoid unnecessary conversions
  • Normalizes inputs when needed
  • Persists metadata with proper directory creation
  • Converts back to original format when applicable

The logic is defensive and handles edge cases well.
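
A typical call, matching the signature quoted at the end of this review:

from pathlib import Path

run_meta = RunnerAnalyzer.generate_run_metadata(
    slice_path=Path("model/slices"),  # dirs, .dslice, or .dsperse source
    save_path=None,  # defaults to <parent_of_slice_path>/run/metadata.json
)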

Comment on lines +48 to +52
# Prompt if not provided
if not getattr(args, 'input_path', None):
    args.input_path = prompt_for_value('input', 'Enter input path (dsperse file or slices directory)')
else:
    args.input_path = normalize_path(args.input_path)

⚠️ Potential issue | 🟡 Minor

Inconsistent normalization: prompt path not normalized.

When input_path is missing, prompt_for_value is called but the result isn't normalized. The else branch normalizes, but the prompt branch doesn't:

     if not getattr(args, 'input_path', None):
-        args.input_path = prompt_for_value('input', 'Enter input path (dsperse file or slices directory)')
+        args.input_path = normalize_path(prompt_for_value('input', 'Enter input path (dsperse file or slices directory)'))
     else:
         args.input_path = normalize_path(args.input_path)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Prompt if not provided
if not getattr(args, 'input_path', None):
    args.input_path = prompt_for_value('input', 'Enter input path (dsperse file or slices directory)')
else:
    args.input_path = normalize_path(args.input_path)

# Prompt if not provided
if not getattr(args, 'input_path', None):
    args.input_path = normalize_path(prompt_for_value('input', 'Enter input path (dsperse file or slices directory)'))
else:
    args.input_path = normalize_path(args.input_path)
🤖 Prompt for AI Agents
In dsperse/src/cli/convert.py around lines 48 to 52, the code normalizes
args.input_path only in the else branch, but when prompting for the input path
the prompted value is not passed through normalize_path; update the prompt
branch to assign args.input_path = normalize_path(prompt_for_value('input',
'Enter input path (dsperse file or slices directory'))) (or capture the prompt
result, validate it's not None/empty, then call normalize_path before
assignment) so both code paths produce a normalized path.

results = verifier.verify(run_path, slices_dir)

# Display results
print(f"\nVerification completed!")

⚠️ Potential issue | 🟡 Minor

Remove extraneous f-string prefix.

Line 171 uses an f-string with no placeholders.

-    print(f"\nVerification completed!")
+    print("\nVerification completed!")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print(f"\nVerification completed!")
print("\nVerification completed!")
🧰 Tools
🪛 Ruff (0.14.5)

171-171: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In dsperse/src/verifier.py around line 171, the print call uses an unnecessary
f-string prefix despite having no placeholders; remove the leading f before the
string so it becomes a normal string literal (e.g., print("\nVerification
completed!")), keeping the newline and message intact.

…, streamline function signatures, improve inference logic, and simplify slice processing.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (15)
dsperse/src/run/utils/runner_utils.py (3)

33-53: Silent exception swallowing in normalize_for_runtime hides root causes.

Both dsperse and dslice conversion blocks catch bare Exception and silently fall back to a default path. This makes debugging difficult when conversion fails unexpectedly. Consider logging the exception before falling back.

         if packaging_type == "dsperse":
             try:
                 converted = Converter.convert(source_path, output_type="dirs", cleanup=False)
                 return Path(converted), "dsperse", model_path
-            except Exception:
+            except Exception as e:
+                logger.warning(f"Failed to convert dsperse source: {e}")
                 return (model_path / "slices").resolve(), None, model_path
 
         if packaging_type == "dslice":
             try:
                 sp = Path(source_path)
                 if sp.is_file():
                     # Extract single .dslice file to a slice_* directory; use its parent as slices root
                     slice_dir = Path(Converter.convert(str(sp), output_type="dirs", cleanup=False))
                     slices_root = slice_dir.parent
                 else:
                     # Directory containing .dslice files; expand into slice_* directories in place
                     expanded_dir = Path(Converter.convert(str(sp), output_type="dirs", cleanup=False))
                     slices_root = expanded_dir
                 return slices_root, "dslice", model_path
-            except Exception:
+            except Exception as e:
+                logger.warning(f"Failed to convert dslice source: {e}")
                 return (model_path / "slices").resolve(), None, model_path

101-102: Fix type hints for Optional parameters.

Per PEP 484, model_directory: str = None should be typed as Optional[str]. Although the past review noted these parameters are reserved for deferred reshape functionality, the type hint should still be correct.

-    def preprocess_input(input_path:str, model_directory: str = None, save_reshape: bool = False) -> torch.Tensor:
+    def preprocess_input(input_path: str, model_directory: Optional[str] = None, save_reshape: bool = False) -> torch.Tensor:

79-94: execute_slice tightly couples to Runner's internal methods.

This helper accepts a runner object and calls private methods like runner._run_ezkl_slice and runner.run_onnx_slice. This creates a bidirectional dependency between RunnerUtils and Runner. Consider passing the required callables directly or moving this logic into the Runner class itself.
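
One way to break the cycle is to inject the two execution callables instead of the runner object; a sketch with abbreviated parameter plumbing:

import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)

def execute_slice(
    run_ezkl: Callable[..., Any],  # e.g. runner._run_ezkl_slice
    run_onnx: Callable[..., Any],  # e.g. runner.run_onnx_slice
    use_circuit: bool,
    **kwargs: Any,
) -> Any:
    # Try the circuit path first, then fall back to plain ONNX inference.
    if use_circuit:
        try:
            return run_ezkl(**kwargs)
        except Exception as e:
            logger.debug(f"EZKL execution failed, falling back to ONNX: {e}")
    return run_onnx(**kwargs)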

dsperse/src/slice/utils/converter.py (2)

19-20: Use explicit Optional type hints.

Multiple methods use implicit Optional (e.g., output_path: str = None). Per PEP 484, this should be Optional[str].

     @staticmethod
-    def convert(path: str, output_type: str = "dirs", output_path: str = None, cleanup: bool = True):
+    def convert(path: str, output_type: str = "dirs", output_path: Optional[str] = None, cleanup: bool = True) -> str:

Apply similar changes to _dirs_to_dslice, _dirs_to_dsperse, _dslice_to_dirs, _dsperse_to_dirs, and _zip_directory. You'll need to add from typing import Optional if not already present (it's imported via __future__ annotations but explicit import is cleaner).


346-361: Rename unused loop variable and clarify pattern matching logic.

The dirs variable from os.walk is unused. Additionally, the pattern matching logic at line 357 using pattern.replace('*', '') is fragile for patterns like *_suffix.

-        for root, dirs, files in os.walk(source_dir):
+        for root, _dirs, files in os.walk(source_dir):
             root_path = Path(root)
 
             # Check if we should skip this directory
             if exclude_patterns:
                 rel_root = root_path.relative_to(source_dir)
                 skip = False
                 for pattern in exclude_patterns:
-                    if pattern.startswith('*'):
-                        # Match file extension patterns
-                        pass  # We'll check files individually
-                    elif str(rel_root).startswith(pattern.replace('*', '')):
+                    # Skip wildcard-only patterns (handled at file level)
+                    if pattern.startswith('*'):
+                        continue
+                    # Handle prefix patterns like 'slice_*'
+                    prefix = pattern.rstrip('*')
+                    if prefix and str(rel_root).startswith(prefix):
                         skip = True
                         break
dsperse/src/run/runner.py (4)

74-85: Duplicate _resolve_rel_path helper functions.

Two nearly identical _resolve_rel_path functions are defined as nested functions in run_onnx_slice (lines 74-85) and _run_ezkl_slice (lines 107-111). These should be consolidated into a single helper method on the class or in RunnerUtils.

+    @staticmethod
+    def _resolve_rel_path(p: str, base_dir: Path, check_parent: bool = False) -> str | None:
+        """Resolve a possibly relative path under base_dir."""
+        if not p:
+            return None
+        p_str = str(p)
+        if os.path.isabs(p_str):
+            return p_str
+        resolved = (Path(base_dir) / p_str).resolve()
+        if check_parent and not resolved.exists():
+            resolved = (Path(base_dir).parent / Path(p_str)).resolve()
+        return str(resolved)

Then use Runner._resolve_rel_path(onnx_path, slice_dir) in both methods.

Also applies to: 107-111


45-47: Use a custom exception instead of bare Exception.

Raising a bare Exception makes it harder for callers to catch specific error conditions. Consider defining a custom exception class like SlicePathError or using ValueError/FileNotFoundError.

         if slice_path is None or not Path(slice_path).exists():
-            raise Exception("A valid path must be provided for slices")
+            raise FileNotFoundError(f"Slice path does not exist or was not provided: {slice_path}")

246-248: Use a custom exception for inference failures.

Similar to the earlier issue, raising bare Exception reduces error handling granularity.

         if not ok:
-            raise Exception(f"Inference failed for {current_slice_id}: {exec_info.get('error', 'unknown')}")
+            raise RuntimeError(f"Inference failed for {current_slice_id}: {exec_info.get('error', 'unknown')}")

24-24: Multiple implicit Optional type hints.

Several parameters use = None without Optional type annotation. This violates PEP 484.

-    def __init__(self, run_metadata_path: str = None, save_metadata_path: str = None, slice_path: str | None = None):
+    def __init__(self, run_metadata_path: Optional[str] = None, save_metadata_path: Optional[str] = None, slice_path: Optional[str] = None):

-    def run(self, input_json_path, slice_path: str, output_path: str = None) -> dict:
+    def run(self, input_json_path: str, slice_path: str, output_path: Optional[str] = None) -> dict:

-    def run_onnx_slice(slice_info: dict, input_tensor_path, output_tensor_path, slice_dir: Path = None):
+    def run_onnx_slice(slice_info: dict, input_tensor_path, output_tensor_path, slice_dir: Optional[Path] = None):

-    def _run_ezkl_slice(self, slice_info: dict, input_tensor_path, output_witness_path, slice_dir: Path = None):
+    def _run_ezkl_slice(self, slice_info: dict, input_tensor_path, output_witness_path, slice_dir: Optional[Path] = None):

Add from typing import Optional at the top if needed.

Also applies to: 39-39, 67-67, 102-102

dsperse/src/compile/compiler.py (4)

31-38: Stale docstring references non-existent parameter.

The docstring mentions compiler_impl as an argument, but the constructor takes no parameters. Update the docstring to reflect the actual implementation.

     def __init__(self):
         """
-        Initialize the Compiler with a specific implementation.
-
-        Args:
-            compiler_impl: The compiler implementation to use
+        Initialize the Compiler with the EZKL backend.
         """
         self.ezkl = EZKL()

164-164: Variable type shadows built-in.

Using type as a variable name shadows Python's built-in type() function. Rename to slice_type or format_type.

-        is_sliced, slice_path, type = CompilerUtils.is_sliced_model(model_path)
+        is_sliced, slice_path, slice_type = CompilerUtils.is_sliced_model(model_path)
         if is_sliced:
             # Convert to dirs if needed
-            if type != "dirs":
+            if slice_type != "dirs":
                 slice_path = Converter.convert(model_path, output_type="dirs", cleanup=True)

             self._compile_slices(slice_path, input_file_path=input_file, layer_indices=layer_indices)

             # Convert back to original type if needed
-            if type != "dirs":
-                slice_path = Converter.convert(slice_path, output_type=type, cleanup=True)
+            if slice_type != "dirs":
+                slice_path = Converter.convert(slice_path, output_type=slice_type, cleanup=True)

54-55: Remove extraneous f prefix from strings without placeholders.

These f-strings contain no interpolation placeholders.

-            logger.error(f"No valid path found for slice")
-            raise FileNotFoundError(f"No valid path found for slice")
+            logger.error("No valid path found for slice")
+            raise FileNotFoundError("No valid path found for slice")

         calibration_input = os.path.join(
             os.path.dirname(slice_path),
             "ezkl",
-            f"calibration.json"
+            "calibration.json"
         ) if os.path.exists(os.path.join(os.path.dirname(slice_path), "ezkl", "calibration.json")) else None

Also applies to: 62-62


125-138: Metadata saved after each slice — consider batching.

Utils.save_metadata_file is called inside the loop (line 138), writing metadata to disk after every slice compilation. For models with many slices, this could be slow. Consider saving once after the loop completes, or documenting this as intentional for crash recovery.
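
If batching is preferred, the write can move outside the loop; a sketch (the save_metadata_file call shape is assumed from its usage in this PR, and compile_one_slice is a hypothetical stand-in for the per-slice step):

for slice_id, slice_data in slices.items():
    result = compile_one_slice(slice_data)   # hypothetical per-slice compilation
    metadata["slices"][slice_id] = result    # accumulate in memory only
# One write after the loop: fewer disk writes, but weaker crash recovery.
Utils.save_metadata_file(metadata, metadata_path)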

dsperse/src/analyzers/runner_analyzer.py (2)

289-364: Clarify or remove _normalize_to_dirs (unused, inconsistent contract)

_normalize_to_dirs has a fairly rich docstring and behavior, but:

  • It’s not called anywhere in this file anymore.
  • The docstring promises (model_root, slices_dir, original_format) while the implementation returns only (slices_dir, original_format).
  • model_root is computed but never used.
  • Several broad except Exception handlers swallow errors without logging.

If the new API no longer needs to accept .dslice / .dsperse directly, this helper is dead code and can be removed to avoid confusion.

If you do still intend generate_run_metadata to accept packaged formats, I’d recommend:

  • Wiring _normalize_to_dirs into generate_run_metadata and fixing its return type/contract.
  • Tightening exception handling and logging (e.g., log the exception instead of bare pass in the iterdir branches).

Example of integrating it at the top of generate_run_metadata:

slices_dir, normalized_format = RunnerAnalyzer._normalize_to_dirs(slice_path)
if original_format is None:
    original_format = normalized_format
# then use slices_dir instead of slice_path below

Right now the presence of this function plus the original_format parameter is misleading because they’re not actually used together.


381-424: Avoid raising bare Exception when no slices are found

In _build_from_per_slice_dirs:

if not slices_data:
    raise Exception(f"No valid slices found under {slices_dir}. Each slice directory must include metadata.json.")

Raising a generic Exception makes it harder for callers to distinguish this from other failure modes. A more specific type (e.g., ValueError or a custom NoSlicesFoundError) would improve error handling and testability.

For example:

-        if not slices_data:
-            raise Exception(f"No valid slices found under {slices_dir}. Each slice directory must include metadata.json.")
+        if not slices_data:
+            raise ValueError(
+                f"No valid slices found under {slices_dir}. Each slice directory must include metadata.json."
+            )

Callers that want to treat “no slices” as a fatal configuration error can then catch this specific exception.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7f16c4 and 320ee4b.

📒 Files selected for processing (7)
  • dsperse/src/analyzers/runner_analyzer.py (3 hunks)
  • dsperse/src/compile/compiler.py (1 hunks)
  • dsperse/src/compile/utils/compiler_utils.py (1 hunks)
  • dsperse/src/run/runner.py (1 hunks)
  • dsperse/src/run/utils/runner_utils.py (1 hunks)
  • dsperse/src/slice/slicer.py (4 hunks)
  • dsperse/src/slice/utils/converter.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
dsperse/src/compile/compiler.py (4)
dsperse/src/backends/ezkl.py (3)
  • EZKL (107-693)
  • compilation_pipeline (578-671)
  • get_version (137-158)
dsperse/src/compile/utils/compiler_utils.py (7)
  • CompilerUtils (12-285)
  • is_ezkl_compilation_successful (105-113)
  • get_relative_paths (116-129)
  • update_slice_metadata (178-235)
  • run_onnx_inference_chain (239-285)
  • parse_layers (57-74)
  • is_sliced_model (15-54)
dsperse/src/slice/utils/converter.py (2)
  • Converter (17-382)
  • convert (20-90)
dsperse/src/utils/utils.py (3)
  • Utils (12-359)
  • find_metadata_path (45-65)
  • save_metadata_file (18-42)
dsperse/src/run/runner.py (6)
dsperse/src/analyzers/runner_analyzer.py (3)
  • RunnerAnalyzer (16-460)
  • generate_run_metadata (427-460)
  • get_execution_chain (275-278)
dsperse/src/backends/ezkl.py (2)
  • EZKL (107-693)
  • generate_witness (160-242)
dsperse/src/backends/onnx_models.py (2)
  • OnnxModels (13-222)
  • run_inference (18-60)
dsperse/src/run/utils/runner_utils.py (1)
  • RunnerUtils (16-330)
dsperse/src/slice/utils/converter.py (3)
  • Converter (17-382)
  • detect_type (93-122)
  • convert (20-90)
dsperse/src/utils/utils.py (3)
  • Utils (12-359)
  • read_input (134-171)
  • write_input (127-131)
dsperse/src/slice/slicer.py (2)
dsperse/src/slice/onnx_slicer.py (3)
  • slice (326-457)
  • OnnxSlicer (16-519)
  • slice_model (499-519)
dsperse/src/slice/utils/converter.py (2)
  • Converter (17-382)
  • convert (20-90)
dsperse/src/analyzers/runner_analyzer.py (3)
dsperse/src/utils/utils.py (2)
  • Utils (12-359)
  • save_metadata_file (18-42)
dsperse/src/slice/utils/converter.py (3)
  • Converter (17-382)
  • detect_type (93-122)
  • convert (20-90)
dsperse/src/cli/convert.py (1)
  • convert (44-98)
dsperse/src/compile/utils/compiler_utils.py (2)
dsperse/src/backends/ezkl.py (2)
  • EZKL (107-693)
  • get_version (137-158)
dsperse/src/slice/utils/converter.py (1)
  • detect_type (93-122)
🪛 Ruff (0.14.6)
dsperse/src/compile/compiler.py

52-52: Avoid specifying long messages outside the exception class

(TRY003)


54-54: f-string without any placeholders

Remove extraneous f prefix

(F541)


55-55: Avoid specifying long messages outside the exception class

(TRY003)


55-55: f-string without any placeholders

Remove extraneous f prefix

(F541)


62-62: f-string without any placeholders

Remove extraneous f prefix

(F541)


85-85: Avoid specifying long messages outside the exception class

(TRY003)


155-155: Avoid specifying long messages outside the exception class

(TRY003)


181-181: Avoid specifying long messages outside the exception class

(TRY003)


203-203: f-string without any placeholders

Remove extraneous f prefix

(F541)

dsperse/src/run/runner.py

24-24: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


24-24: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


39-39: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


46-46: Create your own exception

(TRY002)


46-46: Avoid specifying long messages outside the exception class

(TRY003)


67-67: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


102-102: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


135-135: Do not catch blind exception: Exception

(BLE001)


180-180: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


247-247: Create your own exception

(TRY002)


247-247: Avoid specifying long messages outside the exception class

(TRY003)

dsperse/src/slice/utils/converter.py

20-20: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


35-35: Avoid specifying long messages outside the exception class

(TRY003)


37-37: Avoid specifying long messages outside the exception class

(TRY003)


76-76: Do not catch blind exception: Exception

(BLE001)


88-88: Do not catch blind exception: Exception

(BLE001)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


120-120: Avoid specifying long messages outside the exception class

(TRY003)


122-122: Avoid specifying long messages outside the exception class

(TRY003)


135-135: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


165-165: Do not catch blind exception: Exception

(BLE001)


170-170: Avoid specifying long messages outside the exception class

(TRY003)


195-195: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


211-211: Avoid specifying long messages outside the exception class

(TRY003)


264-264: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


287-287: Avoid specifying long messages outside the exception class

(TRY003)


306-306: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


313-313: Avoid specifying long messages outside the exception class

(TRY003)


341-341: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


346-346: Loop control variable dirs not used within loop body

Rename unused dirs to _dirs

(B007)


437-437: Do not catch blind exception: Exception

(BLE001)

dsperse/src/run/utils/runner_utils.py

37-37: Do not catch blind exception: Exception

(BLE001)


51-51: Consider moving this statement to an else block

(TRY300)


52-52: Do not catch blind exception: Exception

(BLE001)


102-102: Unused static method argument: model_directory

(ARG004)


102-102: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


102-102: Unused static method argument: save_reshape

(ARG004)


125-125: Avoid specifying long messages outside the exception class

(TRY003)


136-136: Prefer TypeError exception for invalid type

(TRY004)


136-136: Avoid specifying long messages outside the exception class

(TRY003)

dsperse/src/slice/slicer.py

70-70: Avoid specifying long messages outside the exception class

(TRY003)

dsperse/src/analyzers/runner_analyzer.py

49-49: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


73-73: Unused static method argument: slices_dir

(ARG004)


112-112: Function definition does not bind loop variable slice_key

(B023)


140-140: Unused static method argument: slices_dir

(ARG004)


154-154: Do not catch blind exception: Exception

(BLE001)


314-314: Do not catch blind exception: Exception

(BLE001)


323-324: try-except-pass detected, consider logging the exception

(S110)


323-323: Do not catch blind exception: Exception

(BLE001)


334-334: Do not catch blind exception: Exception

(BLE001)


349-349: Local variable model_root is assigned to but never used

Remove assignment to unused variable model_root

(F841)


349-349: Useless if-else condition

(RUF034)


376-377: try-except-pass detected, consider logging the exception

(S110)


376-376: Do not catch blind exception: Exception

(BLE001)


413-413: Create your own exception

(TRY002)


413-413: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (5)
dsperse/src/run/utils/runner_utils.py (1)

1-335: LGTM overall — past review issues have been addressed.

The empty-list handling (Line 124-125), directory guard in save_to_file_shaped (Lines 179-181), filter_tensor cleanup (Lines 251-255), and __main__ fix (Line 335) from previous reviews have all been properly implemented.

dsperse/src/slice/slicer.py (1)

27-55: Clean dependency injection pattern.

The constructor accepting slicer_impl and the factory method create() provide a clean separation. This will make it easier to add support for other model types in the future.
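
As a rough illustration of the pattern (a sketch; the real constructor and create() signatures live in slicer.py, and OnnxSlicer here is a stand-in name, not confirmed API):

class OnnxSlicer:
    """Stand-in for the ONNX-specific slicing backend."""
    def __init__(self, model_path: str):
        self.model_path = model_path

class Slicer:
    def __init__(self, slicer_impl):
        # The backend implementation is injected, not constructed here
        self.slicer_impl = slicer_impl

    @classmethod
    def create(cls, model_path: str) -> "Slicer":
        # The factory picks the backend; new model types only touch this method
        if model_path.endswith(".onnx"):
            return cls(OnnxSlicer(model_path))
        raise ValueError(f"Unsupported model type: {model_path}")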

dsperse/src/run/runner.py (1)

39-64: Good format detection and cleanup pattern.

The run method properly detects the input format, converts to dirs for processing, and converts back to the original format afterward. This ensures the user's original packaging is preserved.
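
Schematically, the flow reads like this (a condensed sketch built on the Converter.detect_type / Converter.convert calls quoted elsewhere in this review; error handling omitted):

def run(self, input_file, slice_path):
    # Detect the incoming packaging: dirs, .dslice or .dsperse
    original_format = Converter.detect_type(slice_path)
    if original_format != "dirs":
        # Normalize to a plain directory layout for processing
        slice_path = Converter.convert(slice_path, "dirs")
    try:
        return self._run(input_file, slice_path)
    finally:
        if original_format != "dirs":
            # Restore the user's original packaging afterward
            Converter.convert(slice_path, original_format)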

dsperse/src/compile/compiler.py (1)

143-182: Solid compile orchestration with format normalization.

The compile method properly detects sliced models, converts to dirs format for processing, compiles, and converts back to the original format. The layer filtering via parse_layers is a nice feature for selective compilation.
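
On the layer-filtering point, parse_layers plausibly expands a spec such as "0,2-4" into concrete slice indices; the grammar below is an assumption, not the documented CLI format:

def parse_layers(spec: str) -> set[int]:
    """Expand a spec like "0,2-4" into {0, 2, 3, 4} (assumed grammar)."""
    indices: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-", 1)
            indices.update(range(int(start), int(end) + 1))
        elif part:
            indices.add(int(part))
    return indices

print(sorted(parse_layers("0,2-4")))  # [0, 2, 3, 4]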

dsperse/src/slice/utils/converter.py (1)

378-382: Verifying the security concern about zipfile.extractall in converter.py; the scripts below examine the code context and how the method is used.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the full context around the _unzip_file method
wc -l dsperse/src/slice/utils/converter.py
sed -n '360,400p' dsperse/src/slice/utils/converter.py

🏁 Script executed:

#!/bin/bash
# Find callers of _unzip_file and any other extractall usage across the codebase
rg -n "_unzip_file" dsperse/src/slice/utils/converter.py -A 5 -B 5
rg -n "_unzip_file" --type py
rg -n "extractall" --type py -B 3 -A 3

🌐 Web queries:

Python zipfile.extractall path traversal vulnerability zip slip security
Python zipfile extractall safe extraction validation CVE
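
For context, the concern being verified is the classic zip-slip: archive members like ../../x escaping the extraction root. A minimal guard using only the stdlib (safe_extract is a hypothetical helper, not DSperse API):

import zipfile
from pathlib import Path

def safe_extract(zip_path: str, dest_dir: str) -> None:
    """Extract an archive, refusing members that would escape dest_dir."""
    dest = Path(dest_dir).resolve()
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            # Resolve each member against the destination and verify containment
            target = (dest / member).resolve()
            if not target.is_relative_to(dest):  # Path.is_relative_to: 3.9+
                raise ValueError(f"Blocked path traversal attempt: {member}")
        zf.extractall(dest)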

Comment on lines +427 to +461
def generate_run_metadata(slice_path: Path, save_path=None, original_format=None):
"""
Build run-metadata from a slices source (dirs/.dslice/.dsperse) and save it.
- Normalizes inputs to dirs temporarily when needed (no cleanup).
- Prefers model-level slices/metadata.json when present; otherwise scans per-slice dirs.
- Emits paths normalized as 'slice_#/payload/...'.
- Saves to save_path or default '<parent_of_slice_path>/run/metadata.json'.
- Converts back to original packaging when original_format != 'dirs'.
Returns the run-metadata dict.
"""

if RunnerAnalyzer._has_model_metadata(slice_path):
run_meta = RunnerAnalyzer._build_from_model_metadata(slice_path)
else:
run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slice_path)

# Attach packaging metadata
run_meta["packaging_type"] = original_format
run_meta["source_path"] = str(slice_path)

# Save
if save_path is None:
base = Path(slice_path).resolve()
# If slice_path is a model root or slices dir, put run/metadata.json next to it
base_dir = base if base.is_dir() else base.parent
save_path = base_dir / "run" / "metadata.json"
else:
save_path = Path(save_path).resolve()

save_path.parent.mkdir(parents=True, exist_ok=True)
run_meta["run_directory"] = str(save_path.parent)
Utils.save_metadata_file(run_meta, save_path)

return run_meta

Copy link


⚠️ Potential issue | 🔴 Critical

generate_run_metadata currently crashes for string paths (including the __main__ demo)

generate_run_metadata is annotated to take a Path, but it’s often convenient (and natural) to pass a string. The __main__ block does exactly that:

model_path = Path(model_dir).resolve()
...
out = RunnerAnalyzer.generate_run_metadata(str(model_path))

Inside generate_run_metadata, you immediately call:

if RunnerAnalyzer._has_model_metadata(slice_path):
    ...

and _has_model_metadata uses the / operator:

return ((path / "metadata.json").exists() or (path / "slices" / "metadata.json").exists()) and not (
            path / "payload").exists()

When slice_path is a str, path / "metadata.json" raises TypeError: unsupported operand type(s) for /: 'str' and 'str'. So the demo block — and any other caller passing a string — will currently crash.

You can make this API robust (and keep it ergonomic) by normalizing slice_path to a Path at the top and broadening the type annotation:

-    @staticmethod
-    def generate_run_metadata(slice_path: Path, save_path=None, original_format=None):
+    @staticmethod
+    def generate_run_metadata(slice_path: str | Path, save_path=None, original_format=None):
@@
-        if RunnerAnalyzer._has_model_metadata(slice_path):
-            run_meta = RunnerAnalyzer._build_from_model_metadata(slice_path)
-        else:
-            run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slice_path)
+        # Normalize to Path for downstream helpers
+        slice_path = Path(slice_path)
+
+        if RunnerAnalyzer._has_model_metadata(slice_path):
+            run_meta = RunnerAnalyzer._build_from_model_metadata(slice_path)
+        else:
+            run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slice_path)
@@
-        run_meta["packaging_type"] = original_format
-        run_meta["source_path"] = str(slice_path)
+        run_meta["packaging_type"] = original_format
+        run_meta["source_path"] = str(slice_path.resolve())
@@
-        if save_path is None:
-            base = Path(slice_path).resolve()
+        if save_path is None:
+            base = slice_path.resolve()
@@
-    model_dir = base_paths[model_choice]
+    model_dir = base_paths[model_choice]
@@
-    out = RunnerAnalyzer.generate_run_metadata(str(model_path))
+    out = RunnerAnalyzer.generate_run_metadata(model_path)

This fixes the immediate crash and makes the public API tolerant of both str and Path inputs, which is consistent with how other utilities in this codebase behave.

Also applies to: 462-473

Comment on lines +116 to +129
def get_relative_paths(compilation_data: Dict[str, Any], calibration_input: Optional[str]) -> dict[str, str | None]:
"""
Compute payload-relative paths for compiled artifacts and the calibration file.
Returns a tuple of (payload_rel_dict, calibration_rel_path).
"""
calibration_rel = CompilerUtils._rel_from_payload(calibration_input) if calibration_input and os.path.exists(calibration_input) else None
relative_paths = {
'settings': CompilerUtils._rel_from_payload(compilation_data.get('settings')),
'compiled': CompilerUtils._rel_from_payload(compilation_data.get('compiled')),
'vk_key': CompilerUtils._rel_from_payload(compilation_data.get('vk_key')),
'pk_key': CompilerUtils._rel_from_payload(compilation_data.get('pk_key')),
'calibration': calibration_rel
}
return relative_paths
Copy link


⚠️ Potential issue | 🟡 Minor

Fix get_relative_paths docstring to match return type

The docstring still claims this returns a tuple (payload_rel_dict, calibration_rel_path), but the implementation returns a single dict with a "calibration" key. This can confuse callers and future maintainers.

I suggest updating the docstring to describe the actual dict shape:

-        """
-        Compute payload-relative paths for compiled artifacts and the calibration file.
-        Returns a tuple of (payload_rel_dict, calibration_rel_path).
-        """
+        """
+        Compute payload-relative paths for compiled artifacts and the optional calibration file.
+
+        Returns:
+            A dict mapping keys ('settings', 'compiled', 'vk_key', 'pk_key', 'calibration')
+            to payload-relative paths or None.
+        """
🤖 Prompt for AI Agents
In dsperse/src/compile/utils/compiler_utils.py around lines 116 to 129, the
docstring incorrectly states the function returns a tuple; update it to describe
the actual return value (a single dict) and its shape: state that the function
computes payload-relative paths and returns a dict with keys 'settings',
'compiled', 'vk_key', 'pk_key' mapping to str or None, and 'calibration' mapping
to str or None (depending on existence), matching the function signature
dict[str, str | None].

Comment on lines +239 to +285
def run_onnx_inference_chain(slices_data: list, base_path: str, input_file_path: Optional[str] = None):
"""
Phase 1: Run ONNX inference chain to generate calibration files.
Args:
slices_data: List of slice metadata
base_path: Base path for relative file paths
input_file_path: Path to the initial input file
"""
current_input = input_file_path
if current_input and os.path.exists(current_input):
logger.info("Running ONNX inference chain to generate calibration files")
for idx, slice_data in enumerate(slices_data):
slice_path = slice_data.get('path')
# First try full path
if slice_path and os.path.exists(slice_path):
pass # Use the full path
# Then try relative path
elif slice_data.get('relative_path'):
slice_path = os.path.join(base_path, slice_data.get('relative_path'))
if not os.path.exists(slice_path):
logger.warning(f"Slice file not found for index {idx}: {slice_path}")
continue
else:
logger.error(f"No valid path found for slice index {idx}")
continue

slice_output_path = os.path.join(os.path.dirname(slice_path), "ezkl")
os.makedirs(slice_output_path, exist_ok=True)

# Run ONNX inference to generate calibration data
output_tensor_path = os.path.join(slice_output_path, "calibration.json")
logger.info(f"Running ONNX inference for slice {idx} with input file {current_input}")
success, _tensor, exec_info = Runner.run_onnx_slice(
slice_info={"path": slice_path},
input_tensor_path=Path(current_input),
output_tensor_path=Path(output_tensor_path)
)

if not success:
logger.error(f"ONNX inference failed for slice {idx}: {exec_info.get('error', 'Unknown error')}")
return

current_input = output_tensor_path
logger.info(f"Generated calibration file: {output_tensor_path}")
else:
logger.warning("No input file provided, skipping ONNX inference chain")
Copy link


⚠️ Potential issue | 🟡 Minor

Differentiate “no input provided” vs “input path missing” in ONNX chain

run_onnx_inference_chain currently treats a provided-but-missing input_file_path the same as no input at all, logging “No input file provided” and silently skipping the chain. That makes configuration errors harder to diagnose.

You can make the behavior clearer (and fail fast) by splitting the two cases:

-        current_input = input_file_path
-        if current_input and os.path.exists(current_input):
-            logger.info("Running ONNX inference chain to generate calibration files")
-            for idx, slice_data in enumerate(slices_data):
+        current_input = input_file_path
+        if not current_input:
+            logger.warning("No input file provided, skipping ONNX inference chain")
+            return
+
+        if not os.path.exists(current_input):
+            logger.error(f"Input file for ONNX inference chain not found: {current_input}")
+            return
+
+        logger.info("Running ONNX inference chain to generate calibration files")
+        for idx, slice_data in enumerate(slices_data):
@@
-        else:
-            logger.warning("No input file provided, skipping ONNX inference chain")

This preserves existing success-path behavior while giving a more accurate message (and early exit) for a bad path.

🤖 Prompt for AI Agents
In dsperse/src/compile/utils/compiler_utils.py around lines 239 to 285, the
function treats a provided-but-missing input_file_path the same as no input and
logs "No input file provided"; change the branching to (1) if input_file_path is
None/empty: log a warning and skip the ONNX chain (preserve current behavior),
(2) else if input_file_path is provided but os.path.exists(input_file_path) is
False: log an error stating the input path is missing and return/fail fast, and
(3) otherwise proceed with the existing inference loop using the validated path;
keep messages descriptive and do not swallow configuration errors.

Comment on lines +69 to +78
if not output_path:
raise ValueError("output_path must be provided for slicing")

logger.info(f"Slicing model to output path: {output_path}")
return self.slicer_impl.slice_model(output_path=output_path)
result = self.slicer_impl.slice_model(output_path=output_path)

if output_type != "dirs":
Converter.convert(output_path, output_type)

return result
Copy link


⚠️ Potential issue | 🟡 Minor

Converter.convert return value is discarded.

When output_type != "dirs", the converter is called but its return value (the path to the converted output) is not captured or returned. If the conversion moves/renames files, the caller won't know the new location. Consider returning the converted path or documenting that the output is modified in-place at output_path.

         if output_type != "dirs":
-            Converter.convert(output_path, output_type)
+            converted_path = Converter.convert(output_path, output_type)
+            logger.info(f"Converted output to {output_type} at: {converted_path}")
 
         return result

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.6)

70-70: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In dsperse/src/slice/slicer.py around lines 69 to 78, the return value from
Converter.convert is discarded when output_type != "dirs"; capture the
converter's returned path and return it (falling back to the original slicing
result if convert returns None). Concretely, call converted =
Converter.convert(output_path, output_type) and then return converted if
converted is truthy, otherwise return the existing result; ensure any
docstring/comments reflect that the method may return the converted path.

Comment on lines +58 to +91
elif current_type == "dslice":
if output_type == "dirs":
return Converter._dslice_to_dirs(path_obj, output_path)
elif output_type == "dsperse":
# Convert dslice -> dirs -> dsperse
temp_dirs = Converter._dslice_to_dirs(path_obj, None)
result = Converter._dirs_to_dsperse(Path(temp_dirs), output_path)
# Optionally clean up temp dirs
return result

elif current_type == "dsperse":
if output_type == "dirs":
# Extract and expand contained .dslice files into slice_* directories
result = Converter._dsperse_to_dirs(path_obj, output_path, expand_slices=True)
if cleanup:
try:
path_obj.unlink()
logger.info(f"Removed source dsperse file: {path_obj}")
except Exception as e:
logger.warning(f"Could not remove source dsperse file {path_obj}: {e}")
return result
elif output_type == "dslice":
# Convert dsperse -> dirs (without expanding slices)
temp_dirs = Converter._dsperse_to_dirs(path_obj, None, expand_slices=False)
# The dsperse already contains dslices, so just return the directory
logger.info(f"Extracted dsperse to directory with dslice files: {temp_dirs}")
if cleanup:
try:
path_obj.unlink()
logger.info(f"Removed source dsperse file: {path_obj}")
except Exception as e:
logger.warning(f"Could not remove source dsperse file {path_obj}: {e}")
return temp_dirs

Copy link


⚠️ Potential issue | 🟡 Minor

convert() lacks explicit return for some code paths.

When current_type == "dslice" and output_type == "dsperse", the function returns the result of _dirs_to_dsperse. However, if current_type doesn't match any branch, the function returns None implicitly. Consider adding a fallback or raising an error for unhandled type combinations.

         elif current_type == "dsperse":
             if output_type == "dirs":
                 # ...
                 return result
             elif output_type == "dslice":
                 # ...
                 return temp_dirs
+
+        # Should not reach here if types are valid
+        raise ValueError(f"Unsupported conversion: {current_type} -> {output_type}")
🧰 Tools
🪛 Ruff (0.14.6)

76-76: Do not catch blind exception: Exception

(BLE001)


88-88: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In dsperse/src/slice/utils/converter.py around lines 58 to 91, the convert()
branching for current_type/output_type combinations can fall through and return
None for unhandled combinations; add an explicit fallback that raises a clear
exception (e.g., ValueError or NotImplementedError) including the current_type
and output_type values so callers get a deterministic error instead of None, and
ensure any temporary-path branches consistently return the same type (Path or
str) as the function contract expects.

Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
dsperse/src/analyzers/runner_analyzer.py (3)

53-70: process_slices dispatch is brittle and will crash on empty input

The current dispatch:

# Normalize input to a list
if len(slices_data[0]) == 3:
    if isinstance(slices_data, dict):
        slices_data = [slices_data]
    return RunnerAnalyzer._process_slices_per_slice(slices_dir, slices_data)
else:
    return RunnerAnalyzer._process_slices_model(slices_dir, slices_data)

has a few problems:

  • len(slices_data[0]) will raise IndexError when slices_data is empty.
  • The isinstance(slices_data, dict) branch is effectively dead, since you index slices_data[0] before that check.
  • Using len(...) == 3 as a structural discriminator is fragile; adding a field to those dicts will silently flip the dispatch.

A more robust approach is to normalize dict→list, guard emptiness, and dispatch based on a distinguishing key such as "slice_metadata":

     @staticmethod
     def process_slices(slices_dir, slices_data):
@@
-        # Normalize input to a list
-        if len(slices_data[0]) == 3:
-            if isinstance(slices_data, dict):
-                slices_data = [slices_data]
-            return RunnerAnalyzer._process_slices_per_slice(slices_dir, slices_data)
-        else:
-            return RunnerAnalyzer._process_slices_model(slices_dir, slices_data)
+        # Normalize input and dispatch based on structure
+        if isinstance(slices_data, dict):
+            slices_data = [slices_data]
+
+        if not slices_data:
+            return {}
+
+        first = slices_data[0]
+        # Per-slice entries built in _build_from_per_slice_dirs carry 'slice_metadata'
+        if isinstance(first, dict) and "slice_metadata" in first:
+            return RunnerAnalyzer._process_slices_per_slice(slices_dir, slices_data)
+
+        # Otherwise treat as model-level metadata entries from slices/metadata.json.
+        return RunnerAnalyzer._process_slices_model(slices_dir, slices_data)

This avoids crashes on empty input and makes the routing logic resilient to future metadata changes.


139-205: Guard _process_slices_per_slice against missing metadata to avoid TypeError joins

In the per‑slice path, several os.path.join calls assume all metadata fields are present:

parent_dir = os.path.dirname(entry.get("path")).split(os.sep)[0]
...
onnx_path = os.path.join(parent_dir, slice.get("relative_path"))
...
if files:
    circuit_path = os.path.join(parent_dir, files.get("compiled_circuit") or files.get("compiled"))
    settings_path = os.path.join(parent_dir, files.get("settings"))
    pk_path = os.path.join(parent_dir, files.get("pk_key"))
    vk_path = os.path.join(parent_dir, files.get("vk_key"))

If any of slice.get("relative_path") or the files[...] values are missing/None (e.g. pre‑compilation, partial metadata, or failed compilation), os.path.join(parent_dir, None) will raise TypeError, which contradicts the earlier goal that the analyzer “just analyzes what is there” and tolerates incomplete data.

You can mirror the more defensive pattern from _process_slices_model and gate joins on presence:

-            meta_path = entry.get("slice_metadata")
-            parent_dir = os.path.dirname(entry.get("path")).split(os.sep)[0]
+            meta_path = entry.get("slice_metadata")
+            # Entry path is already normalized to "slice_X/payload/..."; keep only the slice dir.
+            parent_dir = os.path.dirname(entry.get("path") or "").split(os.sep)[0]
@@
-            # Normalize ONNX path
-            onnx_path = os.path.join(parent_dir, slice.get("relative_path"))
+            # Normalize ONNX path; tolerate missing fields
+            rel_from_meta = slice.get("relative_path") or slice.get("path")
+            onnx_path = os.path.join(parent_dir, rel_from_meta) if rel_from_meta else None
@@
-            comp = ((slice.get("compilation") or {}).get("ezkl") or {})
-            files = (comp.get("files") or {})
-            compiled_flag = bool(comp.get("compiled", False))
-
-            if files:
-                circuit_path = os.path.join(parent_dir, files.get("compiled_circuit") or files.get("compiled"))
-                settings_path = os.path.join(parent_dir, files.get("settings"))
-                pk_path = os.path.join(parent_dir, files.get("pk_key"))
-                vk_path = os.path.join(parent_dir, files.get("vk_key"))
-            else:
-                circuit_path = None
-                settings_path = None
-                pk_path = None
-                vk_path = None
+            comp = ((slice.get("compilation") or {}).get("ezkl") or {})
+            files = (comp.get("files") or {})
+            compiled_flag = bool(comp.get("compiled", False))
+
+            circuit_path = None
+            settings_path = None
+            pk_path = None
+            vk_path = None
+            if files:
+                compiled_rel = files.get("compiled_circuit") or files.get("compiled")
+
+                def _join_optional(key: str) -> Optional[str]:
+                    rel = files.get(key)
+                    return os.path.join(parent_dir, rel) if rel else None
+
+                circuit_path = (
+                    os.path.join(parent_dir, compiled_rel) if compiled_rel else None
+                )
+                settings_path = _join_optional("settings")
+                pk_path = _join_optional("pk_key")
+                vk_path = _join_optional("vk_key")

The rest of the pipeline (execution_chain and security) already tolerates None for these paths, so this makes the analyzer robust to partially‑populated per‑slice metadata.


371-384: generate_run_metadata path handling: crash on string paths and model‑root confusion

There are two related issues here:

  1. String paths crash via _has_model_metadata

    generate_run_metadata is annotated to take a Path, but both the __main__ demo and likely some callers pass a string:

    out = RunnerAnalyzer.generate_run_metadata(str(model_path))

    Inside, you immediately call:

    if RunnerAnalyzer._has_model_metadata(slice_path):

    and _has_model_metadata does:

    return ((path / "metadata.json").exists() or (path / "slices" / "metadata.json").exists()) and not (
                path / "payload").exists()

    When slice_path is a str, path / "metadata.json" raises TypeError: unsupported operand type(s) for /: 'str' and 'str'. So the __main__ block — and any other string‑based caller — will currently crash.

  2. Model‑root vs slices‑dir mismatch

    _has_model_metadata returns True when either path/metadata.json or path/slices/metadata.json exists. For a model root that only has slices/metadata.json, _has_model_metadata(model_root) returns True, but _build_from_model_metadata then calls load_slices_metadata(slices_dir) on that same model_root, trying to open model_root/metadata.json, which will fail. The demo currently passes the model root (../models/doom) rather than the slices dir, so even after normalizing the type, this would still be incorrect.

You can fix both by:

  • Normalizing slice_path to a Path immediately.
  • Resolving whether you were given a model root or a slices directory, and passing the actual slices dir into _has_model_metadata / _build_from_model_metadata.
  • Making _has_model_metadata itself tolerant of str | Path.

For example:

-    @staticmethod
-    def _has_model_metadata(path: Path) -> bool:
-        return ((path / "metadata.json").exists() or (path / "slices" / "metadata.json").exists()) and not (
-                    path / "payload").exists()
+    @staticmethod
+    def _has_model_metadata(path: Path | str) -> bool:
+        path = Path(path)
+        return (
+            (path / "metadata.json").exists()
+            or (path / "slices" / "metadata.json").exists()
+        ) and not (path / "payload").exists()
@@
-    @staticmethod
-    def generate_run_metadata(slice_path: Path, save_path=None, original_format=None):
+    @staticmethod
+    def generate_run_metadata(slice_path: str | Path, save_path=None, original_format=None):
@@
-        if RunnerAnalyzer._has_model_metadata(slice_path):
-            run_meta = RunnerAnalyzer._build_from_model_metadata(slice_path)
-        else:
-            run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slice_path)
+        # Normalize the input and distinguish model root vs slices directory
+        slice_path = Path(slice_path)
+        if (slice_path / "slices" / "metadata.json").exists():
+            slices_dir = slice_path / "slices"
+        else:
+            slices_dir = slice_path
+
+        if RunnerAnalyzer._has_model_metadata(slices_dir):
+            run_meta = RunnerAnalyzer._build_from_model_metadata(slices_dir)
+        else:
+            run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slices_dir)
@@
-        run_meta["packaging_type"] = original_format
-        run_meta["source_path"] = str(slice_path)
+        run_meta["packaging_type"] = original_format
+        run_meta["source_path"] = str(slice_path.resolve())
@@
-        if save_path is None:
-            base = Path(slice_path).resolve()
+        if save_path is None:
+            base = slice_path.resolve()
@@
-if __name__ == "__main__":
+if __name__ == "__main__":
@@
-    out = RunnerAnalyzer.generate_run_metadata(str(model_path))
+    out = RunnerAnalyzer.generate_run_metadata(model_path)

This keeps the API ergonomic (accepting both str and Path), allows callers to pass either a model root or a slices directory, and makes the __main__ smoke test actually executable.

Also applies to: 433-466, 468-479

🧹 Nitpick comments (5)
dsperse/src/analyzers/runner_analyzer.py (5)

16-40: Stateless design and path helpers look good (minor nit on __init__)

The stateless, static-method design plus the rel_from_payload / with_slice_prefix helpers reads cleanly and matches the new per-slice payload layout. The explicit __init__ containing only pass is unnecessary, though; you could drop it entirely to emphasize that this class is not meant to be instantiated.


42-50: Broader exception handling in load_slices_metadata could be tightened

Catching a bare Exception and logging via logger.error before re‑raising loses stack context and may hide the specific failure mode (e.g., file not found vs JSON decode). If you care about diagnosability, consider narrowing to OSError | json.JSONDecodeError and using logger.exception so the traceback is preserved in logs.
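
Sketched out, the suggestion looks roughly like this (assuming slices_dir is a Path; the real function body may differ):

import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def load_slices_metadata(slices_dir: Path) -> dict:
    metadata_path = slices_dir / "metadata.json"
    try:
        with open(metadata_path, "r") as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError):
        # logger.exception records the traceback, unlike logger.error
        logger.exception(f"Failed to load slices metadata from {metadata_path}")
        raise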


72-137: Model‑metadata pathing logic is sound; slices_dir arg is currently unused

The normalization to slice_#/payload/... via rel_from_payload + with_slice_prefix and reuse of the files dict for EZKL artifacts looks consistent with the new directory layout, and it’s good that this path is now metadata‑driven instead of hitting the filesystem.

Right now slices_dir is not used in _process_slices_model, so either:

  • Drop the parameter if you don’t plan to use it, or
  • Start threading it through where you eventually need model‑root context (e.g., for future size calculations or extra metadata).

This is purely a cleanliness concern; behavior as written looks correct.


225-293: Execution chain is consistent, but security metric may overstate actual circuit usage

The execution chain construction:

  • Orders slice_* keys numerically.
  • Sets use_circuit only when ezkl is truthy and the circuit and both keys are present.
  • Fills fallback_map from circuit→onnx (or slice→onnx).

This looks coherent and filesystem‑independent.

However, _calculate_security currently counts a slice as “secured” purely based on slice_data.get("ezkl", False):

circuit_slices = sum(1 for slice_data in slices.values() if slice_data.get("ezkl", False))
return round((circuit_slices / total_slices) * 100, 1)

whereas the actual runtime decision (use_circuit) also requires has_circuit and has_keys. If metadata ever has compiled=True but is missing artifacts/keys, overall_security will be higher than the fraction of slices actually using circuits.

If you intend overall_security to reflect effective circuit usage, consider computing it from the same predicate used in _build_execution_chain (or from circuit_slices built from that predicate) instead of raw ezkl.
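
One way to do that, sketched with assumed field names (the description above only tells us the predicate requires ezkl plus the circuit and both keys; "circuit", "pk" and "vk" below are placeholders, not confirmed metadata keys):

def _uses_circuit(slice_data: dict) -> bool:
    # Mirrors the use_circuit decision described for _build_execution_chain
    return (
        bool(slice_data.get("ezkl"))
        and bool(slice_data.get("circuit"))
        and bool(slice_data.get("pk"))
        and bool(slice_data.get("vk"))
    )

def _calculate_security(slices: dict) -> float:
    total_slices = len(slices)
    if not total_slices:
        return 0.0
    circuit_slices = sum(1 for s in slices.values() if _uses_circuit(s))
    return round((circuit_slices / total_slices) * 100, 1)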


295-369: _normalize_to_dirs behavior and docs are slightly out of sync

This helper is valuable for dslice/dsperse integration, but a couple of nits:

  • The docstring says it normalizes to (model_root, slices_dir, original_format), but all return sites only return (slices_dir, original_format). If callers only unpack two values today this is just misleading documentation; if any still expect three, they’ll break.
  • The model_root variable in the detected == 'dirs' branch is assigned but unused; the if/else is effectively a no‑op.

A minimal clean‑up would be:

-    def _normalize_to_dirs(slice_path: str):
-        """Normalize input to (model_root, slices_dir, original_format).
+    def _normalize_to_dirs(slice_path: str):
+        """Normalize an input path into a slices directory plus its original packaging format.
@@
-        if detected == 'dirs':
-            sdir = path_obj
-            # If this looks like a slices folder (has metadata.json), parent is model root
-            model_root = sdir.parent if (sdir / 'metadata.json').exists() else sdir.parent
-            return sdir.resolve(), 'dirs'
+        if detected == 'dirs':
+            sdir = path_obj
+            return sdir.resolve(), 'dirs'

If _normalize_to_dirs is meant to be the primary entry for dslice/dsperse inputs, it would also be worth double‑checking its callers to ensure they’re using the updated 2‑tuple API consistently.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 320ee4b and b15f27f.

📒 Files selected for processing (2)
  • dsperse/src/analyzers/runner_analyzer.py (3 hunks)
  • requirements.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/analyzers/runner_analyzer.py (2)
dsperse/src/utils/utils.py (1)
  • save_metadata_file (18-42)
dsperse/src/slice/utils/converter.py (3)
  • Converter (17-382)
  • detect_type (93-122)
  • convert (20-90)
🪛 Ruff (0.14.6)
dsperse/src/analyzers/runner_analyzer.py

49-49: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


73-73: Unused static method argument: slices_dir

(ARG004)


112-112: Function definition does not bind loop variable slice_key

(B023)


140-140: Unused static method argument: slices_dir

(ARG004)


154-154: Do not catch blind exception: Exception

(BLE001)


320-320: Do not catch blind exception: Exception

(BLE001)


329-330: try-except-pass detected, consider logging the exception

(S110)


329-329: Do not catch blind exception: Exception

(BLE001)


340-340: Do not catch blind exception: Exception

(BLE001)


355-355: Local variable model_root is assigned to but never used

Remove assignment to unused variable model_root

(F841)


355-355: Useless if-else condition

(RUF034)


382-383: try-except-pass detected, consider logging the exception

(S110)


382-382: Do not catch blind exception: Exception

(BLE001)


419-419: Create your own exception

(TRY002)


419-419: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
dsperse/src/analyzers/runner_analyzer.py (1)

208-223: build_run_metadata composition looks correct

Assembling slices → execution_chain → circuit_slices → overall_security in one place keeps the flow clear, and reusing process_slices for both model‑ and per‑slice cases is a good unification.

No correctness issues stand out here.

requirements.txt Outdated
@@ -1,4 +1,4 @@
torch==2.6.0
torch==2.7.1
Copy link


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Step 1: Check torch usage patterns that might be affected by version changes
# torch.load usage (2.7 changed default weights_only parameter)
echo "=== Checking torch.load patterns ==="
rg -n "torch\.load\(" --type=py 2>/dev/null | head -20 || echo "No torch.load found"

echo ""
echo "=== Checking torch version conditionals ==="
# Check for any hardcoded version checks or conditionals related to torch
rg -n "torch.*version" -i --type=py 2>/dev/null | head -10 || echo "No version checks found"

echo ""
echo "=== Counting torch API usage ==="
# Count torch API patterns
rg -n "torch\.(nn\.|functional\.|cuda\.|backends\.)" --type=py 2>/dev/null | wc -l

echo ""
echo "=== Checking for deprecated torch patterns ==="
# Look for patterns that changed significantly between 2.6 and 2.7
rg -n "torch\.(utils\.checkpoint|cuda\.device_count|version\.)" --type=py 2>/dev/null | head -10 || echo "No patterns found"

Repository: inference-labs-inc/dsperse

Length of output: 708


🏁 Script executed:

#!/bin/bash
# Step 2: Explore repository structure to understand the project better
echo "=== Repository structure ==="
ls -la

echo ""
echo "=== Python files in repo ==="
fd -t f -e py | head -20

echo ""
echo "=== Check requirements.txt content ==="
cat requirements.txt

Repository: inference-labs-inc/dsperse

Length of output: 1656


🏁 Script executed:

#!/bin/bash
# Step 3: Check if there are any integration tests or test files that test torch compatibility
echo "=== Looking for test files ==="
fd -t f -e py -path '*test*' 2>/dev/null | head -20

echo ""
echo "=== Check for any CI configuration ==="
fd -t f \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null | head -10

Repository: inference-labs-inc/dsperse

Length of output: 140


🏁 Script executed:

#!/bin/bash
# Step 4: Check GitHub workflows in .github directory
echo "=== Files in .github ==="
find .github -type f 2>/dev/null | head -20

echo ""
echo "=== Check for pytest or test configuration ==="
cat pyproject.toml

echo ""
echo "=== Check for any test commands in install.sh ==="
grep -i "test\|pytest" install.sh || echo "No test commands found"

Repository: inference-labs-inc/dsperse

Length of output: 1049


🏁 Script executed:

#!/bin/bash
# Step 5: Examine torch.load usage more closely - checking for weights_only parameter
echo "=== Examining torch.load calls in detail ==="
echo ""
echo "--- torch_utils.py ---"
sed -n '30,40p' dsperse/src/utils/torch_utils.py

echo ""
echo "--- pth_models.py (lines 40-50) ---"
sed -n '40,50p' dsperse/src/backends/pth_models.py

echo ""
echo "--- pth_models.py (lines 95-105) ---"
sed -n '95,105p' dsperse/src/backends/pth_models.py

echo ""
echo "--- model_analyzer.py (lines 50-60) ---"
sed -n '50,60p' dsperse/src/analyzers/model_analyzer.py

Repository: inference-labs-inc/dsperse

Length of output: 2316


🏁 Script executed:

#!/bin/bash
# Step 6: Check if torch.load calls might have issues with 2.7.1
# In PyTorch 2.7.1, weights_only became True by default (breaking change in 2.13+, but affects loading)
echo "=== Check torch version in actual code or documentation ==="
rg -n "2\.6\|2\.7\|version" README.md --type=md | head -20

echo ""
echo "=== Check if there are requirements-dev or requirements-test files ==="
fd -t f -e txt | grep -i req

echo ""
echo "=== Full torch_utils.py to understand context ==="
wc -l dsperse/src/utils/torch_utils.py

Repository: inference-labs-inc/dsperse

Length of output: 306


🏁 Script executed:

#!/bin/bash
# Step 7: Check git diff to see what changed in pyproject.toml
echo "=== Git diff for pyproject.toml ==="
git diff HEAD~1 pyproject.toml 2>/dev/null || git show HEAD:pyproject.toml 2>/dev/null

echo ""
echo "=== Git diff for requirements.txt ==="
git diff HEAD~1 requirements.txt 2>/dev/null || git show HEAD:requirements.txt 2>/dev/null

echo ""
echo "=== Check git log for recent changes ==="
git log --oneline -10 2>/dev/null

Repository: inference-labs-inc/dsperse

Length of output: 1225


🏁 Script executed:

#!/bin/bash
# Step 8: Check if weights_only parameter is mentioned anywhere in the codebase
echo "=== Searching for weights_only ==="
rg -n "weights_only" --type=py

echo ""
echo "=== Check torch release notes or documentation references ==="
rg -n "torch" dsperse/README.md 2>/dev/null || rg -n "torch" README.md 2>/dev/null | head -20

Repository: inference-labs-inc/dsperse

Length of output: 167


Critical dependency version mismatch: pyproject.toml still specifies torch==2.6.0 while requirements.txt has been updated to torch==2.7.1.

The PR updated only requirements.txt but did not update pyproject.toml. When the package is installed via pip install ., the pyproject.toml version takes precedence, meaning users will receive torch==2.6.0 instead of the intended 2.7.1. Both files must be updated consistently:

# pyproject.toml line ~19
dependencies = [
    "torch==2.7.1",  # Update from 2.6.0 to match requirements.txt

Additionally, review the four torch.load() calls in dsperse/src/utils/torch_utils.py (line 33), dsperse/src/backends/pth_models.py (lines 46, 99), and dsperse/src/analyzers/model_analyzer.py (line 56). Consider explicitly setting weights_only=False for backward compatibility, as PyTorch's default behavior may change in future versions.

🤖 Prompt for AI Agents
In requirements.txt around lines 1-1 and pyproject.toml around line ~19, the
torch version was updated to 2.7.1 in requirements.txt but pyproject.toml still
pins torch==2.6.0; update pyproject.toml's dependencies entry to "torch==2.7.1"
to keep installer behavior consistent. Also, in the codebase update the four
torch.load() calls for explicit backward-compatibility:
dsperse/src/utils/torch_utils.py line 33, dsperse/src/backends/pth_models.py
lines 46 and 99, and dsperse/src/analyzers/model_analyzer.py line 56 — modify
each call to include the argument weights_only=False to ensure consistent
semantics across PyTorch versions.

* Update torch and python version, plus some docks for deps management

* Update py.test version and python==3.12

* 'Runner' class - pass slices_path to 'run', not 'init'
Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dsperse/src/cli/run.py (1)

109-117: Retry path should be normalized before existence check.

When the user provides an alternative file path via retry_option, the path is checked for existence without normalization. This could cause issues with paths containing ~ or environment variables.

         elif retry_option:
+            retry_option = normalize_path(retry_option)
             if os.path.exists(retry_option):
                 args.input_file = retry_option
♻️ Duplicate comments (1)
requirements.txt (1)

1-3: LGTM - Generated lockfile with hash verification.

The switch to a uv-generated lockfile with hash pins and environment markers improves reproducibility and supply-chain security. The editable install (-e .) properly references the project.

Note: The torch version mismatch with pyproject.toml was already flagged in a previous review and should be addressed there.

🧹 Nitpick comments (5)
dsperse/src/cli/run.py (1)

68-81: Metadata detection may silently default to no slices path.

When neither metadata.json exists directly in the target directory nor in a slices subdirectory, slices_dir_effective is set to None (line 80), relying on Runner to auto-detect. Consider logging a debug message to indicate this fallback path is being taken, which would aid troubleshooting.

         else:
             # No immediate metadata found; Runner will attempt to detect dsperse inside the directory
+            logger.debug(f"No metadata.json found in {target_path} or {target_path}/slices; Runner will attempt auto-detection")
             slices_dir_effective = None
             model_dir = target_path
dsperse/src/run/runner.py (4)

24-34: Use explicit Optional type hints per PEP 484.

The constructor parameters have implicit Optional types (str = None). PEP 484 recommends explicit Optional[str] or str | None for clarity.

+from typing import Optional
+
 class Runner:
-    def __init__(self, run_metadata_path: str = None, save_metadata_path: str = None):
+    def __init__(self, run_metadata_path: str | None = None, save_metadata_path: str | None = None):
         """Initialize the Runner.

Apply similar changes to other method signatures at lines 36, 64, and 99.


41-44: Use a specific exception type for validation errors.

Raising a bare Exception makes error handling harder for callers. Consider defining a custom exception or using ValueError.

         # Ensure slices path is available and valid
         if slice_path is None or not Path(slice_path).exists():
-            raise Exception("A valid path must be provided for slices")
+            raise ValueError(f"A valid path must be provided for slices: {slice_path}")
         self.slices_path = Path(slice_path)

71-82: Extract duplicated path resolution logic.

The _resolve_rel_path helper is defined twice with slightly different implementations (lines 71-82 in run_onnx_slice and lines 104-108 in _run_ezkl_slice). Consider extracting to a shared utility method for consistency and maintainability.

# Add as a static method on Runner or in RunnerUtils
@staticmethod
def _resolve_rel_path(path: str, base_dir: Path) -> str | None:
    """Resolve a possibly relative path against base_dir."""
    if not path:
        return None
    if os.path.isabs(path):
        return path
    resolved = (base_dir / path).resolve()
    if not resolved.exists():
        # Try parent as fallback
        resolved = (base_dir.parent / path).resolve()
    return str(resolved)

Also applies to: 104-108


265-299: Demo block should be guarded or removed for production.

The if __name__ == "__main__" block with hardcoded model paths is useful for development but may cause confusion if inadvertently executed. Consider moving to a separate test/demo script or adding a comment clarifying its purpose.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b15f27f and 09dec90.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • docs/uv_packaging.md (1 hunks)
  • dsperse/src/cli/run.py (4 hunks)
  • dsperse/src/run/runner.py (1 hunks)
  • pyproject.toml (1 hunks)
  • requirements.txt (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/uv_packaging.md
🧰 Additional context used
🧬 Code graph analysis (2)
dsperse/src/run/runner.py (4)
dsperse/src/analyzers/runner_analyzer.py (2)
  • RunnerAnalyzer (16-466)
  • generate_run_metadata (433-466)
dsperse/src/backends/ezkl.py (2)
  • EZKL (107-693)
  • generate_witness (160-242)
dsperse/src/backends/onnx_models.py (2)
  • OnnxModels (13-222)
  • run_inference (18-60)
dsperse/src/run/utils/runner_utils.py (1)
  • RunnerUtils (16-330)
dsperse/src/cli/run.py (4)
dsperse/src/run/runner.py (2)
  • run (36-61)
  • Runner (23-262)
dsperse/src/backends/pth_models.py (1)
  • run_inference (37-75)
dsperse/src/backends/onnx_models.py (1)
  • run_inference (18-60)
dsperse/src/cli/base.py (3)
  • prompt_for_value (242-302)
  • normalize_path (21-32)
  • check_model_dir (157-174)
🪛 Ruff (0.14.6)
dsperse/src/run/runner.py

24-24: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


24-24: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


36-36: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


43-43: Create your own exception

(TRY002)


43-43: Avoid specifying long messages outside the exception class

(TRY003)


64-64: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


99-99: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


132-132: Do not catch blind exception: Exception

(BLE001)


179-179: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


246-246: Create your own exception

(TRY002)


246-246: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (7)
pyproject.toml (2)

14-16: Torch version now consistent with requirements.txt.

The torch version (2.7.1) now matches requirements.txt, resolving the previously flagged mismatch. The mix of exact pins (==) and flexible constraints (>=, ~=) is acceptable for distinguishing runtime dependencies from dev tooling.


26-27: CLI entry point is properly configured.

The module dsperse/main.py exists and the main() function is defined at line 40. The entry point definition is correct.

dsperse/src/cli/run.py (3)

30-36: Argument consolidation with backward-compatible aliases.

The refactoring to use --path as the primary argument while retaining --slices-dir and other aliases maintains backward compatibility. The updated help text correctly reflects the new flexible input semantics (directory or .dsperse/.dslice file).
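
The aliasing mechanism described is plain argparse: several option strings writing to one destination. A minimal sketch (not the actual run.py parser):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--path", "--slices-dir",  # --slices-dir kept as a backward-compatible alias
    dest="path",
    help="Model directory or .dsperse/.dslice file",
)
args = parser.parse_args(["--slices-dir", "models/doom"])
print(args.path)  # models/doom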


53-61: Path resolution handles multiple input formats.

The logic correctly resolves the target path from either the new path argument or the legacy slices_dir, with appropriate prompting and normalization. The check_model_dir validation ensures the path exists before proceeding.


133-135: Runner instantiation and invocation align with new API.

The Runner is correctly instantiated with only run_metadata_path and the run method receives both input_file and slice_path. The fallback to model_dir when slices_dir_effective is None ensures a valid path is always passed.

dsperse/src/run/runner.py (2)

122-134: Exception handling for EZKL fallback is appropriate.

The broad except Exception catch (line 132) is acceptable here since it enables graceful fallback behavior. The error is captured in exec_info for debugging. The static analysis hint can be safely ignored in this context.
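
The endorsed shape, reduced to a standalone sketch (the real Runner methods take richer arguments than shown here):

def run_slice_with_fallback(run_ezkl, run_onnx, slice_info):
    """Try the zk circuit first; fall back to plain ONNX on any failure."""
    try:
        return run_ezkl(slice_info)
    except Exception as e:  # broad by design: any EZKL failure triggers fallback
        exec_info = {"error": str(e), "fallback": "onnx"}
        ok, tensor, _ = run_onnx(slice_info)
        return ok, tensor, exec_info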


229-248: Core execution loop is well-structured.

The _run method cleanly iterates through the execution chain, managing IO between slices and accumulating results. Error handling at line 246 appropriately terminates on slice failure with context.

However, line 246 should use a more specific exception type for consistency with the earlier suggestion.

             if not ok:
-                raise Exception(f"Inference failed for {current_slice_id}: {exec_info.get('error', 'unknown')}")
+                raise RuntimeError(f"Inference failed for {current_slice_id}: {exec_info.get('error', 'unknown')}")

description = "Distributed zkML Toolkit"
readme = "README.md"
requires-python = ">=3.8"
requires-python = "==3.12.*"
Copy link


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for Python 3.12+ specific syntax/features
echo "=== Checking for Python 3.12+ specific features ==="

# Type parameters (PEP 695 - 3.12+)
echo "1. Type parameters syntax (type X = ...):"
rg "^\s*type\s+\w+\s*=" --type=py 2>/dev/null | head -5

# override decorator (3.12+)
echo "2. Override decorator:"
rg "from typing import.*override" --type=py 2>/dev/null

# PEP 688 - Buffer protocol (3.12+)
echo "3. TypedDict NotRequired/Required:"
rg "from typing.*\(NotRequired|Required\)" --type=py 2>/dev/null

# Check Python version in imports or config
echo "4. Explicit Python version mentions:"
rg "sys.version_info" --type=py 2>/dev/null | head -5

# Check if any 3.11+ features are present
echo "5. Match statement (3.10+):"
rg "^\s*match\s+" --type=py 2>/dev/null | head -3

# List main Python files to review
echo "6. Python source files in repo:"
fd --type=py --max-depth=2 'src/|dsperse/' 2>/dev/null | head -20

Repository: inference-labs-inc/dsperse

Length of output: 415


🏁 Script executed:

#!/bin/bash
# Get repository structure
echo "=== Repository structure ==="
ls -la | head -20

echo -e "\n=== Checking pyproject.toml ==="
cat pyproject.toml

echo -e "\n=== Checking setup.py or setup.cfg if present ==="
ls -la setup.py setup.cfg 2>/dev/null

echo -e "\n=== Full directory tree (limited depth) ==="
find . -maxdepth 3 -type f -name "*.py" | head -30

Repository: inference-labs-inc/dsperse

Length of output: 2163


🏁 Script executed:

#!/bin/bash
# Check torch 2.7.1 Python support
echo "=== Torch 2.7.1 Python version requirements ==="
rg "torch" requirements.txt | head -5

echo -e "\n=== Checking main.py for version-specific code ==="
cat dsperse/main.py

echo -e "\n=== Checking constants.py ==="
head -50 dsperse/src/constants.py

echo -e "\n=== Searching for any version guards or conditionals ==="
rg "version_info|sys.version|python_requires|3\.12|3\.11|3\.10" --type=py 2>/dev/null

Repository: inference-labs-inc/dsperse

Length of output: 4857


Overly restrictive Python version constraint.

The strict pin to ==3.12.* breaks compatibility with Python 3.10, 3.11, and future versions (3.13+) without justification. The codebase contains no 3.12-specific syntax or features. If Python 3.12 is truly required for production support, use >=3.12 instead to allow forward compatibility; otherwise, consider relaxing to >=3.10 or >=3.11 based on your actual minimum supported version.

🤖 Prompt for AI Agents
In pyproject.toml around line 10, the requires-python entry is overly
restrictive ("==3.12.*"); update it to a permissive constraint that matches your
supported minimums (for example ">=3.10" or ">=3.11") or, if you truly require
3.12, change it to ">=3.12" to allow future patch/minor versions; pick the
correct lower bound for your supported Python versions and replace the
exact-equals pin with the chosen >= range.

Copy link
Member

@shirin-shahabi shirin-shahabi left a comment


Review: DSperse/DSlice File Type Integration

Tested full pipeline with doom and net models through: slice -> convert (dirs/dslice/dsperse) -> compile -> run

Works

  • Slicing to dirs (with absolute paths)
  • All conversion directions (dirs <-> dslice <-> dsperse)
  • Compile with direct .dsperse file path
  • Run with direct .dsperse file path

Issues Found

  1. CRITICAL: onnx_utils.py:167 - Slicing fails with relative paths due to an
    absolute vs relative path mismatch in the relative_to() call.
    Fix: Use root = Path(slices_root).resolve() at line 73 (see the sketch
    after this review).

  2. CRITICAL: converter.py:107-122 - detect_type() doesn't scan directories
    for .dsperse files, causing run/prove/verify to fail when given a model
    directory containing a .dsperse file. CompilerUtils.is_sliced_model()
    handles this case correctly but the logic isn't shared.

  3. MEDIUM - Inconsistent detection logic between CompilerUtils, Converter,
    and Runner. Consider centralizing in Converter.detect_type().

  4. LOW - Python version constraint too restrictive (==3.12.*).

Recommendation

Address issues #1 and #2 before merge as they break common use cases.
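
A minimal reproduction of issue 1 and the proposed fix, with illustrative paths:

from pathlib import Path

slices_root = Path("models/doom/slices")  # relative, as a user might pass it
member = (slices_root / "slice_0/payload/model.onnx").resolve()  # absolute

# Mixing an absolute member with the relative root raises ValueError:
#   member.relative_to(slices_root)
# Resolving the root first, per the suggested fix, makes the bases match:
root = Path(slices_root).resolve()
print(member.relative_to(root))  # slice_0/payload/model.onnx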

…e and multi-slice modes, streamline logic, enhance metadata handling, and remove deprecated options.
Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (4)
dsperse/src/verifier.py (1)

251-251: Remove extraneous f-string prefix.

This was flagged in a past review and is also caught by static analysis (F541). The string has no placeholders.

-    print(f"\nVerification completed!")
+    print("\nVerification completed!")
dsperse/src/slice/utils/converter.py (1)

51-91: Missing explicit return/raise for unhandled type combinations.

The convert() function can implicitly return None if an unexpected type combination is encountered. A past review suggested adding a fallback. This should be addressed to prevent silent failures.

             elif output_type == "dslice":
                 # Convert dsperse -> dirs (without expanding slices)
                 temp_dirs = Converter._dsperse_to_dirs(path_obj, None, expand_slices=False)
                 # The dsperse already contains dslices, so just return the directory
                 logger.info(f"Extracted dsperse to directory with dslice files: {temp_dirs}")
                 if cleanup:
                     try:
                         path_obj.unlink()
                         logger.info(f"Removed source dsperse file: {path_obj}")
                     except Exception as e:
                         logger.warning(f"Could not remove source dsperse file {path_obj}: {e}")
                 return temp_dirs
+
+        # Fallback for unhandled combinations
+        raise ValueError(f"Unsupported conversion: {current_type} -> {output_type}")
dsperse/src/analyzers/runner_analyzer.py (2)

150-215: Guard _process_slices_per_slice against missing paths (avoid os.path.join(..., None) and fragile parent_dir)

This block can still raise exceptions on partially-populated metadata:

  • parent_dir = os.path.dirname(entry.get("path")).split(os.sep)[0] will break if entry["path"] is missing or absolute, and it’s a brittle way to derive the slice directory name.
  • os.path.join(parent_dir, files.get(...)) will raise TypeError when any of the EZKL file keys are missing or None, which is plausible for pre-/failed compilation states.
  • onnx_path = os.path.join(parent_dir, slice.get("relative_path")) assumes relative_path is always present and already normalized; it can double-prefix the slice dir or crash on None.

To make this resilient (and match the robustness of _process_slices_model), please normalize ONNX and EZKL paths conditionally, only when the underlying metadata fields are present. For example:

-        for entry in slices_data_list:
-            meta_path = entry.get("slice_metadata")
-            parent_dir = os.path.dirname(entry.get("path")).split(os.sep)[0]
-            
+        for entry in slices_data_list:
+            meta_path = entry.get("slice_metadata")
+            # Derive the slice directory name defensively from the entry path.
+            entry_path = entry.get("path") or ""
+            parent_dir = os.path.normpath(entry_path).split(os.sep)[0] if entry_path else ""
+            if not meta_path:
+                logger.warning("Missing slice_metadata path for entry %s; skipping", entry)
+                continue
+
             try:
                 with open(meta_path, "r") as f:
                     meta = json.load(f)
             except Exception as e:
                 logger.warning(f"Failed to load slice metadata at {meta_path}: {e}")
                 continue
@@
-            # Normalize ONNX path
-            onnx_path = os.path.join(parent_dir, slice.get("relative_path"))
+            # Normalize ONNX path
+            rel_from_meta = slice.get("relative_path") or slice.get("path")
+            onnx_path = os.path.join(parent_dir, rel_from_meta) if (parent_dir and rel_from_meta) else rel_from_meta
@@
-            comp = ((slice.get("compilation") or {}).get("ezkl") or {})
-            files = (comp.get("files") or {})
-            compiled_flag = bool(comp.get("compiled", False))
-
-            if files:
-                circuit_path = os.path.join(parent_dir, files.get("compiled_circuit") or files.get("compiled"))
-                settings_path = os.path.join(parent_dir, files.get("settings"))
-                pk_path = os.path.join(parent_dir, files.get("pk_key"))
-                vk_path = os.path.join(parent_dir, files.get("vk_key"))
-            else:
-                circuit_path = None
-                settings_path = None
-                pk_path = None
-                vk_path = None
+            comp = ((slice.get("compilation") or {}).get("ezkl") or {})
+            files = (comp.get("files") or {})
+            compiled_flag = bool(comp.get("compiled", False))
+
+            compiled_rel = files.get("compiled_circuit") or files.get("compiled")
+
+            def _join_optional(key: str) -> Optional[str]:
+                rel = files.get(key)
+                return os.path.join(parent_dir, rel) if (parent_dir and rel) else None
+
+            circuit_path = os.path.join(parent_dir, compiled_rel) if (parent_dir and compiled_rel) else None
+            settings_path = _join_optional("settings")
+            pk_path = _join_optional("pk_key")
+            vk_path = _join_optional("vk_key")

This keeps the analyzer tolerant of missing EZKL artifacts and incomplete slice metadata instead of crashing.


382-452: Normalize slice_path to Path, use _normalize_to_dirs, and fix string-path crash & model‑root handling in generate_run_metadata / __main__

There are two coupled correctness issues here:

  1. String path crash: __main__ currently calls:
out = RunnerAnalyzer.generate_run_metadata(str(model_path))

generate_run_metadata then passes this string directly into _has_model_metadata, which does (path / "metadata.json"). With a str, this raises TypeError: unsupported operand type(s) for /: 'str' and 'str'.

  2. Model-root vs slices-dir mismatch: _has_model_metadata returns True when either <path>/metadata.json or <path>/slices/metadata.json exists. For a model root (.../doom) with slices/metadata.json, this is True, but _build_from_model_metadata is then called with slices_dir=path and load_slices_metadata tries to open <model_root>/metadata.json, which fails even though <model_root>/slices/metadata.json exists. The unused _normalize_to_dirs helper is clearly intended to resolve this by normalizing to the slices directory first.

To fix both issues and actually support .dslice / .dsperse inputs as per the docstring, I recommend:

  • Accepting str | Path for slice_path.
  • Normalizing to a Path immediately.
  • Using _normalize_to_dirs to compute the effective slices_dir and detected packaging format.
  • Using the normalized slices_dir for _has_model_metadata / _build_from_*.
  • Recording source_path as the resolved original input.
  • Updating __main__ to pass a Path (though with the normalization in place, strings are fine too).

Example patch:

-    @staticmethod
-    def generate_run_metadata(slice_path: Path, save_path=None, original_format=None):
+    @staticmethod
+    def generate_run_metadata(slice_path: str | Path, save_path=None, original_format=None):
@@
-        if RunnerAnalyzer._has_model_metadata(slice_path):
-            run_meta = RunnerAnalyzer._build_from_model_metadata(slice_path)
-        else:
-            run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slice_path)
+        # Normalize the input into a slices directory and detected format
+        path_obj = Path(slice_path)
+        slices_dir, detected_format = RunnerAnalyzer._normalize_to_dirs(path_obj)
+
+        # If caller didn't specify original_format, use what we detected
+        if original_format is None:
+            original_format = detected_format
+
+        # Build run metadata from either model-level or per-slice metadata
+        if RunnerAnalyzer._has_model_metadata(slices_dir):
+            run_meta = RunnerAnalyzer._build_from_model_metadata(slices_dir)
+        else:
+            run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slices_dir)
@@
-        # Attach packaging metadata
-        run_meta["packaging_type"] = original_format
-        run_meta["source_path"] = str(slice_path)
+        # Attach packaging metadata
+        run_meta["packaging_type"] = original_format
+        run_meta["source_path"] = str(path_obj.resolve())
@@
-        # Save
-        if save_path is None:
-            base = Path(slice_path).resolve()
-            # If slice_path is a model root or slices dir, put run/metadata.json next to it
-            base_dir = base if base.is_dir() else base.parent
-            save_path = base_dir / "run" / "metadata.json"
-        else:
-            save_path = Path(save_path).resolve()
+        # Save
+        if save_path is None:
+            base = path_obj.resolve()
+            # If slice_path is a model root, slices dir, or file (dslice/dsperse), put run/metadata.json next to it
+            base_dir = base if base.is_dir() else base.parent
+            save_path = base_dir / "run" / "metadata.json"
+        else:
+            save_path = Path(save_path).resolve()
@@
-    model_dir = base_paths[model_choice]
+    model_dir = base_paths[model_choice]
@@
-    out = RunnerAnalyzer.generate_run_metadata(str(model_path))
+    out = RunnerAnalyzer.generate_run_metadata(model_path)

This:

  • Eliminates the string-path crash.
  • Ensures model roots with slices/metadata.json are normalized to the actual slices directory before reading metadata.
  • Makes generate_run_metadata consistent with the PR’s dslice/dsperse integration story by actually using _normalize_to_dirs.

Also applies to: 454-487, 489-500

🧹 Nitpick comments (16)
dsperse/src/cli/prove.py (2)

14-14: Unused import: Utils

Utils is imported but never used in this file.

-from dsperse.src.utils.utils import Utils

134-135: Remove empty finally block.

The finally: pass block serves no purpose and can be removed.

     except Exception as e:
         print(f"{Fore.RED}Error proving run: {e}{Style.RESET_ALL}")
         traceback.print_exc()
-    finally:
-        pass
dsperse/src/cli/verify.py (3)

14-14: Unused import: Utils

Utils is imported but never used in this file.

-from dsperse.src.utils.utils import Utils

136-137: Remove empty finally block.

Same issue as in prove.py — the finally: pass block is unnecessary.

     except Exception as e:
         print(f"{Fore.RED}Error verifying run: {e}{Style.RESET_ALL}")
         traceback.print_exc()
-    finally:
-        pass

36-58: Duplicated get_all_runs function.

This function is identical to the one in prove.py. As noted in a past comment, there's significant code sharing between these CLI modules. Consider extracting shared utilities to a common module to reduce duplication.
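
A minimal sketch of such a shared helper, assuming it lives in a new common module (dsperse/src/cli/common.py is a hypothetical location) and that a run directory is identified by a run_results.json marker, which is an assumption for illustration rather than the PR's actual criterion:

# dsperse/src/cli/common.py (hypothetical)
from pathlib import Path

def get_all_runs(run_root: str | Path) -> list[Path]:
    # Return absolute paths of run directories under run_root;
    # the run_results.json marker is assumed for illustration.
    root = Path(run_root).resolve()
    if not root.is_dir():
        return []
    return sorted(
        child.resolve()
        for child in root.iterdir()
        if child.is_dir() and (child / "run_results.json").exists()
    )

Both prove.py and verify.py could then import this single definition instead of carrying identical copies.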

dsperse/src/verifier.py (2)

42-49: Silent exception swallowing can hide bugs.

The try-except-pass block silently ignores any errors when building the proof_paths_by_slice map. Consider logging the exception to aid debugging.

         try:
             for entry in run_results.get("execution_chain", {}).get("execution_results", []):
                 sid = entry.get("slice_id") or entry.get("segment_id")
                 pe = entry.get("proof_execution", {}) if isinstance(entry, dict) else {}
                 if sid and pe and pe.get("proof_file"):
                     proof_paths_by_slice[sid] = pe.get("proof_file")
         except Exception:
-            pass
+            logger.debug("Failed to extract proof paths from run_results", exc_info=True)

156-160: Remove redundant import.

Utils is already imported at the module level (line 12). This local import is unnecessary.

             # Convert back to the original packaging if needed
             if detected != "dirs":
-                from dsperse.src.utils.utils import Utils as _Utils
-                root = _Utils.dirs_root_from(Path(dirs_model_path))
+                root = Utils.dirs_root_from(Path(dirs_model_path))
                 Converter.convert(str(root), output_type=detected, cleanup=True)
dsperse/src/cli/slice.py (1)

78-89: Minor: Remove redundant import.

Path is already imported at the module level (line 7). The local import as _P is unnecessary.

     if not getattr(args, 'to_type', None):
         valid = {'dirs', 'dslice', 'dsperse'}
-        from pathlib import Path as _P
         while True:
             chosen_raw = prompt_for_value('to', 'Enter desired output type (dirs, dslice, dsperse)')
             # Sanitize in case prompt_for_value normalized it like a path
-            chosen = _P(str(chosen_raw)).name.strip().lower()
+            chosen = Path(str(chosen_raw)).name.strip().lower()
             if chosen in valid:
                 args.to_type = chosen
                 break
             print(f"{Fore.YELLOW}Invalid choice: '{chosen}'. Please enter one of: dirs, dslice, dsperse{Style.RESET_ALL}")
dsperse/src/run/runner.py (2)

73-84: Duplicate _resolve_rel_path functions with divergent logic.

There are two nested _resolve_rel_path functions:

  • Lines 73-84 in run_onnx_slice: strips matching basename prefix and resolves
  • Lines 106-110 in _run_ezkl_slice: tries base_dir first, then parent

These have different behavior and could cause inconsistent path resolution. Consider extracting a single utility function with unified logic.

Also applies to: 106-110
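
One possible unified helper that merges both behaviors (strip a duplicated leading directory name, then try base_dir before its parent); this is a sketch under the assumption that both call sites resolve paths relative to a slice directory:

from pathlib import Path

def resolve_rel_path(base_dir: Path, rel: str) -> Path:
    # Tolerate metadata that repeats the slice directory name,
    # e.g. base_dir=".../slice_0" with rel="slice_0/payload/model.onnx".
    rel_path = Path(rel)
    if rel_path.parts and rel_path.parts[0] == base_dir.name:
        rel_path = Path(*rel_path.parts[1:])
    candidate = (base_dir / rel_path).resolve()
    if candidate.exists():
        return candidate
    # Fall back to the parent directory, mirroring the _run_ezkl_slice variant.
    return (base_dir.parent / rel_path).resolve()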


24-24: Use explicit | None for optional parameters.

Static analysis flags implicit Optional types. For clarity and PEP 484 compliance:

-    def __init__(self, run_metadata_path: str = None, save_metadata_path: str = None):
+    def __init__(self, run_metadata_path: str | None = None, save_metadata_path: str | None = None):
-    def run(self, input_json_path, slice_path: str, output_path: str = None) -> dict:
+    def run(self, input_json_path, slice_path: str, output_path: str | None = None) -> dict:

Also applies to: 38-38

dsperse/src/slice/utils/converter.py (1)

349-349: Rename unused loop variable.

The dirs variable from os.walk is not used. Prefix with underscore to indicate intentional non-use.

-            for root, dirs, files in os.walk(source_dir):
+            for root, _dirs, files in os.walk(source_dir):
dsperse/src/prover.py (2)

39-40: Redundant EZKL instantiation.

prove_dirs creates a new EZKL() instance, but the class already has self.ezkl_runner initialized in __init__. Use the existing instance for consistency.

-        ezkl = EZKL()
         proofs: dict[str, dict] = {}

And update the prove call at line 75:

-            success, result = ezkl.prove(
+            success, result = self.ezkl_runner.prove(

290-290: Remove extraneous f-string prefix.

Static analysis (F541) correctly flags this string has no placeholders.

-    print(f"\nProving completed!")
+    print("\nProving completed!")
dsperse/src/analyzers/runner_analyzer.py (3)

42-50: Prefer logging.exception and narrower exception types in load_slices_metadata

Catching a blanket Exception and logging with logger.error makes debugging harder and can hide the original traceback.

Consider switching to logger.exception (which preserves the stack) and, if feasible, narrowing the except to OSError | json.JSONDecodeError:

-        try:
-            with open(Path(slices_dir) / 'metadata.json', 'r') as f:
-                return json.load(f)
-        except Exception as e:
-            logger.error(f"Failed to load slices metadata from {slices_dir}: {e}")
-            raise
+        try:
+            with open(Path(slices_dir) / "metadata.json", "r") as f:
+                return json.load(f)
+        except Exception:
+            logger.exception("Failed to load slices metadata from %s", slices_dir)
+            raise

53-81: process_slices dispatch looks robust; consider trimming unused parameters downstream

The new dispatcher correctly:

  • Normalizes slices_data to a list.
  • Uses explicit keys (slice_metadata / slice_metadata_path) instead of structural length checks.
  • Routes cleanly between _process_slices_per_slice and _process_slices_model.

However, both _process_slices_model(slices_dir, ...) and _process_slices_per_slice(slices_dir, ...) no longer use slices_dir. If this argument truly isn’t needed, you can drop it from their signatures and the calls here to quiet static-analysis noise and simplify the API.


304-379: _normalize_to_dirs is useful but slightly misleading as written (docstring + unused local + wiring)

This helper looks like the right central place for dslice/dsperse detection and normalization, but there are a few issues:

  • The docstring advertises (model_root, slices_dir, original_format), while the function actually returns (slices_dir, original_format).
  • The model_root variable at Line 365 is assigned but never used, and the if ... else there is redundant.
  • Within this file, _normalize_to_dirs is not (yet) used by generate_run_metadata, so detection logic is still split between this helper and _has_model_metadata / _build_from_*.

To tighten this up:

  1. Align the signature and docstring with actual behavior and accept Path inputs:
-    @staticmethod
-    def _normalize_to_dirs(slice_path: str):
-        """Normalize input to (model_root, slices_dir, original_format).
+    @staticmethod
+    def _normalize_to_dirs(slice_path: str | Path):
+        """Normalize input to (slices_dir, original_format).
@@
-        path_obj = Path(slice_path)
+        path_obj = Path(slice_path)
  2. Remove the unused model_root and redundant conditional:
-        if detected == 'dirs':
-            sdir = path_obj
-            # If this looks like a slices folder (has metadata.json), parent is model root
-            model_root = sdir.parent if (sdir / 'metadata.json').exists() else sdir.parent
-            return sdir.resolve(), 'dirs'
+        if detected == "dirs":
+            sdir = path_obj
+            return sdir.resolve(), "dirs"
  3. Wire _normalize_to_dirs into generate_run_metadata (see separate comment) so this normalization is actually used for .dslice / .dsperse and “model root vs slices dir” inputs, aligning with the PR goal of consistent detection logic.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09dec90 and 9cb0698.

📒 Files selected for processing (9)
  • dsperse/src/analyzers/runner_analyzer.py (3 hunks)
  • dsperse/src/cli/prove.py (5 hunks)
  • dsperse/src/cli/run.py (4 hunks)
  • dsperse/src/cli/slice.py (4 hunks)
  • dsperse/src/cli/verify.py (4 hunks)
  • dsperse/src/prover.py (3 hunks)
  • dsperse/src/run/runner.py (1 hunks)
  • dsperse/src/slice/utils/converter.py (1 hunks)
  • dsperse/src/verifier.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
dsperse/src/verifier.py (3)
dsperse/src/slice/utils/converter.py (3)
  • Converter (17-385)
  • convert (20-90)
  • detect_type (93-122)
dsperse/src/analyzers/runner_analyzer.py (2)
  • RunnerAnalyzer (16-487)
  • generate_run_metadata (454-487)
dsperse/src/utils/utils.py (6)
  • Utils (12-359)
  • load_run_metadata (174-180)
  • load_run_results (241-247)
  • merge_execution_into_run_results (257-313)
  • save_run_results (250-254)
  • update_metadata_after_execution (316-359)
dsperse/src/cli/run.py (2)
dsperse/src/run/runner.py (1)
  • run (38-63)
dsperse/src/cli/base.py (3)
  • prompt_for_value (242-302)
  • normalize_path (21-32)
  • check_model_dir (157-174)
dsperse/src/run/runner.py (3)
dsperse/src/backends/ezkl.py (2)
  • EZKL (107-693)
  • generate_witness (160-242)
dsperse/src/backends/onnx_models.py (2)
  • OnnxModels (13-222)
  • run_inference (18-60)
dsperse/src/run/utils/runner_utils.py (1)
  • RunnerUtils (16-330)
dsperse/src/slice/utils/converter.py (2)
dsperse/src/cli/convert.py (1)
  • convert (44-98)
install.sh (1)
  • info (57-57)
🪛 Ruff (0.14.7)
dsperse/src/verifier.py

48-49: try-except-pass detected, consider logging the exception

(S110)


48-48: Do not catch blind exception: Exception

(BLE001)


146-146: Avoid specifying long messages outside the exception class

(TRY003)


164-164: Avoid specifying long messages outside the exception class

(TRY003)


177-177: Avoid specifying long messages outside the exception class

(TRY003)


187-187: Avoid specifying long messages outside the exception class

(TRY003)


189-189: Avoid specifying long messages outside the exception class

(TRY003)


193-193: Avoid specifying long messages outside the exception class

(TRY003)


251-251: f-string without any placeholders

Remove extraneous f prefix

(F541)

dsperse/src/cli/slice.py

102-102: Do not catch blind exception: Exception

(BLE001)

dsperse/src/analyzers/runner_analyzer.py

49-49: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


83-83: Unused static method argument: slices_dir

(ARG004)


122-122: Function definition does not bind loop variable slice_key

(B023)


150-150: Unused static method argument: slices_dir

(ARG004)


164-164: Do not catch blind exception: Exception

(BLE001)


330-330: Do not catch blind exception: Exception

(BLE001)


339-340: try-except-pass detected, consider logging the exception

(S110)


339-339: Do not catch blind exception: Exception

(BLE001)


350-350: Do not catch blind exception: Exception

(BLE001)


365-365: Local variable model_root is assigned to but never used

Remove assignment to unused variable model_root

(F841)


365-365: Useless if-else condition

(RUF034)


392-393: try-except-pass detected, consider logging the exception

(S110)


392-392: Do not catch blind exception: Exception

(BLE001)


424-424: Do not catch blind exception: Exception

(BLE001)


440-440: Create your own exception

(TRY002)


440-440: Avoid specifying long messages outside the exception class

(TRY003)

dsperse/src/cli/verify.py

133-133: Do not catch blind exception: Exception

(BLE001)

dsperse/src/run/runner.py

24-24: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


24-24: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


38-38: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


45-45: Create your own exception

(TRY002)


45-45: Avoid specifying long messages outside the exception class

(TRY003)


66-66: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


101-101: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


134-134: Do not catch blind exception: Exception

(BLE001)


181-181: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


248-248: Create your own exception

(TRY002)


248-248: Avoid specifying long messages outside the exception class

(TRY003)

dsperse/src/prover.py

155-155: Avoid specifying long messages outside the exception class

(TRY003)


174-174: Avoid specifying long messages outside the exception class

(TRY003)


191-191: Avoid specifying long messages outside the exception class

(TRY003)


205-205: Avoid specifying long messages outside the exception class

(TRY003)


207-207: Avoid specifying long messages outside the exception class

(TRY003)


214-214: Avoid specifying long messages outside the exception class

(TRY003)


290-290: f-string without any placeholders

Remove extraneous f prefix

(F541)

dsperse/src/slice/utils/converter.py

20-20: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


35-35: Avoid specifying long messages outside the exception class

(TRY003)


37-37: Avoid specifying long messages outside the exception class

(TRY003)


76-76: Do not catch blind exception: Exception

(BLE001)


88-88: Do not catch blind exception: Exception

(BLE001)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


120-120: Avoid specifying long messages outside the exception class

(TRY003)


122-122: Avoid specifying long messages outside the exception class

(TRY003)


135-135: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


165-165: Do not catch blind exception: Exception

(BLE001)


170-170: Avoid specifying long messages outside the exception class

(TRY003)


195-195: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


211-211: Avoid specifying long messages outside the exception class

(TRY003)


267-267: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


290-290: Avoid specifying long messages outside the exception class

(TRY003)


309-309: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


316-316: Avoid specifying long messages outside the exception class

(TRY003)


344-344: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


349-349: Loop control variable dirs not used within loop body

Rename unused dirs to _dirs

(B007)


440-440: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (11)
dsperse/src/cli/prove.py (1)

35-57: LGTM: Path normalization in get_all_runs.

The function correctly normalizes the run root directory and ensures all returned paths are absolute, which addresses the relative vs absolute path issues mentioned in the PR comments.

dsperse/src/verifier.py (1)

73-78: LGTM: Proper validation before EZKL verification.

The settings_path and vk_path are now properly validated before calling ezkl_runner.verify(), addressing the critical bug identified in past reviews where None would be converted to the string "None" and cause FileNotFoundError.

dsperse/src/cli/run.py (2)

62-81: Path resolution logic is complex but handles multiple input scenarios correctly.

The conditional logic correctly distinguishes between:

  • File inputs (dsperse/dslice)
  • Directories with metadata.json at root
  • Directories with metadata.json in a slices subdirectory
  • Directories where Runner will auto-detect

This flexibility is necessary for the various input modes, but consider adding a brief inline comment summarizing the priority order for future maintainability.
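
For example, the suggested inline comment could spell out the priority order right above the branch (wording is illustrative, not taken from the PR):

# Input resolution priority:
#   1. A .dsperse or .dslice file -> pass it to Runner unchanged.
#   2. A directory with metadata.json at its root -> treat it as the slices dir.
#   3. A directory with slices/metadata.json -> use the slices/ subdirectory.
#   4. Anything else -> let Runner auto-detect the layout.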


133-135: LGTM: Runner instantiation and invocation.

The Runner is correctly instantiated with optional run_metadata_path and invoked with input_file and slice_path. The fallback slices_dir_effective or model_dir ensures a valid path is always passed.

dsperse/src/cli/slice.py (1)

59-105: LGTM: slice_convert sub-command implementation.

The function properly handles interactive prompting, input validation, and delegates to Converter.convert(). The error handling is appropriate for a CLI context.

dsperse/src/run/runner.py (1)

231-264: LGTM: Execution chain traversal and result aggregation.

The _run method correctly:

  • Traverses the execution chain via the linked-list pattern
  • Prepares per-slice I/O files
  • Accumulates slice results
  • Computes final probabilities and prediction
  • Persists results to run_results.json
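
The linked-list traversal described above amounts to following next pointers from the chain's first node. A minimal sketch, with the nodes/start key names assumed for illustration rather than taken from the actual metadata schema:

def iter_chain(execution_chain: dict):
    # Yield (slice_id, node) pairs in execution order via 'next' pointers.
    nodes = execution_chain.get("nodes", {})   # assumed key name
    current = execution_chain.get("start")     # assumed key name
    while current is not None:
        node = nodes.get(current)
        if node is None:
            break
        yield current, node
        current = node.get("next")
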
dsperse/src/slice/utils/converter.py (1)

92-122: LGTM: detect_type implementation.

The function correctly identifies format types by:

  • Checking file suffixes for .dsperse and .dslice
  • Detecting slice_* directories for dirs format
  • Checking for single slice directories
  • Checking for .dslice files in a directory

The error handling for unknown types is appropriate.
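
Expected behavior, expressed as a usage sketch (paths are illustrative and assumed to exist on disk):

from dsperse.src.slice.utils.converter import Converter

Converter.detect_type("model.dsperse")   # -> "dsperse" (file suffix)
Converter.detect_type("slice_0.dslice")  # -> "dslice" (file suffix)
Converter.detect_type("model/slices")    # -> "dirs" (contains slice_* children)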

dsperse/src/prover.py (2)

56-64: LGTM: Proper artifact validation before proving.

The code correctly validates required files (circuit, pk) and gracefully handles missing optional files (settings). This prevents cryptic errors from the EZKL backend.


132-174: LGTM: Unified prove entry point.

The method correctly:

  • Detects run mode (run-root vs single-slice)
  • Detects packaging format via Converter.detect_type
  • Routes to the appropriate proving method
  • Handles format conversion for single-slice mode
  • Provides clear error messages for invalid paths
dsperse/src/analyzers/runner_analyzer.py (2)

83-147: Model-level slice processing (_process_slices_model) is consistent and metadata‑driven

This implementation cleanly:

  • Normalizes ONNX and EZKL paths via rel_from_payload + with_slice_prefix.
  • Avoids filesystem checks and trusts metadata, which matches the analyzer’s role.
  • Handles missing shapes/dependencies/compilation info gracefully via .get(..., default) and Optional normalization.

No blocking issues spotted here.


218-233: Execution-chain and security computation are coherent and metadata‑driven

The combination of build_run_metadata, _build_execution_chain, _build_circuit_slices, get_execution_chain, and _calculate_security is consistent:

  • Slices are ordered by numeric index extracted from slice_* keys.
  • use_circuit is derived from the precomputed ezkl flag plus presence of circuit + keys; no filesystem checks.
  • fallback_map correctly maps either circuit_path -> onnx_path or slice_key -> onnx_path when only ONNX is available.
  • Security is defined as the percentage of slices with ezkl=True, which is simple and transparent.

No correctness problems spotted here.

Also applies to: 236-277, 279-288, 290-295, 297-302
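
As a worked illustration of that security definition (variable names are hypothetical):

# Security = percentage of slices that run through EZKL circuits.
circuit_count = sum(1 for s in slices.values() if s.get("ezkl"))
overall_security = 100.0 * circuit_count / len(slices) if slices else 0.0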

dirs_path = Utils.dirs_root_from(Path(dirs_path))
metadata = Utils.load_run_metadata(run_path)

ezkl = EZKL()
Contributor

Couldn't we use self.ezkl_runner?

model_path = Utils.resolve_under_slice(slice_dir, model_path)
pk_path = Utils.resolve_under_slice(slice_dir, pk_path)
settings_path = Utils.resolve_under_slice(slice_dir, settings_path)

Contributor

Somewhere here I'd call self._prove_single_slice instead of engaging ezkl directly. It feels like we could avoid repeating code by calling ezkl.prove in one place, which would make maintenance a bit easier...


# Convert back to the original packaging if needed
if detected != "dirs":
Converter.convert(str(Utils.dirs_root_from(Path(dirs_model_path))), output_type=detected, cleanup=True)
Contributor

Do we really need to convert back?

…slice directory references and updating slice processing logic. Rename process_slices method and streamline related functions.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
dsperse/src/analyzers/runner_analyzer.py (2)

154-219: Guard EZKL path joins against missing keys to avoid TypeError

In _process_slices_per_slice, the files dict can be present but missing one or more of "compiled_circuit"/"compiled", "settings", "pk_key", or "vk_key" (e.g., pre-compilation, failed compilation, or partial metadata). In that case, calls like:

settings_path = os.path.join(parent_dir, files.get("settings"))

will do os.path.join(..., None) and raise TypeError, contradicting the “analyzer should just analyze what is there” goal.

You can mirror the safer pattern from _process_slices_model:

-            comp = ((slice.get("compilation") or {}).get("ezkl") or {})
-            files = (comp.get("files") or {})
-            compiled_flag = bool(comp.get("compiled", False))
-
-            if files:
-                circuit_path = os.path.join(parent_dir, files.get("compiled_circuit") or files.get("compiled"))
-                settings_path = os.path.join(parent_dir, files.get("settings"))
-                pk_path = os.path.join(parent_dir, files.get("pk_key"))
-                vk_path = os.path.join(parent_dir, files.get("vk_key"))
-            else:
-                circuit_path = None
-                settings_path = None
-                pk_path = None
-                vk_path = None
+            comp = ((slice.get("compilation") or {}).get("ezkl") or {})
+            files = (comp.get("files") or {})
+            compiled_flag = bool(comp.get("compiled", False))
+
+            compiled_rel = files.get("compiled_circuit") or files.get("compiled")
+
+            def _join_optional(key: str) -> Optional[str]:
+                rel = files.get(key)
+                return os.path.join(parent_dir, rel) if rel else None
+
+            circuit_path = os.path.join(parent_dir, compiled_rel) if compiled_rel else None
+            settings_path = _join_optional("settings")
+            pk_path = _join_optional("pk_key")
+            vk_path = _join_optional("vk_key")

This keeps behavior identical when all paths are present, but makes the analyzer resilient to incomplete compilation metadata.


458-492: generate_run_metadata doesn’t normalize slice_path and will crash for str inputs / model roots / dsperse paths

Right now:

  • The signature annotates slice_path: Path, but the __main__ block passes str(model_path).
  • generate_run_metadata immediately calls _has_model_metadata(slice_path), and _has_model_metadata uses path / "metadata.json"; with a str, this raises TypeError.
  • Even with a Path for a model root that only has slices/metadata.json, _has_model_metadata will be True (due to the slices/metadata.json check) but _build_from_model_metadata(slice_path) will then try to load slice_path / "metadata.json", which likely doesn’t exist.

Also, the docstring promises handling of .dslice/.dsperse, but _normalize_to_dirs is never invoked here, so those inputs aren’t actually supported.

You can fix all of this by:

  1. Accepting str | Path.
  2. Normalizing via _normalize_to_dirs to a concrete slices directory and detected packaging format.
  3. Using the normalized slices dir for _has_model_metadata / builders.
  4. Recording the original path in source_path.

For example:

-    @staticmethod
-    def generate_run_metadata(slice_path: Path, save_path=None, original_format=None):
+    @staticmethod
+    def generate_run_metadata(slice_path: str | Path, save_path=None, original_format=None):
@@
-        if RunnerAnalyzer._has_model_metadata(slice_path):
-            run_meta = RunnerAnalyzer._build_from_model_metadata(slice_path)
-        else:
-            run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slice_path)
+        # Normalize input to a slices directory and detect original packaging
+        path_obj = Path(slice_path)
+        slices_dir, detected_format = RunnerAnalyzer._normalize_to_dirs(str(path_obj))
+        slices_dir = Path(slices_dir)
+
+        if original_format is None:
+            original_format = detected_format
+
+        if RunnerAnalyzer._has_model_metadata(slices_dir):
+            run_meta = RunnerAnalyzer._build_from_model_metadata(slices_dir)
+        else:
+            run_meta = RunnerAnalyzer._build_from_per_slice_dirs(slices_dir)
@@
-        run_meta["packaging_type"] = original_format
-        run_meta["source_path"] = str(slice_path)
+        run_meta["packaging_type"] = original_format
+        run_meta["source_path"] = str(path_obj.resolve())
@@
-        if save_path is None:
-            base = Path(slice_path).resolve()
+        if save_path is None:
+            base = path_obj.resolve()
             # If slice_path is a model root or slices dir, put run/metadata.json next to it
             base_dir = base if base.is_dir() else base.parent
             save_path = base_dir / "run" / "metadata.json"

This makes the API robust for both str and Path inputs, correctly handles model roots vs slices dirs, and actually enables the promised .dslice/.dsperse support via _normalize_to_dirs.

🧹 Nitpick comments (8)
dsperse/src/analyzers/runner_analyzer.py (8)

9-21: Unused SIZE_LIMIT and no-op __init__ can be simplified

SIZE_LIMIT is currently unused in this file, and RunnerAnalyzer is effectively a static utility with a no-op __init__. Consider either wiring SIZE_LIMIT into circuit selection/security logic or removing it, and dropping __init__ entirely to make the stateless pattern clearer.


42-50: Prefer logger.exception for metadata load failures

The error path in load_slices_metadata logs with logger.error then re-raises. Using logger.exception would preserve the traceback in logs and aligns with the Ruff hint, while keeping the same control flow.

-        except Exception as e:
-            logger.error(f"Failed to load slices metadata from {slices_dir}: {e}")
-            raise
+        except Exception:
+            logger.exception(f"Failed to load slices metadata from {slices_dir}")
+            raise

53-81: process_slices dispatch looks robust; minor note on unused slices_dir downstream

The new dispatcher avoids the old len(...) == 3 heuristic and correctly handles empty input and both per-slice/model-level entries. That’s a good hardening.

Minor: _process_slices_model doesn’t use its slices_dir argument at all; if that’s intentional, you could either remove the parameter or add a brief comment to avoid future confusion.


83-151: Model-level slice processing is reasonable; confirm ezkl/security semantics

The metadata-driven approach here (no filesystem probing, relying on compilation.ezkl.files) looks good.

However, ezkl is just bool(compiled_flag), while _build_execution_chain additionally requires has_circuit and has_keys for use_circuit, and _build_circuit_slices later treats ezkl as the indicator of which slices are “circuit slices” for security. Please confirm that compiled is only set when all required files/keys exist and any size limit has been enforced; otherwise overall_security may over-report coverage relative to what use_circuit actually does.


223-238: build_run_metadata currently bypasses process_slices dispatcher

build_run_metadata unconditionally calls _process_slices_model, even though you introduced process_slices as a more general dispatcher. If build_run_metadata is only ever invoked on model-level slices/metadata.json, this is fine, but consider either:

  • Switching to process_slices for consistency, or
  • Documenting that build_run_metadata expects model-level metadata only.

Right now the #TODO suggests some intent to revisit.


283-300: Align _build_circuit_slices with actual use_circuit semantics

_build_circuit_slices currently uses bool(slice_data.get("ezkl", False)), while _build_execution_chain further gates on has_circuit and has_keys. If ezkl is ever True while keys/artifacts are missing, circuit_slices and overall_security will diverge from what the runner actually uses.

Consider either:

  • Computing a single “ready_for_ezkl” flag in the slice map that already incorporates keys/size checks and using it in both places, or
  • Deriving circuit_slices directly from the execution_chain nodes’ use_circuit.
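
A sketch of the first option, assuming the per-slice keys (ezkl, circuit_path, pk_path, vk_path) built earlier by the analyzer:

def ready_for_ezkl(slice_data: dict) -> bool:
    # Single readiness flag: compiled AND every required artifact present.
    return bool(
        slice_data.get("ezkl")
        and slice_data.get("circuit_path")
        and slice_data.get("pk_path")
        and slice_data.get("vk_path")
    )

Using this one flag in both _build_execution_chain and _build_circuit_slices would keep reported security aligned with what actually executes as a circuit.
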

309-384: _normalize_to_dirs behavior is useful; tighten docstring and clean up minor issues

This helper nicely centralizes detection/normalization of dirs vs dslice vs dsperse and reuses Converter.detect_type/convert. A couple of small points:

  • The docstring says it returns (model_root, slices_dir, original_format), but the implementation returns only (slices_dir, original_format). Either update the docstring or extend the return value so callers aren’t misled.
  • The model_root variable at Line 370 is assigned but never used, and the if ... else is effectively identical. You can drop that whole block or just compute sdir.parent once.
  • Several broad except Exception blocks silently swallow errors (e.g., Lines 335, 344–345, 355). For diagnosability, consider at least logging those before continuing.

These are non-blocking but would improve clarity and debuggability.


386-456: Metadata builders are generally solid; consider tightening exception handling

_build_from_model_metadata and _build_from_per_slice_dirs do the right structural thing: read metadata, construct slice entries, then delegate to the existing chain/security builders.

Two minor notes:

  • The try/except Exception: pass around run_meta["model_path"] is very broad for a low-risk operation; you could just assign without a try, or at least log on failure.
  • Raising a bare Exception with a long message at Line 445 works but makes it harder to distinguish this failure from others programmatically. A small custom exception type (e.g., NoSlicesFoundError) or at least a more specific built‑in would make upstream handling clearer.

None of this blocks functionality but would aid debugging and error classification.
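
For the second point, the custom exception could be as small as the following (the name NoSlicesFoundError is only a suggestion):

class NoSlicesFoundError(RuntimeError):
    """Raised when a slices directory yields no usable slice metadata."""

# ...and at the failure site in _build_from_per_slice_dirs:
# raise NoSlicesFoundError(f"No slice_* directories with metadata found under {slices_dir}")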

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9cb0698 and 8a8204d.

📒 Files selected for processing (1)
  • dsperse/src/analyzers/runner_analyzer.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dsperse/src/analyzers/runner_analyzer.py (2)
dsperse/src/utils/utils.py (1)
  • save_metadata_file (18-42)
dsperse/src/slice/utils/converter.py (3)
  • Converter (17-385)
  • detect_type (93-122)
  • convert (20-90)
🪛 Ruff (0.14.7)
dsperse/src/analyzers/runner_analyzer.py

49-49: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


83-83: Unused static method argument: slices_dir

(ARG004)


127-127: Function definition does not bind loop variable slice_dirname

(B023)


154-154: Unused static method argument: slices_dir

(ARG004)


168-168: Do not catch blind exception: Exception

(BLE001)


335-335: Do not catch blind exception: Exception

(BLE001)


344-345: try-except-pass detected, consider logging the exception

(S110)


344-344: Do not catch blind exception: Exception

(BLE001)


355-355: Do not catch blind exception: Exception

(BLE001)


370-370: Local variable model_root is assigned to but never used

Remove assignment to unused variable model_root

(F841)


370-370: Useless if-else condition

(RUF034)


397-398: try-except-pass detected, consider logging the exception

(S110)


397-397: Do not catch blind exception: Exception

(BLE001)


429-429: Do not catch blind exception: Exception

(BLE001)


445-445: Create your own exception

(TRY002)


445-445: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
dsperse/src/analyzers/runner_analyzer.py (2)

241-281: Execution-chain construction looks correct; check behavior for non-numeric slice IDs

The chain logic (ordered by numeric suffix, next pointers, and fallback_map keyed by circuit path or slice ID) is coherent and avoids the earlier off‑by‑one concerns.

One edge case: int(str(k).split('_')[-1]) will raise if any slice keys don’t end with an integer. If that can’t happen in your metadata, no change is needed; otherwise, you may want a defensive fallback (e.g., skip such slices or log and continue).
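
If non-numeric keys are possible, a defensive sort key along these lines would sink them to the end instead of raising (sketch only):

import sys

def slice_sort_key(key: str) -> int:
    # Extract the numeric suffix from "slice_<n>"; push malformed keys last.
    suffix = str(key).split("_")[-1]
    try:
        return int(suffix)
    except ValueError:
        return sys.maxsize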


494-505: Update __main__ demo to pass Path object directly to generate_run_metadata()

The __main__ block currently converts the Path to a string before passing it to generate_run_metadata(). If the method signature has been updated to accept str | Path with internal normalization, remove the str() conversion:

-    out = RunnerAnalyzer.generate_run_metadata(str(model_path))
+    out = RunnerAnalyzer.generate_run_metadata(model_path)

This ensures the demo exercises the same code path that real callers will use.

@0pendansor 0pendansor merged commit f18597b into main Dec 10, 2025
1 check passed