[MVP LLaDA2.0] Phase 9: Output correctness validation #39

@AlonKellner-RedHat

Sub-Phase Issues

Phase 9 is split into two detailed sub-phases:

  • #42 - Phase 9.1: Numerical Validation (Incremental Layer-by-Layer)
  • #43 - Phase 9.2: E2E Evaluation Validation (lm-eval Integration)

This issue serves as the high-level orchestration and tracking parent for both sub-phases.


Context

Validate that LLaDA2.0 plugin outputs match reference implementations to ensure functional correctness beyond performance benchmarking.

Relationship: Builds on Phase 8 benchmarks (PR #38). Validation strategy: numerical → E2E → cross-implementation.
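For illustration, the numerical stage could be sketched as below: walk the validation points in order and report the first one where plugin activations drift from the reference. This is a minimal sketch, not the Phase 9.1 implementation. The point names (`embedding`, `layer_0`, `logits`), the function name `first_divergence`, and the default tolerances are all hypothetical placeholders; the real bounds are the ones Phase 9.1 is meant to document.

```python
import math


def first_divergence(actual_by_point, expected_by_point, rtol=1e-3, atol=1e-5):
    """Return the name of the first validation point that diverges, or None.

    Both arguments map a validation point name to a flat list of floats.
    Points are checked in the insertion order of `expected_by_point`, so the
    earliest failing point is reported (useful for incremental debugging).
    rtol/atol are placeholder values, not documented project tolerances.
    """
    for name, expected in expected_by_point.items():
        actual = actual_by_point[name]
        if len(actual) != len(expected):
            return name  # shape mismatch counts as a divergence
        for a, e in zip(actual, expected):
            # math.isclose passes when |a - e| <= max(rtol * max(|a|, |e|), atol)
            if not math.isclose(a, e, rel_tol=rtol, abs_tol=atol):
                return name
    return None


# Hypothetical activations captured at three validation points.
reference = {"embedding": [0.1, 0.2], "layer_0": [1.0, 2.0], "logits": [3.0, 4.0]}
plugin = {"embedding": [0.1, 0.2], "layer_0": [1.0, 2.5], "logits": [3.0, 4.0]}

print(first_divergence(plugin, reference))
```

Stopping at the first failing point keeps the root cause local: anything downstream of a diverged layer is expected to diverge too, so later mismatches carry little extra information.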

References:

Deliverables

Acceptance Criteria

  • Phase 9.1 complete: All 8 validation points pass, tolerance bounds documented, router precision validated
  • Phase 9.2 complete: All 5 lm-eval tasks evaluated, results within tolerance (±1-2% categorical, ±2-3% generation)
  • Results match SGLang LLaDA2.0 within documented tolerance
  • Results match HuggingFace within documented tolerance (the two references are expected to be approximately equal to each other)
  • Any discrepancies documented with root cause analysis
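The tolerance criteria above could be checked with something like the following sketch. It assumes the "±1-2%" and "±2-3%" bounds are absolute percentage points and uses the upper end of each range; both are interpretations, not something the issue states. The names `TaskResult`, `find_discrepancies`, and the task labels are hypothetical, and the scores are made-up inputs for illustration only.

```python
from dataclasses import dataclass

# Assumed tolerances in absolute percentage points (upper end of each range).
TOLERANCE_PP = {"categorical": 2.0, "generation": 3.0}


@dataclass(frozen=True)
class TaskResult:
    task: str     # lm-eval task name (hypothetical labels below)
    kind: str     # "categorical" or "generation"
    score: float  # accuracy-style metric in percent (0-100)


def find_discrepancies(plugin, reference, tolerance_pp=TOLERANCE_PP):
    """Return (task, delta) pairs where the plugin score is out of tolerance.

    delta is plugin minus reference, in percentage points. A task that is
    missing from the reference is reported with delta=None.
    """
    ref_by_task = {r.task: r for r in reference}
    discrepancies = []
    for res in plugin:
        ref = ref_by_task.get(res.task)
        if ref is None:
            discrepancies.append((res.task, None))
            continue
        delta = res.score - ref.score
        if abs(delta) > tolerance_pp[res.kind]:
            discrepancies.append((res.task, delta))
    return discrepancies


# Illustrative, made-up scores (not real evaluation results).
plugin_results = [
    TaskResult("task_a", "categorical", 70.0),
    TaskResult("task_b", "generation", 40.0),
]
reference_results = [
    TaskResult("task_a", "categorical", 71.5),
    TaskResult("task_b", "generation", 44.5),
]

print(find_discrepancies(plugin_results, reference_results))
```

Each returned pair is a candidate for the root cause analysis item; running the same check once against SGLang and once against HuggingFace would cover both reference criteria.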

Out of Scope

  • Performance optimization (Phase 8)
  • Novel evaluation tasks beyond lm-eval standards

Dependencies

Implementation Contract
