Summary
Build serve_prm as the scoring service for branch prefixes. The interface should be usable from both offline eval and a future live forest reranker.
Scope
- Load a trained PRM checkpoint and expose batch prefix scoring.
- Support a clean API contract for branch_id, prefix, reward, rank_score, and optional rationale.
- Start with a reliable torch/transformers path; add vLLM-backed tokenization / generation integration where it makes sense, but do not force custom reward heads into a pure vLLM-only design if that makes the system brittle.
- Add a small client usable from eval scripts and future controller/runtime work.
- Add local smoke tests against saved prefixes from extracted traces.
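A minimal sketch of what the API contract could look like, using only the field names listed above. The class names `PrefixScoreRequest` and `PrefixScore`, the helper `to_response`, and the field types are assumptions for illustration. The relationship between reward and rank_score is not specified in this issue, so the sketch simply mirrors reward as a placeholder.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class PrefixScoreRequest:
    """One branch prefix to score (hypothetical request shape)."""
    branch_id: str
    prefix: str


@dataclass(frozen=True)
class PrefixScore:
    """One scored prefix, carrying the fields named in the scope."""
    branch_id: str
    prefix: str
    reward: float
    rank_score: float
    rationale: Optional[str] = None  # optional, so it defaults to None


def to_response(request: PrefixScoreRequest, reward: float,
                rationale: Optional[str] = None) -> PrefixScore:
    # Assumption: rank_score mirrors reward until a ranking transform is chosen.
    return PrefixScore(
        branch_id=request.branch_id,
        prefix=request.prefix,
        reward=float(reward),
        rank_score=float(reward),
        rationale=rationale,
    )


example = to_response(PrefixScoreRequest("b0", "step 1: inspect the contract"), 0.25)
payload = asdict(example)  # plain dict, ready for JSON serialization
```

Keeping the response a frozen dataclass means the same object can be passed between the service and the client without either side mutating it, and `asdict` gives a JSON-friendly payload for offline eval logs.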
Modules to build
- project/evmbench/evmbench/experiments/serve_prm.py
- project/evmbench/evmbench/experiments/prm_service.py
- project/evmbench/evmbench/experiments/prm_api.py
- project/evmbench/evmbench/experiments/prm_client.py
- project/evmbench/evmbench/experiments/vllm_backend.py
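One possible way to keep the service independent of the scoring backend is a small protocol that both a torch/transformers path and a vLLM-backed path could implement. The sketch below is an assumption about the design, not a settled one: `RewardBackend`, `LengthStubBackend`, `PRMService`, and `max_batch_size` are hypothetical names, and the stub backend only stands in for a real reward head so the batching logic can run without a checkpoint.

```python
from typing import Dict, List, Protocol, Sequence, Tuple


class RewardBackend(Protocol):
    """Anything that maps a batch of prefixes to one reward per prefix."""

    def score(self, prefixes: Sequence[str]) -> List[float]:
        ...


class LengthStubBackend:
    """Placeholder backend; a real one would run the PRM checkpoint."""

    def score(self, prefixes: Sequence[str]) -> List[float]:
        # Deterministic toy reward in [0, 1) based only on prefix length.
        return [len(p) / (len(p) + 1.0) for p in prefixes]


class PRMService:
    """Scores (branch_id, prefix) pairs in fixed-size chunks."""

    def __init__(self, backend: RewardBackend, max_batch_size: int = 8) -> None:
        if max_batch_size < 1:
            raise ValueError("max_batch_size must be at least 1")
        self.backend = backend
        self.max_batch_size = max_batch_size

    def score_batch(self, branches: Sequence[Tuple[str, str]]) -> List[Dict[str, object]]:
        results: List[Dict[str, object]] = []
        for start in range(0, len(branches), self.max_batch_size):
            chunk = branches[start:start + self.max_batch_size]
            rewards = self.backend.score([prefix for _, prefix in chunk])
            for (branch_id, prefix), reward in zip(chunk, rewards):
                results.append({
                    "branch_id": branch_id,
                    "prefix": prefix,
                    "reward": reward,
                    "rank_score": reward,  # placeholder: no ranking transform yet
                    "rationale": None,
                })
        return results


service = PRMService(LengthStubBackend(), max_batch_size=2)
scored = service.score_batch([("a", "x"), ("b", "xyz"), ("c", "")])
```

With this split, swapping the stub for a real backend would not change the service or client code, which is one way to avoid tying custom reward heads to a vLLM-only design.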
Acceptance criteria
- Given saved prefix inputs, the service returns deterministic scores with the expected schema.
- Batch scoring works for multiple live branches.
- There is a documented backend story for custom reward heads vs vLLM generation.
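The first two criteria could be checked with a smoke test along the following lines. This is a sketch under stated assumptions: the saved prefixes are assumed to be JSONL records with `branch_id` and `prefix` keys (the actual trace format is not specified here), and `stub_score` is a hash-based stand-in for the PRM so the check runs without a checkpoint.

```python
import hashlib
import io
import json

EXPECTED_KEYS = {"branch_id", "prefix", "reward", "rank_score", "rationale"}


def stub_score(prefix: str) -> float:
    """Hypothetical stand-in for the PRM: a deterministic value in [0, 1)."""
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2 ** 32


def score_saved_prefixes(lines):
    """Score every non-empty JSONL line; assumes branch_id and prefix keys."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        reward = stub_score(record["prefix"])
        results.append({
            "branch_id": record["branch_id"],
            "prefix": record["prefix"],
            "reward": reward,
            "rank_score": reward,  # placeholder: no ranking transform yet
            "rationale": None,
        })
    return results


def check_smoke(jsonl_text: str):
    """Score the same input twice and verify determinism and schema."""
    first = score_saved_prefixes(io.StringIO(jsonl_text))
    second = score_saved_prefixes(io.StringIO(jsonl_text))
    assert first == second, "scores differ between identical runs"
    for row in first:
        assert set(row) == EXPECTED_KEYS, "unexpected response schema"
        assert 0.0 <= row["reward"] < 1.0
    return first


SAMPLE = (
    '{"branch_id": "b0", "prefix": "read storage slot"}\n'
    '{"branch_id": "b1", "prefix": "call withdraw"}\n'
)
rows = check_smoke(SAMPLE)
```

With a real checkpoint, determinism would also depend on running the model in eval mode so that dropout is disabled, which the stub does not exercise.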