chore: refactor env group config into Env wrapper classes by mikasenghaas · Pull Request #2193 · PrimeIntellect-ai/prime-rl

mikasenghaas · 2026-04-03T13:55:57Z

Summary

Introduces Env, TrainEnv, and EvalEnv wrapper classes that encapsulate per-env state (config, spawn/connect/shutdown, rollout generation, sampling args, group scoring detection)
TrainEnvs / EvalEnvs containers replace the old Envs class with typed constructors
Flattens env group config: moves buffer.env_ratios to per-env ratio field on EnvConfig
Auto-detects group rubric envs and uses run_group instead of deferred group scoring
Removes temperature scheduling from the orchestrator loop (sampling args computed once at init)
Removes ValConfig and validation loop (separate concern, not part of env groups)
Inlines spawn_env_server, setup_env_client, run_rollout, and task_uses_group_scoring into Env methods, removing ~130 lines from vf_utils
Orchestrator env setup reduced from ~100 lines to ~20 lines

Breaking changes

buffer.env_ratios config field removed — use per-env ratio field instead
val config section removed
Temperature scheduling removed from orchestrator
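
For the first breaking change, a migration could look roughly like the sketch below. It is only an illustration: the PR confirms the names `buffer.env_ratios` and the per-env `ratio` field, but the `env` list key, the `id` field, and the assumption that the old ratios were an ordered list matching the env list are hypothetical.

```python
from copy import deepcopy


def migrate_env_ratios(old_config: dict) -> dict:
    """Move buffer.env_ratios onto each env entry as a `ratio` field.

    Hypothetical layout: envs live in a list under the key "env", and
    buffer.env_ratios is a list in the same order as that list.
    """
    config = deepcopy(old_config)  # leave the caller's config untouched
    ratios = config.get("buffer", {}).pop("env_ratios", None)
    if ratios is None:
        return config  # nothing to migrate

    envs = config.get("env", [])
    if len(ratios) != len(envs):
        raise ValueError(
            f"expected {len(envs)} ratios, got {len(ratios)}"
        )
    for env, ratio in zip(envs, ratios):
        env["ratio"] = ratio
    return config


old = {
    "buffer": {"env_ratios": [0.75, 0.25]},
    "env": [{"id": "math"}, {"id": "code"}],
}
new = migrate_env_ratios(old)
```

Here `new["env"]` carries the ratios per env and `new["buffer"]` no longer has an `env_ratios` key.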

🤖 Generated with Claude Code

…field
- Remove ValConfig and all val-related orchestrator code (not part of env group refactor)
- Move env_ratios from BufferConfig to per-env ratio field on EnvConfig
- Fix merge artifacts: train_env_names -> train_envs.names, env_cfg.server.address -> env_cfg.address
- Add validator ensuring env ratios are all-or-nothing across envs
- Update tests and example configs
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
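
The all-or-nothing ratio check mentioned in this commit might be sketched as follows. This is a minimal stand-in using a plain dataclass rather than the project's actual config classes; the simplified `EnvConfig` fields and the `validate_ratios` helper name are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EnvConfig:
    """Simplified stand-in for the real EnvConfig (fields are assumed)."""
    id: str
    ratio: Optional[float] = None


def validate_ratios(envs: Sequence[EnvConfig]) -> None:
    """Raise unless every env sets `ratio` or none of them does."""
    n_set = sum(env.ratio is not None for env in envs)
    if n_set not in (0, len(envs)):
        raise ValueError(
            f"ratio must be set on all envs or on none, "
            f"but {n_set} of {len(envs)} envs set it"
        )


# Both of these pass: either no env sets a ratio, or all of them do.
validate_ratios([EnvConfig("math"), EnvConfig("code")])
validate_ratios([EnvConfig("math", 0.75), EnvConfig("code", 0.25)])
```

A mixed list such as `[EnvConfig("math", 0.75), EnvConfig("code")]` would raise `ValueError` under this sketch.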

- Extract spawn/connect/shutdown into Envs class with atexit cleanup
- Auto-detect group rubric envs and use run_group instead of deferred scoring
- Remove deferred_group_scoring_tasks, max_concurrent from EnvConfig
- Fix stale eval references (eval_env_set, eval_env_names, config.eval.num_examples)
- Use Sequence[EnvConfig] for covariant type compatibility
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
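
One way a container with atexit cleanup could work is sketched below. `FakeEnv` is a hypothetical stand-in for an environment server handle, and the idempotency guard is an assumption rather than something the PR states; only `atexit.register` from the standard library is relied on.

```python
import atexit


class FakeEnv:
    """Hypothetical stand-in for an env that owns a server process."""

    def __init__(self, name: str):
        self.name = name
        self.running = False

    def spawn(self) -> None:
        self.running = True

    def shutdown(self) -> None:
        self.running = False


class Envs:
    """Container that shuts down all envs when the interpreter exits."""

    def __init__(self, envs):
        self.envs = list(envs)
        self._closed = False
        # Register cleanup once, so servers do not outlive the process.
        atexit.register(self.shutdown)

    def spawn(self) -> None:
        for env in self.envs:
            env.spawn()

    def shutdown(self) -> None:
        # Guard so an explicit call followed by the atexit call is harmless.
        if self._closed:
            return
        self._closed = True
        for env in self.envs:
            env.shutdown()


envs = Envs([FakeEnv("math"), FakeEnv("code")])
envs.spawn()
```

Calling `envs.shutdown()` explicitly is still possible; the guard makes the later atexit invocation a no-op.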

- Env wraps vf.Environment + config with per-env state (group scoring, max_retries, ratio, spawn/connect/shutdown, generate_rollout/generate_group)
- EvalEnv extends Env with num_examples, rollouts_per_example, and evaluate()
- Envs is now a thin container over Env instances
- Scheduler uses env.uses_group_scoring and env.generate_* directly instead of maintaining parallel dicts/sets
- Buffer iterates Env instances directly for datasets and ratios
- evaluate_and_log replaces evaluate_env, taking an EvalEnv directly
- Tests use make_env/make_envs helpers for cleaner Env construction
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- sampling_args set once at env init via Envs.set_sampling_args()
- Remove per-step compute_temperature + set_sampling_args from training loop
- get_sampling_args no longer takes temperature param
- Add TrainEnv/EvalEnv subclasses with distinct get_dataset behavior
- EvalEnv.get_dataset calls get_eval_dataset, base Env uses get_dataset
- Buffer no longer needs eval flag; uses env.get_dataset() polymorphically
- Raise on connect if address not configured
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
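
The polymorphic `get_dataset` behavior described in this commit can be illustrated with a small sketch. `FakeEnvironment` stands in for `vf.Environment`, and the constructor signatures and the `collect_datasets` helper are hypothetical; only the method names `get_dataset` and `get_eval_dataset` and the class names come from the commit message.

```python
class FakeEnvironment:
    """Stand-in for vf.Environment with separate train and eval data."""

    def get_dataset(self):
        return ["train-0", "train-1"]

    def get_eval_dataset(self):
        return ["eval-0"]


class Env:
    def __init__(self, name: str, vf_env: FakeEnvironment):
        self.name = name
        self.vf_env = vf_env

    def get_dataset(self):
        # Base behavior: use the training dataset.
        return self.vf_env.get_dataset()


class TrainEnv(Env):
    pass


class EvalEnv(Env):
    def get_dataset(self):
        # Eval envs override the lookup to use the eval split.
        return self.vf_env.get_eval_dataset()


def collect_datasets(envs):
    """Caller needs no eval flag: each env picks its own dataset."""
    return {env.name: env.get_dataset() for env in envs}


datasets = collect_datasets(
    [TrainEnv("math", FakeEnvironment()), EvalEnv("math-eval", FakeEnvironment())]
)
```

The design choice is that the caller (the buffer, in the PR) stays unaware of whether it is handling a train or an eval env.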

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- TrainEnvs takes SamplingConfig and computes train sampling args once
- EvalEnvs takes EvalSamplingConfig and computes eval sampling args once
- Remove set_sampling_args, get_eval_sampling_args calls from orchestrator loop
- evaluate_and_log uses eval_env.sampling_args directly
- Access max_retries and ratio through config instead of duplicated attrs
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- Inline run_rollout/run_group logic into Env.generate_rollout/generate_group
- Make uses_group_scoring a property on Env (reads rubric directly)
- Rename get_sampling_args -> get_train_sampling_args
- Move get_eval_sampling_args from eval_utils to utils (next to train variant)
- Rename vf_env -> _env
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…_group
- Remove sampling config from TrainEnv/EvalEnv constructors
- Orchestrator computes sampling args and sets via Envs.set_sampling_args()
- Rename generate_rollout -> run_rollout, generate_group -> run_group
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
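
With the renamed methods, caller-side dispatch on `uses_group_scoring` might look roughly like this. The method signatures, the `FakeEnv` class, and the `generate` helper are assumptions made for the sketch; the PR only confirms the names `run_rollout`, `run_group`, and `uses_group_scoring`.

```python
import asyncio


class FakeEnv:
    """Hypothetical env exposing the two rollout entry points."""

    def __init__(self, name: str, uses_group_scoring: bool):
        self.name = name
        self.uses_group_scoring = uses_group_scoring

    async def run_rollout(self, example: str) -> dict:
        return {"example": example, "mode": "rollout"}

    async def run_group(self, example: str, n: int) -> list:
        # A group rubric scores all n rollouts of one example together.
        return [{"example": example, "mode": "group"} for _ in range(n)]


async def generate(env: FakeEnv, example: str, rollouts_per_example: int) -> list:
    if env.uses_group_scoring:
        # One call produces and scores the whole group.
        return await env.run_group(example, rollouts_per_example)
    # Otherwise run independent rollouts concurrently.
    rollouts = await asyncio.gather(
        *(env.run_rollout(example) for _ in range(rollouts_per_example))
    )
    return list(rollouts)


group_results = asyncio.run(generate(FakeEnv("math", True), "q1", 3))
single_results = asyncio.run(generate(FakeEnv("code", False), "q1", 3))
```

Either branch returns one result per requested rollout, so downstream code does not need to know which path was taken.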

- Remove dead vf_utils functions: run_rollout, spawn_env_server, setup_env_client, wait_for_env_servers, task_uses_group_scoring
- Inline spawn logic (ZMQEnvServer.run_server) directly into Env.spawn()
- Inline connect logic (ZMQEnvClient) directly into Env.connect()
- Convert name and uses_group_scoring from properties to attributes
- Clean up unused imports from vf_utils
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

mikasenghaas and others added 13 commits March 31, 2026 11:13

ckpt

5a2da30

Merge branch 'main' into chore/env-group-refactor

9761e0b

chore: type-narrow TrainEnv constructor to TrainEnvConfig

18b8336

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

chore: revert math_group config changes

40e444a

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

chore: remove commented-out TrainEnvConfig stub

53508bd

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>


mikasenghaas wants to merge 13 commits into main from chore/env-group-refactor
