chore: refactor env group config into Env wrapper classes #2193

Draft
mikasenghaas wants to merge 13 commits into main from chore/env-group-refactor

@mikasenghaas

Summary

  • Introduces Env, TrainEnv, EvalEnv wrapper classes that encapsulate per-env state (config, spawn/connect/shutdown, rollout generation, sampling args, group scoring detection)
  • TrainEnvs / EvalEnvs containers replace the old Envs class with typed constructors
  • Flattens env group config: moves buffer.env_ratios to a per-env ratio field on EnvConfig
  • Auto-detects group rubric envs and uses run_group instead of deferred group scoring
  • Removes temperature scheduling from the orchestrator loop (sampling args computed once at init)
  • Removes ValConfig and validation loop (separate concern, not part of env groups)
  • Inlines spawn_env_server, setup_env_client, run_rollout, task_uses_group_scoring into Env methods, removing ~130 lines from vf_utils
  • Orchestrator env setup reduced from ~100 lines to ~20 lines
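
For orientation, the class layout described above might look roughly like the sketch below. It is an illustration under assumptions, not the code in this PR: the `name` field on `EnvConfig`, the `EvalEnvConfig` class, the generic `Envs` base, and all constructor signatures are hypothetical, and the real `Env` also wraps a `vf.Environment` with spawn/connect/shutdown and rollout logic that is omitted here.

```python
from dataclasses import dataclass
from typing import Generic, Iterator, List, Optional, Sequence, TypeVar


@dataclass
class EnvConfig:
    name: str                      # hypothetical field
    ratio: Optional[float] = None  # per-env ratio (replaces buffer.env_ratios)


@dataclass
class EvalEnvConfig(EnvConfig):    # hypothetical config subclass
    num_examples: int = -1
    rollouts_per_example: int = 1


class Env:
    """Per-env state holder; the real class also wraps a vf.Environment."""

    def __init__(self, config: EnvConfig) -> None:
        self.config = config
        self.name = config.name
        self.sampling_args: dict = {}


class TrainEnv(Env):
    pass


class EvalEnv(Env):
    def __init__(self, config: EvalEnvConfig) -> None:
        super().__init__(config)
        self.num_examples = config.num_examples
        self.rollouts_per_example = config.rollouts_per_example


E = TypeVar("E", bound=Env)


class Envs(Generic[E]):
    """Thin container over Env instances."""

    def __init__(self, envs: Sequence[E]) -> None:
        self._envs: List[E] = list(envs)

    @property
    def names(self) -> List[str]:
        return [env.name for env in self._envs]

    def __iter__(self) -> Iterator[E]:
        return iter(self._envs)

    def __len__(self) -> int:
        return len(self._envs)


class TrainEnvs(Envs[TrainEnv]):
    def __init__(self, configs: Sequence[EnvConfig]) -> None:
        super().__init__([TrainEnv(c) for c in configs])


class EvalEnvs(Envs[EvalEnv]):
    def __init__(self, configs: Sequence[EvalEnvConfig]) -> None:
        super().__init__([EvalEnv(c) for c in configs])
```

With this shape, orchestrator setup reduces to constructing `TrainEnvs(...)` and `EvalEnvs(...)` from their config lists and iterating over the resulting `Env` objects.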

Breaking changes

  • buffer.env_ratios config field removed; set the per-env ratio field on each EnvConfig instead
  • val config section removed
  • Temperature scheduling removed from orchestrator
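
Configs that still set `buffer.env_ratios` need their ratios moved onto the individual env entries. The helper below sketches that migration on plain dicts. The layout is an assumption: it expects an `env` list of tables, an `env_ratios` list aligned by position with that list, and a top-level `val` key. Check these against the actual config schema before relying on it.

```python
import copy


def migrate_env_ratios(config: dict) -> dict:
    """Return a copy of `config` with buffer.env_ratios moved to per-env `ratio`."""
    new = copy.deepcopy(config)

    # Assumption: the removed validation section lived under a top-level "val" key.
    new.pop("val", None)

    ratios = new.get("buffer", {}).pop("env_ratios", None)
    if ratios is None:
        return new  # nothing to migrate

    envs = new.get("env", [])  # assumption: envs are a list under "env"
    if len(ratios) != len(envs):
        raise ValueError(
            f"env_ratios has {len(ratios)} entries but {len(envs)} envs are configured"
        )
    for env, ratio in zip(envs, ratios):
        env["ratio"] = ratio
    return new
```

The function leaves its input untouched, so the old and new configs can be diffed before the old file is replaced.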

🤖 Generated with Claude Code

mikasenghaas and others added 13 commits March 31, 2026 11:13
…field

- Remove ValConfig and all val-related orchestrator code (not part of env group refactor)
- Move env_ratios from BufferConfig to per-env ratio field on EnvConfig
- Fix merge artifacts: train_env_names -> train_envs.names, env_cfg.server.address -> env_cfg.address
- Add validator ensuring env ratios are all-or-nothing across envs
- Update tests and example configs

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
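
The all-or-nothing rule for env ratios can be illustrated as follows. This is a sketch using plain dataclasses; the function name `validate_env_ratios` and the `name` field are hypothetical, and the PR may well implement the check as a config-model validator instead.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EnvConfig:
    name: str                      # hypothetical field
    ratio: Optional[float] = None  # per-env ratio


def validate_env_ratios(envs: Sequence[EnvConfig]) -> None:
    """Raise if some envs set `ratio` and others do not."""
    missing = [env.name for env in envs if env.ratio is None]
    if missing and len(missing) != len(envs):
        raise ValueError(
            "Either all envs or no envs must set `ratio`; "
            f"missing for: {', '.join(missing)}"
        )
```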
- Extract spawn/connect/shutdown into Envs class with atexit cleanup
- Auto-detect group rubric envs and use run_group instead of deferred scoring
- Remove deferred_group_scoring_tasks, max_concurrent from EnvConfig
- Fix stale eval references (eval_env_set, eval_env_names, config.eval.num_examples)
- Use Sequence[EnvConfig] for covariant type compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
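
Two points from this commit, the `atexit` cleanup and the switch to `Sequence[EnvConfig]`, can be shown in one small sketch. The class body is a stand-in: the `running` set replaces real server processes, and `EvalEnvConfig` and the field names are assumptions made for illustration.

```python
import atexit
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EnvConfig:
    name: str  # hypothetical field


@dataclass
class EvalEnvConfig(EnvConfig):  # hypothetical subclass
    num_examples: int = -1


class Envs:
    # Sequence is covariant, so a List[EvalEnvConfig] type-checks here.
    # With `configs: List[EnvConfig]` a type checker would reject it,
    # because List is invariant in its element type.
    def __init__(self, configs: Sequence[EnvConfig]) -> None:
        self.configs = list(configs)
        self.running = set()  # stand-in for spawned server processes
        self._closed = False
        # Ensure cleanup runs even if the caller never calls shutdown().
        atexit.register(self.shutdown)

    def spawn(self) -> None:
        for cfg in self.configs:
            self.running.add(cfg.name)

    def shutdown(self) -> None:
        # Idempotent: safe to call explicitly and again from the atexit hook.
        if self._closed:
            return
        self._closed = True
        self.running.clear()


eval_configs: List[EvalEnvConfig] = [EvalEnvConfig(name="math-eval")]
envs = Envs(eval_configs)  # accepted thanks to Sequence covariance
```

Making `shutdown()` idempotent matters because the `atexit` hook fires at interpreter exit regardless of whether the orchestrator already shut the envs down.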
- Env wraps vf.Environment + config with per-env state (group scoring,
  max_retries, ratio, spawn/connect/shutdown, generate_rollout/generate_group)
- EvalEnv extends Env with num_examples, rollouts_per_example, and evaluate()
- Envs is now a thin container over Env instances
- Scheduler uses env.uses_group_scoring and env.generate_* directly instead
  of maintaining parallel dicts/sets
- Buffer iterates Env instances directly for datasets and ratios
- evaluate_and_log replaces evaluate_env, taking an EvalEnv directly
- Tests use make_env/make_envs helpers for cleaner Env construction

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
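
To make "Buffer iterates Env instances directly for datasets and ratios" concrete, here is a sketch of ratio-weighted env selection. The `Env` stand-in, the function names, and the uniform fallback when no ratios are set are all assumptions; the actual buffer may weight envs differently, for example by dataset size.

```python
import random
from typing import List, Optional, Sequence


class Env:
    """Minimal stand-in exposing only what the sampler needs."""

    def __init__(self, name: str, ratio: Optional[float] = None) -> None:
        self.name = name
        self.ratio = ratio


def env_weights(envs: Sequence[Env]) -> List[float]:
    """Normalize per-env ratios into sampling probabilities."""
    ratios = [env.ratio for env in envs]
    if all(r is None for r in ratios):
        # Assumption: fall back to uniform sampling over envs.
        return [1.0 / len(envs)] * len(envs)
    if any(r is None for r in ratios):
        raise ValueError("ratios must be set on all envs or on none")
    total = sum(ratios)
    return [r / total for r in ratios]


def sample_env(envs: Sequence[Env], rng: random.Random) -> Env:
    """Pick one env with probability proportional to its ratio."""
    return rng.choices(list(envs), weights=env_weights(envs), k=1)[0]
```

Because the weights are read from the `Env` objects themselves, the buffer no longer needs a separate ratio list kept in sync with the env list.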
- sampling_args set once at env init via Envs.set_sampling_args()
- Remove per-step compute_temperature + set_sampling_args from training loop
- get_sampling_args no longer takes temperature param
- Add TrainEnv/EvalEnv subclasses with distinct get_dataset behavior
- EvalEnv.get_dataset calls get_eval_dataset, base Env uses get_dataset
- Buffer no longer needs eval flag — uses env.get_dataset() polymorphically
- Raise on connect if address not configured

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
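
The polymorphic `get_dataset` and the connect-time address check described in this commit could look like the following sketch. `FakeEnvironment` is a stand-in for `vf.Environment` that models only the two dataset getters named in the commit message; the constructor signature, the `address` and `connected` attributes, and the error type are assumptions.

```python
from typing import List, Optional


class FakeEnvironment:
    """Stand-in for vf.Environment; returns toy datasets."""

    def get_dataset(self) -> List[str]:
        return ["train-0", "train-1"]

    def get_eval_dataset(self) -> List[str]:
        return ["eval-0"]


class Env:
    def __init__(self, vf_env: FakeEnvironment, address: Optional[str] = None) -> None:
        self.vf_env = vf_env
        self.address = address
        self.connected = False

    def get_dataset(self) -> List[str]:
        return self.vf_env.get_dataset()

    def connect(self) -> None:
        # Fail fast instead of connecting to an unset address.
        if self.address is None:
            raise ValueError("env address is not configured")
        self.connected = True  # the real method would create a client here


class TrainEnv(Env):
    pass


class EvalEnv(Env):
    def get_dataset(self) -> List[str]:
        # Eval envs read the eval split; callers just call get_dataset().
        return self.vf_env.get_eval_dataset()
```

A buffer written against `env.get_dataset()` then works for both env kinds without an `eval` flag.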
- TrainEnvs takes SamplingConfig and computes train sampling args once
- EvalEnvs takes EvalSamplingConfig and computes eval sampling args once
- Remove set_sampling_args, get_eval_sampling_args calls from orchestrator loop
- evaluate_and_log uses eval_env.sampling_args directly
- Access max_retries and ratio through config instead of duplicated attrs

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Inline run_rollout/run_group logic into Env.generate_rollout/generate_group
- Make uses_group_scoring a property on Env (reads rubric directly)
- Rename get_sampling_args -> get_train_sampling_args
- Move get_eval_sampling_args from eval_utils to utils (next to train variant)
- Rename vf_env -> _env

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…_group

- Remove sampling config from TrainEnv/EvalEnv constructors
- Orchestrator computes sampling args and sets via Envs.set_sampling_args()
- Rename generate_rollout -> run_rollout, generate_group -> run_group

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
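
With the renamed methods, the choice between group scoring and independent rollouts can be pictured as below. This is a sketch only: the `Env` stand-in returns placeholder dicts, the `generate` helper and the `calls` list are hypothetical, and the real `run_rollout`/`run_group` signatures are not shown in this PR description.

```python
import asyncio
from typing import Any, Dict, List


class Env:
    """Stand-in env that records which generation path was used."""

    def __init__(self, name: str, uses_group_scoring: bool) -> None:
        self.name = name
        self.uses_group_scoring = uses_group_scoring
        self.calls: List[str] = []

    async def run_rollout(self, example: Any) -> Dict[str, Any]:
        self.calls.append("run_rollout")
        return {"env": self.name, "example": example}

    async def run_group(self, example: Any, n: int) -> List[Dict[str, Any]]:
        self.calls.append("run_group")
        return [{"env": self.name, "example": example} for _ in range(n)]


async def generate(env: Env, example: Any, rollouts_per_example: int) -> List[Dict[str, Any]]:
    if env.uses_group_scoring:
        # One call produces the whole group so it can be scored together.
        return await env.run_group(example, rollouts_per_example)
    # Otherwise rollouts are independent and can run concurrently.
    rollouts = await asyncio.gather(
        *(env.run_rollout(example) for _ in range(rollouts_per_example))
    )
    return list(rollouts)
```

Since the flag lives on the env, the scheduler does not need a separate set of group-scoring task names.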
- Remove dead vf_utils functions: run_rollout, spawn_env_server,
  setup_env_client, wait_for_env_servers, task_uses_group_scoring
- Inline spawn logic (ZMQEnvServer.run_server) directly into Env.spawn()
- Inline connect logic (ZMQEnvClient) directly into Env.connect()
- Convert name and uses_group_scoring from properties to attributes
- Clean up unused imports from vf_utils

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>