chore: refactor env group config into Env wrapper classes #2193
Draft
mikasenghaas wants to merge 13 commits into main
…field

- Remove ValConfig and all val-related orchestrator code (not part of env group refactor)
- Move env_ratios from BufferConfig to per-env ratio field on EnvConfig
- Fix merge artifacts: train_env_names -> train_envs.names, env_cfg.server.address -> env_cfg.address
- Add validator ensuring env ratios are all-or-nothing across envs
- Update tests and example configs

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
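The all-or-nothing ratio rule from this commit can be sketched roughly as below. This is a minimal illustration, not the PR's validator: the `EnvConfig` dataclass here is a hypothetical stand-in with only `name` and `ratio`, and `validate_ratios` is an assumed helper name.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EnvConfig:
    """Hypothetical stand-in for the PR's EnvConfig; only the fields needed here."""

    name: str
    ratio: Optional[float] = None  # per-env sampling ratio, unset by default


def validate_ratios(envs: Sequence[EnvConfig]) -> None:
    """Raise unless every env sets a ratio or none of them does."""
    num_set = sum(env.ratio is not None for env in envs)
    if num_set not in (0, len(envs)):
        missing = [env.name for env in envs if env.ratio is None]
        raise ValueError(f"ratio must be set on all envs or none; missing on: {missing}")
```

Rejecting the mixed case avoids having to guess what share an env without a ratio should receive.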
- Extract spawn/connect/shutdown into Envs class with atexit cleanup
- Auto-detect group rubric envs and use run_group instead of deferred scoring
- Remove deferred_group_scoring_tasks, max_concurrent from EnvConfig
- Fix stale eval references (eval_env_set, eval_env_names, config.eval.num_examples)
- Use Sequence[EnvConfig] for covariant type compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
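The atexit cleanup mentioned in this commit might look something like the sketch below. The `Envs` class here is a hypothetical simplification: the `running` set stands in for real server processes, and the idempotency guard is an assumption about how a double shutdown would be handled.

```python
import atexit


class Envs:
    """Hypothetical container that owns env server processes."""

    def __init__(self, names):
        self.names = list(names)
        self.running = set()
        self._closed = False
        # Ensure servers are torn down even if the caller never calls shutdown().
        atexit.register(self.shutdown)

    def spawn(self):
        # Stand-in for starting one server process per env.
        self.running.update(self.names)

    def shutdown(self):
        # Idempotent: safe to call explicitly and again from the atexit hook.
        if self._closed:
            return
        self._closed = True
        self.running.clear()
```

Registering the hook in the constructor means a crash in the training loop still releases the servers at interpreter exit.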
- Env wraps vf.Environment + config with per-env state (group scoring, max_retries, ratio, spawn/connect/shutdown, generate_rollout/generate_group)
- EvalEnv extends Env with num_examples, rollouts_per_example, and evaluate()
- Envs is now a thin container over Env instances
- Scheduler uses env.uses_group_scoring and env.generate_* directly instead of maintaining parallel dicts/sets
- Buffer iterates Env instances directly for datasets and ratios
- evaluate_and_log replaces evaluate_env, taking an EvalEnv directly
- Tests use make_env/make_envs helpers for cleaner Env construction

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- sampling_args set once at env init via Envs.set_sampling_args()
- Remove per-step compute_temperature + set_sampling_args from training loop
- get_sampling_args no longer takes temperature param
- Add TrainEnv/EvalEnv subclasses with distinct get_dataset behavior
- EvalEnv.get_dataset calls get_eval_dataset, base Env uses get_dataset
- Buffer no longer needs eval flag: uses env.get_dataset() polymorphically
- Raise on connect if address not configured

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
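A rough sketch of the polymorphic `get_dataset` described in this commit follows. `FakeEnvironment` is a hypothetical stand-in for `vf.Environment` (assumed to expose `get_dataset` and `get_eval_dataset`, as the commit names them), and `load_examples` is an invented helper that shows how a buffer could drop its eval flag.

```python
class FakeEnvironment:
    """Stand-in for vf.Environment; method names taken from the commit message."""

    def get_dataset(self):
        return ["train-0", "train-1"]

    def get_eval_dataset(self):
        return ["eval-0"]


class Env:
    def __init__(self, name, env):
        self.name = name
        self._env = env

    def get_dataset(self):
        # Base behavior: the training split.
        return self._env.get_dataset()


class TrainEnv(Env):
    pass


class EvalEnv(Env):
    def get_dataset(self):
        # Eval envs redirect to the eval split.
        return self._env.get_eval_dataset()


def load_examples(envs):
    """What a buffer might do: no eval flag, just ask each env."""
    return {env.name: env.get_dataset() for env in envs}
```

The caller never branches on train versus eval; the choice of split lives in the subclass.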
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- TrainEnvs takes SamplingConfig and computes train sampling args once
- EvalEnvs takes EvalSamplingConfig and computes eval sampling args once
- Remove set_sampling_args, get_eval_sampling_args calls from orchestrator loop
- evaluate_and_log uses eval_env.sampling_args directly
- Access max_retries and ratio through config instead of duplicated attrs

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Inline run_rollout/run_group logic into Env.generate_rollout/generate_group
- Make uses_group_scoring a property on Env (reads rubric directly)
- Rename get_sampling_args -> get_train_sampling_args
- Move get_eval_sampling_args from eval_utils to utils (next to train variant)
- Rename vf_env -> _env

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…_group

- Remove sampling config from TrainEnv/EvalEnv constructors
- Orchestrator computes sampling args and sets via Envs.set_sampling_args()
- Rename generate_rollout -> run_rollout, generate_group -> run_group

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Remove dead vf_utils functions: run_rollout, spawn_env_server, setup_env_client, wait_for_env_servers, task_uses_group_scoring
- Inline spawn logic (ZMQEnvServer.run_server) directly into Env.spawn()
- Inline connect logic (ZMQEnvClient) directly into Env.connect()
- Convert name and uses_group_scoring from properties to attributes
- Clean up unused imports from vf_utils

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
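With `uses_group_scoring` as a plain attribute and the methods renamed to `run_rollout`/`run_group`, scheduler dispatch could look roughly like this. Everything below is a hypothetical sketch: the method signatures, the returned dict shape, and the `schedule` helper are assumptions, and the real methods call into a `vf.Environment`.

```python
import asyncio


class Env:
    """Hypothetical stand-in: the real class wraps a vf.Environment."""

    def __init__(self, name, uses_group_scoring):
        self.name = name
        # Plain attribute set once at init, not a property.
        self.uses_group_scoring = uses_group_scoring

    async def run_rollout(self, example):
        # One rollout, scored on its own.
        return {"example": example, "env": self.name}

    async def run_group(self, example, n):
        # n rollouts for the same example, scored together as a group.
        return [{"example": example, "env": self.name} for _ in range(n)]


async def schedule(env, example, rollouts_per_example):
    """Ask the env how to generate instead of consulting a side table."""
    if env.uses_group_scoring:
        return await env.run_group(example, rollouts_per_example)
    return list(
        await asyncio.gather(
            *(env.run_rollout(example) for _ in range(rollouts_per_example))
        )
    )
```

Because the flag lives on the env, the scheduler needs no parallel dicts or sets keyed by env name.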
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Summary
- Add `Env`, `TrainEnv`, `EvalEnv` wrapper classes that encapsulate per-env state (config, spawn/connect/shutdown, rollout generation, sampling args, group scoring detection)
- `TrainEnvs`/`EvalEnvs` containers replace the old `Envs` class with typed constructors
- Move `buffer.env_ratios` to a per-env `ratio` field on `EnvConfig`
- Auto-detect group rubric envs and use `run_group` instead of deferred group scoring
- Remove `ValConfig` and validation loop (separate concern, not part of env groups)
- Inline `spawn_env_server`, `setup_env_client`, `run_rollout`, `task_uses_group_scoring` into `Env` methods, cleaning ~130 lines from `vf_utils`

Breaking changes

- `buffer.env_ratios` config field removed; use the per-env `ratio` field instead
- `val` config section removed

🤖 Generated with Claude Code
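For the `env_ratios` breaking change, a migration over an already-parsed config could be sketched as below. The config layout is an assumption, not taken from the PR: it supposes `buffer.env_ratios` was a list aligned with a top-level `env` list, and `migrate_env_ratios` is a hypothetical helper.

```python
import copy


def migrate_env_ratios(config):
    """Return a copy with buffer.env_ratios moved onto each env's ratio field.

    Assumes (hypothetically) that env_ratios is a list in the same order
    as the top-level "env" list.
    """
    new = copy.deepcopy(config)
    ratios = new.get("buffer", {}).pop("env_ratios", None)
    if ratios is None:
        return new  # nothing to migrate
    envs = new["env"]
    if len(ratios) != len(envs):
        raise ValueError("env_ratios must have one entry per env")
    for env, ratio in zip(envs, ratios):
        env["ratio"] = ratio
    return new
```

The input is left untouched, so the old and new configs can be compared before the old one is discarded.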