@prashantcraju

Summary

  • align the task dependency simulator’s stats aggregation with the refactored metta/rl/stats.py, summing per_label_samples/evictions/tracked_task_completions instead of averaging them (see the sketch after this list)
  • confirm the simulator still logs identical WandB telemetry (mean performance, tasks above threshold, sampling gini/entropy, dependency waterfall)
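
A minimal sketch of the intended aggregation behavior, assuming a simple list-of-dicts stats layout (the function name, dict shapes, and helper below are illustrative, not the actual metta/rl/stats.py API): count-like stats are summed across environments, while rate-like metrics such as mean performance remain averaged.

```python
# Illustrative sketch only: count-like stats (samples, evictions, completions)
# are summed across environments; other metrics keep being averaged.
from collections import defaultdict
from statistics import mean

# Stats that represent counts and must be summed, not averaged.
SUMMED_KEYS = {"per_label_samples", "evictions", "tracked_task_completions"}


def aggregate_env_stats(per_env_stats: list[dict[str, float]]) -> dict[str, float]:
    """Aggregate a list of per-environment stat dicts into one dict."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for stats in per_env_stats:
        for key, value in stats.items():
            grouped[key].append(value)

    aggregated: dict[str, float] = {}
    for key, values in grouped.items():
        if key in SUMMED_KEYS:
            aggregated[key] = sum(values)   # counts accumulate across envs
        else:
            aggregated[key] = mean(values)  # e.g. mean performance stays a mean
    return aggregated


# Example: three envs each report 2 evictions -> total 6, not an average of 2.
print(aggregate_env_stats([
    {"evictions": 2, "mean_performance": 0.5},
    {"evictions": 2, "mean_performance": 0.7},
    {"evictions": 2, "mean_performance": 0.6},
]))
```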

Testing

uv run ./tools/run.py recipes.experiment.curriculum_test.task_dependency_simulator.train num_epochs=2 samples_per_epoch=5 run=test_refactor_stats

uv run ./tools/run.py recipes.experiment.curriculum_test.task_dependency_simulator.train num_epochs=500 samples_per_epoch=10 num_envs=32 run=integration_test-

WandB proof: https://wandb.ai/metta-research/curriculum_test/runs/d6ozn564

Evidence

  • Gini_report.pdf

  • Combined plot showing mean performance, tasks above threshold, sampling gini/entropy, eviction counts, and dependency waterfall
