fix(cli-train): drive adapter backends end-to-end (--train-steps wiring + per-backend defaults)#1062

Merged: jayscambler merged 2 commits into main from feat/cli-adapter-backends on Jun 9, 2026

@jayscambler

What

Make the recursive loop fully drivable from the CLI for the pretrained-adapter backends (mlxlm/opd/grpo/trl). The CLI, runner, and train.py were already wired to accept these backends — but two from-scratch-tuned defaults silently broke them when driven through autoctx train:

1. --train-steps was never forwarded

TrainingConfig had no train_steps field and the runner never passed --train-steps to the subprocess, so every backend trained at train.py's 8-step default. An 8-step LoRA run learns essentially nothing. Now:

  • TrainingConfig.train_steps (default 0 = unset), forwarded by the runner only when > 0.
  • train.py resolves the 0 sentinel per backend via _default_train_steps: 8 for from-scratch (mlx/cuda), 100 for adapter backends.
  • CLI exposes --train-steps.

2. learning_rate=1e-3 diverged LoRA adapters

train.py's default LR (1e-3) is tuned for the from-scratch GPT. It is ~10x too high for a LoRA adapter (1e-3 vs. the 1e-4 that mlxlm now resolves to) and caused the adapter to diverge into garbage tokens (in-training assessment avg_score=0). Now train.py resolves an unset (0) LR per backend via _default_learning_rate: 1e-3 for from-scratch, 1e-4 for mlxlm, 1e-5 for opd/grpo/trl, each backend's own tuned rate.

Also: the CLI validates --backend against the known set and rejects negative --train-steps.
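
The validation could look roughly like the following. This is a framework-agnostic sketch: the function name, the error messages, and the exact contents of the known set are assumptions (the set here is simply the six backends named in this PR), and the real CLI may express the same checks through its option parser instead.

```python
# Assumption: the known set is the six backends mentioned in this PR.
KNOWN_BACKENDS = ("mlx", "cuda", "mlxlm", "opd", "grpo", "trl")


def validate_train_options(backend: str, train_steps: int) -> None:
    """Raise ValueError for an unknown backend or a negative step count."""
    if backend not in KNOWN_BACKENDS:
        known = ", ".join(KNOWN_BACKENDS)
        raise ValueError(f"unknown --backend {backend!r}; expected one of: {known}")
    if train_steps < 0:
        raise ValueError(f"--train-steps must be >= 0, got {train_steps}")
```

Zero stays valid because it is the sentinel for "use the backend default".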

Verified live

train.py --backend mlxlm --train-steps 80 on grid_ctf (cached Qwen2.5-0.5B):

  • before (1e-3 default): assessment avg_score=0.0, valid_rate=0.0 — adapter emits !!!! garbage.
  • after (resolved 1e-4): assessment avg_score=0.8654, valid_rate=1.0, 80 steps, 20s.

Tests

Runner forwards --train-steps when set / omits when unset; _default_train_steps and _default_learning_rate resolution per backend. Full training-backend + runner + CLI regression green; ruff + mypy clean. Documents the resolved defaults in mlx-training.md.

Note

Both bugs belong to the same class that the recursive-loop demo work keeps surfacing: a from-scratch-tuned default silently breaking the pretrained-adapter path. The sentinel-default pattern (0 = per-backend default) keeps existing mlx/cuda behavior byte-identical while making the adapter backends correct out of the box.

…ng + per-backend defaults)

The CLI/runner/train.py were already wired for the mlxlm/opd/grpo/trl backends, but two
from-scratch-tuned defaults silently broke them when driven via 'autoctx train':

1. --train-steps was never forwarded CLI -> runner subprocess, so every backend trained at
   train.py's 8-step default. 8 LoRA steps learn ~nothing. Now TrainingConfig carries
   train_steps, the runner forwards it when set, and train.py resolves an unset (0) sentinel
   per backend: 8 from-scratch (mlx/cuda), 100 pretrained-adapter (mlxlm/opd/grpo/trl).

2. train.py's default learning_rate=1e-3 is ~10x too high for a LoRA adapter and DIVERGED it
   to garbage tokens (assessment avg_score=0). It now resolves an unset (0) sentinel per
   backend: 1e-3 from-scratch, 1e-4 mlxlm, 1e-5 opd/grpo/trl (each backend's own tuned rate).

Also: CLI exposes --train-steps and validates --backend against the known set.

Verified live: 'train.py --backend mlxlm --train-steps 80' now trains a healthy adapter
(assessment avg_score 0.865, valid_rate 1.0; was 0.0/garbage at the 1e-3 default).

Tests: runner forwards --train-steps when set / omits when unset; _default_train_steps and
_default_learning_rate resolution per backend. Full training-backend + runner + CLI regression
green. Documents the resolved defaults in mlx-training.md.

mlx-training.md documented --learning-rate as an 'autoctx train' flag, but only --train-steps
was wired -- the CLI rejected --learning-rate with 'No such option'. Wire it through parallel
to --train-steps: TrainingConfig.learning_rate (0 = backend default), runner forwards it when
> 0, CLI exposes + validates it. Now 'autoctx train --backend mlxlm --learning-rate 1e-4'
works and the documented row is accurate.

Tests: runner forwards --learning-rate when set / omits when unset.
@jayscambler jayscambler merged commit f32b286 into main Jun 9, 2026
17 checks passed
@jayscambler jayscambler deleted the feat/cli-adapter-backends branch June 9, 2026 21:30