Changes from all commits (57 commits)
- `f44eb81` teacher env init (J-SUPHA, Mar 6, 2026)
- `530fed2` testing set up (J-SUPHA, Mar 6, 2026)
- `d5ca760` command change (J-SUPHA, Mar 7, 2026)
- `ad364ac` increase timeout cause vllm is super slow all of a sudden (J-SUPHA, Mar 8, 2026)
- `985311e` trial (J-SUPHA, Mar 8, 2026)
- `e563352` quicker training (J-SUPHA, Mar 8, 2026)
- `81f90a6` forgot something easy (J-SUPHA, Mar 8, 2026)
- `4f33ab8` apparently not so easy (J-SUPHA, Mar 9, 2026)
- `bb2736d` next (J-SUPHA, Mar 10, 2026)
- `64794e7` sneaky bug (J-SUPHA, Mar 10, 2026)
- `09ad401` sneaky bug logging (J-SUPHA, Mar 10, 2026)
- `d1fd89f` non blocking test (J-SUPHA, Mar 10, 2026)
- `057c9fe` shorten worker timeout (J-SUPHA, Mar 10, 2026)
- `e84686b` remove enforce eager (J-SUPHA, Mar 10, 2026)
- `e79af5f` testing config (J-SUPHA, Mar 11, 2026)
- `abba562` testing config (J-SUPHA, Mar 11, 2026)
- `82be871` testing config (J-SUPHA, Mar 11, 2026)
- `98a5d3b` testing config (J-SUPHA, Mar 11, 2026)
- `78c0a6d` tokenizer bug (J-SUPHA, Mar 11, 2026)
- `f1cfc13` tokenizer bug (J-SUPHA, Mar 11, 2026)
- `c275687` tokenizer bug (J-SUPHA, Mar 11, 2026)
- `3a440f8` tokenizer bug (J-SUPHA, Mar 11, 2026)
- `b457a67` tokenizer bug (J-SUPHA, Mar 11, 2026)
- `2f371e0` tokenizer bug (J-SUPHA, Mar 11, 2026)
- `8a348be` tokenizer bug (J-SUPHA, Mar 11, 2026)
- `34a3936` tokenizer bug (J-SUPHA, Mar 12, 2026)
- `fd5b426` tokenizer bug (J-SUPHA, Mar 12, 2026)
- `c37516b` tokenizer bug (J-SUPHA, Mar 12, 2026)
- `a54dfe7` tokenizer bug (J-SUPHA, Mar 12, 2026)
- `62ef2fc` training kernel (J-SUPHA, Mar 12, 2026)
- `c26432b` training kernel (J-SUPHA, Mar 12, 2026)
- `7ec622a` training ideas (J-SUPHA, Mar 12, 2026)
- `a43b0b7` training kernel (J-SUPHA, Mar 12, 2026)
- `690e670` investigating weird training issue (J-SUPHA, Mar 12, 2026)
- `3df0e45` investigating weird training issue (J-SUPHA, Mar 13, 2026)
- `d8857eb` investigating weird training issue (J-SUPHA, Mar 13, 2026)
- `d1b0dee` [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 13, 2026)
- `600c54f` clean log (J-SUPHA, Mar 13, 2026)
- `862cd36` clean logging (J-SUPHA, Mar 13, 2026)
- `148a4fd` remove training code (J-SUPHA, Mar 13, 2026)
- `a1b545c` remove cross tokenization and fix location of configs (J-SUPHA, Mar 13, 2026)
- `994e9c2` [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 13, 2026)
- `322e7e6` remove comments (J-SUPHA, Mar 13, 2026)
- `a8cdb53` address problems (J-SUPHA, Mar 13, 2026)
- `82964b6` [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 13, 2026)
- `697c594` changes (J-SUPHA, Mar 13, 2026)
- `6c56479` [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 13, 2026)
- `1b8ff07` adding tests (J-SUPHA, Mar 13, 2026)
- `12ba3cc` [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 13, 2026)
- `a171358` structural changes (J-SUPHA, Mar 13, 2026)
- `3a85ede` [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 13, 2026)
- `9bd299b` better logging for devex (J-SUPHA, Mar 14, 2026)
- `f053c77` [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 14, 2026)
- `805a0c0` revert to similar structure (J-SUPHA, Mar 14, 2026)
- `7aba0d3` fresh eyes check (J-SUPHA, Mar 14, 2026)
- `79baac1` clean (J-SUPHA, Mar 17, 2026)
- `41947e9` clean (J-SUPHA, Mar 17, 2026)
4 changes: 2 additions & 2 deletions .secrets.baseline

```diff
@@ -133,7 +133,7 @@
       "filename": "README.md",
       "hashed_secret": "a8253456364f1bfc7da7ae4a1db5b45d106317a5",
       "is_verified": false,
-      "line_number": 454
+      "line_number": 530
     }
   ],
   "SLURM.md": [
@@ -561,5 +561,5 @@
     }
   ]
 },
-"generated_at": "2026-03-02T22:46:56Z"
+"generated_at": "2026-03-14T00:43:09Z"
 }
```
76 changes: 76 additions & 0 deletions README.md

@@ -298,6 +298,82 @@ curl -s http://localhost:8002/latest_example | jq '{has_ids:(.distill_token_ids!
- Trainers should validate alignment assumptions they require (sequence length, per-position top-k, etc.).
- Teacher-side architecture and prompt/rendering strategy are intentionally out of scope for this PR.

### TeacherDistillationEnv follow-up

The follow-up teacher environment uses a dedicated teacher server config and
attaches teacher prompt logprobs before the group is sent to the API.

Teacher config shape:

```python
TeacherDistillationConfig(
teacher_enabled=True,
teacher_top_k=8,
)
```

Teacher server configs are passed separately at init, just like the primary
`server_configs`:

```python
env = MyTeacherEnv(
config=env_config,
server_configs=student_server_configs,
teacher_server_configs=[
APIServerConfig(
base_url="http://localhost:9003/v1",
model_name="Qwen/Qwen3-30B-A3B-Instruct-2507",
api_key="",
server_type="vllm",
tokenizer_name="Qwen/Qwen3-30B-A3B-Instruct-2507",
)
],
)
```

You can either:

- build a teacher-enabled env by mixing `TeacherDistillationEnv` into an existing
`BaseEnv`-derived env such as `GSM8kEnv`, or
- subclass `TeacherDistillationEnv` directly and implement the usual environment
methods yourself.

In both cases, `TeacherDistillationEnv` still assumes the normal `BaseEnv`
runtime contract: tokenized rollouts, `ScoredDataGroup` payloads, and the
standard `handle_send_to_api(...)` transport path.
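
The mixin option above can be sketched with stub classes (everything here is illustrative: the real `BaseEnv`, `GSM8kEnv`, and `TeacherDistillationEnv` live in atroposlib and have much richer interfaces; only the composition pattern is the point):

```python
# Illustrative stubs mimicking the mixin pattern described above.
class BaseEnv:
    def __init__(self, config=None, server_configs=None, **kwargs):
        self.config = config
        self.server_configs = server_configs or []


class TeacherDistillationEnv(BaseEnv):
    """Mixin that captures teacher servers before delegating to BaseEnv."""

    def __init__(self, *args, teacher_server_configs=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.teacher_server_configs = teacher_server_configs or []


class GSM8kEnv(BaseEnv):
    """Stands in for an existing BaseEnv-derived task environment."""


# Listing the mixin first puts it ahead of GSM8kEnv in the MRO, so it can
# intercept the extra teacher_server_configs keyword at construction time.
class GSM8kTeacherEnv(TeacherDistillationEnv, GSM8kEnv):
    pass


env = GSM8kTeacherEnv(
    config={"teacher_enabled": True, "teacher_top_k": 8},
    server_configs=["student-server"],
    teacher_server_configs=["teacher-server"],
)
print(env.teacher_server_configs)  # -> ['teacher-server']
```

Declaring the mixin first means no changes to `GSM8kEnv` itself; the teacher wiring rides along through `super().__init__` cooperation.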

CLI shape:

```bash
--env.teacher_enabled true \
--teacher.base_url "http://localhost:9003/v1" \
--teacher.model_name "Qwen/Qwen3-30B-A3B-Instruct-2507" \
--teacher.server_type vllm \
--env.teacher_top_k 8
```

If `--teacher.model_name` is a deployment alias rather than a tokenizer
identifier, also set `--teacher.tokenizer_name ...` so the env can validate
tokenizer compatibility.

Scope note:

- The teacher-aware CLI wiring currently exists for `serve`.
- If `teacher_enabled=True`, the generic `process` and `evaluate` commands will
fail loudly at env construction time unless you instantiate the env yourself
and pass `teacher_server_configs=...`.

Tokenizer requirement:

- Teacher distillation currently requires the teacher and student to use the same tokenizer vocabulary.
- If the tokenizers do not match, `TeacherDistillationEnv` raises an error instead of attempting token conversion.

Why same-tokenizer is required:

- `distill_token_ids` are consumed as student-vocabulary IDs by the trainer.
- If the teacher uses a different vocabulary, the same integer token ID refers to different text on the teacher and student sides.
- A decode/re-tokenize/remap pipeline is not a safe drop-in fix because it changes both token positions and token identities, which breaks the exact per-position token supervision that the current distillation loss assumes.
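
The fail-fast behavior can be sketched as a vocabulary-equality guard (the function name, error message, and dict-based comparison here are hypothetical; the real check inside `TeacherDistillationEnv` may compare tokenizers differently):

```python
# Hedged sketch of the same-tokenizer guard described above.
def ensure_same_vocab(student_vocab: dict, teacher_vocab: dict) -> None:
    """Raise rather than attempt token-ID remapping between vocabularies."""
    if student_vocab != teacher_vocab:
        diff = set(student_vocab.items()) ^ set(teacher_vocab.items())
        raise ValueError(
            f"Teacher/student tokenizer vocabularies differ "
            f"({len(diff)} mismatched entries); distill_token_ids would "
            f"refer to different text on each side."
        )


shared = {"hello": 0, "world": 1}
ensure_same_vocab(shared, dict(shared))  # identical vocab: no error

try:
    ensure_same_vocab(shared, {"hello": 0, "world": 2})
except ValueError as err:
    print("rejected:", "vocabularies differ" in str(err))  # -> rejected: True
```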

---

## Testing and Debugging Tools
Expand Down
22 changes: 7 additions & 15 deletions atroposlib/envs/server_handling/openai_server.py
Review comment (Collaborator): "i think this may need to be reverted?"
```diff
@@ -199,18 +199,14 @@ def resolve_openai_configs(
             f"Error parsing multi-server OpenAI configuration from YAML under '{OPENAI_NAMESPACE}': {e}"
         ) from e
     elif isinstance(default_server_configs, APIServerConfig):
-        # Check APIServerConfig BEFORE ServerBaseline since APIServerConfig inherits from ServerBaseline
-        logger.info(
-            "Using single OpenAI server configuration based on merged settings (default/YAML/CLI)."
-        )
+        logger.info("Using single OpenAI server configuration.")
         try:
             final_openai_config = APIServerConfig(**openai_config_dict)
         except Exception as e:
             raise FailedExecutionException(
-                f"Error creating final OpenAI configuration from merged settings: {e}\n"
-                f"Merged Dict: {openai_config_dict}"
+                f"Error creating final OpenAI configuration: {e}"
             ) from e
-        server_configs = final_openai_config
+        server_configs = [final_openai_config]
     elif isinstance(default_server_configs, ServerBaseline):
         # Pure ServerBaseline (not APIServerConfig) - no CLI overrides possible
         logger.info("Using ServerBaseline configuration.")
@@ -219,26 +215,22 @@ def resolve_openai_configs(
         logger.info("Using default multi-server configuration (length >= 2).")
         server_configs = default_server_configs
     else:
-        logger.info(
-            "Using single OpenAI server configuration based on merged settings (default/YAML/CLI)."
-        )
+        logger.info("Using single OpenAI server configuration.")
         try:
             final_openai_config = APIServerConfig(**openai_config_dict)
         except Exception as e:
             raise FailedExecutionException(
-                f"Error creating final OpenAI configuration from merged settings: {e}\n"
-                f"Merged Dict: {openai_config_dict}"
+                f"Error creating final OpenAI configuration: {e}"
             ) from e

     if isinstance(default_server_configs, APIServerConfig):
-        server_configs = final_openai_config
+        server_configs = [final_openai_config]
     elif isinstance(default_server_configs, list):
         server_configs = [final_openai_config]
     else:
         logger.warning(
             f"Unexpected type for default_server_configs: {type(default_server_configs)}. "
-            f"Proceeding with single OpenAI server configuration based on merged settings."
+            "Proceeding with single OpenAI server configuration."
         )
         server_configs = [final_openai_config]

     return server_configs
```
13 changes: 5 additions & 8 deletions atroposlib/envs/server_handling/vllm_server.py
Review comment (Collaborator): "revert"
```diff
@@ -281,7 +281,7 @@ async def _get_logprobs_wrapper(self, **kwargs) -> Dict[str, Any]:
         ), "Prompt or input_ids is required for get_logprobs!"

         top_k = int(kwargs.pop("top_k", kwargs.pop("top_logprobs", 1)))
-        top_k = max(1, top_k)
+        top_k = max(0, top_k)

         # Use input_ids if provided (from ManagedServer), otherwise tokenize prompt
         from_prompt_text = False
@@ -408,25 +408,22 @@ def resolve_openai_configs(
         logger.info("Using default multi-server configuration (length >= 2).")
         server_configs = default_server_configs
     else:
-        logger.info(
-            "Using single OpenAI server configuration based on merged settings (default/YAML/CLI)."
-        )
+        logger.info("Using single OpenAI server configuration.")
         try:
             final_openai_config = APIServerConfig(**openai_config_dict)
         except Exception as e:
             raise FailedExecutionException(
-                f"Error creating final OpenAI configuration from merged settings: {e}\n"
-                f"Merged Dict: {openai_config_dict}"
+                f"Error creating final OpenAI configuration: {e}"
             ) from e

     if isinstance(default_server_configs, APIServerConfig):
-        server_configs = final_openai_config
+        server_configs = [final_openai_config]
     elif isinstance(default_server_configs, list):
         server_configs = [final_openai_config]
     else:
         logger.warning(
             f"Unexpected type for default_server_configs: {type(default_server_configs)}. "
-            f"Proceeding with single OpenAI server configuration based on merged settings."
+            "Proceeding with single OpenAI server configuration."
         )
         server_configs = [final_openai_config]
```
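
The behavioral change in the first hunk is the clamp moving from `max(1, top_k)` to `max(0, top_k)`: a caller-supplied `top_k` of 0 is now passed through instead of being silently bumped to 1. In vLLM's logprobs semantics, 0 is generally understood as "return only the chosen token's logprob, with no ranked alternatives"; treat that reading as an assumption here rather than a statement about this PR. A standalone sketch of the clamp behavior:

```python
# Mirrors the resolution logic from the diff: accept either "top_k" or
# "top_logprobs", default to 1, and allow 0 through (negative values clamp
# to 0 rather than being rejected).
def resolve_top_k(kwargs: dict) -> int:
    top_k = int(kwargs.pop("top_k", kwargs.pop("top_logprobs", 1)))
    return max(0, top_k)


print(resolve_top_k({"top_k": 0}))         # -> 0  (no longer forced to 1)
print(resolve_top_k({"top_logprobs": 8}))  # -> 8
print(resolve_top_k({}))                   # -> 1  (default)
print(resolve_top_k({"top_k": -3}))        # -> 0
```

Note one subtlety kept from the original: the inner `kwargs.pop("top_logprobs", 1)` runs unconditionally as the default argument, so `top_logprobs` is consumed from `kwargs` even when `top_k` is also present.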