Pull requests: radixark/miles

Pull requests list

Enable observing training entropy without computing entropy loss
#1464 opened Jun 22, 2026 by zyzshishui Contributor
Add opt-in periodic py-spy dumper for hang debugging
#1461 opened Jun 22, 2026 by fzyzcjy Collaborator
Add FT with-failure e2e scenarios
#1459 opened Jun 22, 2026 by fzyzcjy Collaborator
Add FT deterministic e2e scenarios
#1458 opened Jun 22, 2026 by fzyzcjy Collaborator
Add FT no-failure e2e scenarios
#1457 opened Jun 22, 2026 by fzyzcjy Collaborator
Add FT e2e test framework (conftest_ft harness)
#1456 opened Jun 22, 2026 by fzyzcjy Collaborator
Add debug-exit-after-rollout to train entrypoints
#1455 opened Jun 22, 2026 by fzyzcjy Collaborator
Always save rollout debug data regardless of rollout_global_dataset
#1454 opened Jun 22, 2026 by fzyzcjy Collaborator
Start HTTP control server and mini FT controller in train entrypoints
#1453 opened Jun 22, 2026 by fzyzcjy Collaborator
Add HTTP control server for cell suspend/resume and fault injection
#1452 opened Jun 22, 2026 by fzyzcjy Collaborator
Add mini FT controller
#1451 opened Jun 22, 2026 by fzyzcjy Collaborator
Wire FT event logging and component gating into RolloutManager
#1450 opened Jun 22, 2026 by fzyzcjy Collaborator
Add CI rollout-data injection with recorded-data metadata round-trip
#1449 opened Jun 22, 2026 by fzyzcjy Collaborator
Add FT test-action hooks to the train group
#1448 opened Jun 22, 2026 by fzyzcjy Collaborator
Log train-group step-end and analysis events
#1447 opened Jun 22, 2026 by fzyzcjy Collaborator
Inject witness ids into the Megatron forward and train step
#1446 opened Jun 22, 2026 by fzyzcjy Collaborator
Bracket Megatron actor methods with the with_logs decorator
#1445 opened Jun 22, 2026 by fzyzcjy Collaborator
Track rollout-engine connection staleness on the weight updater
#1444 opened Jun 22, 2026 by fzyzcjy Collaborator
Kill failed cells immediately on execute failure
#1443 opened Jun 22, 2026 by fzyzcjy Collaborator
Wire per-cell heartbeat health monitoring into the train group
#1442 opened Jun 22, 2026 by fzyzcjy Collaborator