The image built from /TorchSpec/docker/sglang/v0.5.10.post1/Dockerfile fails at startup. I checked the container, and the installed mooncake-transfer-engine version is 0.3.10.post2, which satisfies the >= 0.3.10.post1 requirement, but the version check still fails:
(SglEngine pid=12651) [2026-05-14 07:19:44] Disable piecewise CUDA graph because the capture size is not set
(SglEngine pid=12651) [2026-05-14 07:19:44] Scheduler hit an exception: Traceback (most recent call last):
(SglEngine pid=12651) File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3653, in run_scheduler_process
(SglEngine pid=12651) scheduler = Scheduler(
(SglEngine pid=12651) ^^^^^^^^^^
(SglEngine pid=12651) File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 459, in __init__
(SglEngine pid=12651) raise self._mooncake_init_error
(SglEngine pid=12651) File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 395, in _init_mooncake
(SglEngine pid=12651) self.init_eagle_mooncake_store(device=mooncake_device)
(SglEngine pid=12651) File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 898, in init_eagle_mooncake_store
(SglEngine pid=12651) store.setup(device=device or self.device)
(SglEngine pid=12651) File "/root/torchspec/torchspec/transfer/mooncake/store.py", line 98, in setup
(SglEngine pid=12651) self._verify_force_delete()
(SglEngine pid=12651) File "/root/torchspec/torchspec/transfer/mooncake/store.py", line 252, in _verify_force_delete
(SglEngine pid=12651) raise RuntimeError(
(SglEngine pid=12651) RuntimeError: Mooncake version too old: batch_remove() not found. Requires mooncake-transfer-engine >= 0.3.10.post1.
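One thing that might help narrow this down: the version reported by the package metadata and the module that the process actually imports are not necessarily the same thing. Below is a minimal diagnostic sketch, not TorchSpec code. `probe_install` is a hypothetical helper, and the module and class names passed to it for Mooncake (`mooncake.store`, `MooncakeDistributedStore`) are my assumption about what store.py imports, so adjust them if they differ.

```python
import importlib
from importlib import metadata


def probe_install(dist_name, module_name, class_name, method_name):
    """Report the installed distribution version, the file the module
    is loaded from, and whether the class exposes the given method."""
    report = {"version": None, "module_file": None, "has_method": False}

    try:
        report["version"] = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        pass  # no distribution metadata visible to this interpreter

    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return report  # module cannot be imported at all

    # The file path shows which copy of the module was picked up.
    report["module_file"] = getattr(module, "__file__", None)
    cls = getattr(module, class_name, None)
    report["has_method"] = cls is not None and hasattr(cls, method_name)
    return report
```

Running `probe_install("mooncake-transfer-engine", "mooncake.store", "MooncakeDistributedStore", "batch_remove")` inside the container, with the same interpreter that launches SglEngine, should show whether the reported version and the imported module agree. If `version` is 0.3.10.post2 but `has_method` is false, `module_file` points at the copy that is actually being loaded.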
The image built from /TorchSpec/docker/sglang/v0.5.8.post1/Dockerfile does not work either. It fails with the following error:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/root/torchspec/torchspec/train_entry.py", line 367, in <module>
train_async_no_generation(args)
File "/root/torchspec/torchspec/train_entry.py", line 333, in train_async_no_generation
all_results = timer.wait("Actor initialization", train_init_refs + engine_init_refs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/torchspec/torchspec/train_entry.py", line 87, in wait
result = ray.get(refs)
^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 107, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2980, in get
values, debugger_breakpoint = worker.get_objects(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1023, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::SglEngine.init() (pid=12497, ip=10.90.91.146, actor_id=a40d25a9423bd0735314071b01000000, repr=<torchspec.inference.engine.sgl_engine.SglEngine object at 0x7f22ae0b13a0>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/torchspec/torchspec/inference/engine/sgl_engine.py", line 292, in init
self._engine = sgl.Engine(**engine_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/utils.py", line 325, in __call__
return module(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 152, in __init__
server_args = self.server_args_class(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: ServerArgs.__init__() got an unexpected keyword argument 'spec_training_store_last_hidden_states'
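To see which keyword arguments the sglang build in the image actually accepts, something like the sketch below could be run in the container. `unexpected_kwargs` is a hypothetical helper and `DemoArgs` is only a stand-in so the snippet runs on its own; in the container the class to pass would be the installed ServerArgs, which I assume is importable from sglang.srt.server_args.

```python
import inspect
from dataclasses import dataclass


def unexpected_kwargs(target, kwargs):
    """Return the keys in kwargs that target's signature does not accept."""
    params = inspect.signature(target).parameters
    # A **kwargs parameter means any keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    accepted = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return sorted(k for k in kwargs if k not in accepted)


@dataclass
class DemoArgs:  # hypothetical stand-in for the installed ServerArgs
    model_path: str
    tp_size: int = 1
```

Calling `unexpected_kwargs(DemoArgs, {"model_path": "m", "spec_training_store_last_hidden_states": True})` returns the rejected key. Passing the real ServerArgs and the `engine_kwargs` built in sgl_engine.py would list every argument the installed sglang does not recognize, not only the first one that raises.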