Running dflash with SGLang's Docker images encounters errors (#99)

Running with the image built from /TorchSpec/docker/sglang/v0.5.10.post1/Dockerfile fails. The mooncake-transfer-engine version actually installed in the container is 0.3.10.post2, which satisfies the >= 0.3.10.post1 requirement, but the error still occurs:
(SglEngine pid=12651) [2026-05-14 07:19:44] Disable piecewise CUDA graph because the capture size is not set
(SglEngine pid=12651) [2026-05-14 07:19:44] Scheduler hit an exception: Traceback (most recent call last):
(SglEngine pid=12651)   File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3653, in run_scheduler_process
(SglEngine pid=12651)     scheduler = Scheduler(
(SglEngine pid=12651)                 ^^^^^^^^^^
(SglEngine pid=12651)   File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 459, in __init__
(SglEngine pid=12651)     raise self._mooncake_init_error
(SglEngine pid=12651)   File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 395, in _init_mooncake
(SglEngine pid=12651)     self.init_eagle_mooncake_store(device=mooncake_device)
(SglEngine pid=12651)   File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 898, in init_eagle_mooncake_store
(SglEngine pid=12651)     store.setup(device=device or self.device)
(SglEngine pid=12651)   File "/root/torchspec/torchspec/transfer/mooncake/store.py", line 98, in setup
(SglEngine pid=12651)     self._verify_force_delete()
(SglEngine pid=12651)   File "/root/torchspec/torchspec/transfer/mooncake/store.py", line 252, in _verify_force_delete
(SglEngine pid=12651)     raise RuntimeError(
(SglEngine pid=12651) RuntimeError: Mooncake version too old: batch_remove() not found. Requires mooncake-transfer-engine >= 0.3.10.post1.
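
The error message is about a missing `batch_remove()` method, not about the version string, so it may help to check what Python actually imports inside the container. The following is a minimal diagnostic sketch, not TorchSpec's own check. The module name `mooncake.store` and class name `MooncakeDistributedStore` are assumptions about where `batch_remove` is expected to live; adjust them to whatever `torchspec/transfer/mooncake/store.py` imports.

```python
import importlib
from importlib import metadata


def probe(dist_name, module_name, class_name, attr_name):
    """Compare pip metadata with what the interpreter actually imports."""
    report = {"dist_version": None, "module_file": None, "has_attr": False}

    # Version recorded in the installed distribution's metadata.
    try:
        report["dist_version"] = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        pass

    # Module that is really picked up on sys.path (may differ from the wheel).
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError):
        return report

    report["module_file"] = getattr(module, "__file__", None)
    cls = getattr(module, class_name, None)
    report["has_attr"] = cls is not None and hasattr(cls, attr_name)
    return report


if __name__ == "__main__":
    # Names below are assumptions, see the note above.
    print(
        probe(
            "mooncake-transfer-engine",
            "mooncake.store",
            "MooncakeDistributedStore",
            "batch_remove",
        )
    )
```

If `dist_version` reports 0.3.10.post2 while `has_attr` is false, `module_file` shows which copy of the module is being loaded, which would be useful to include in this report.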



The Dockerfile at /TorchSpec/docker/sglang/v0.5.8.post1/Dockerfile does not work either. It fails with a different error:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/root/torchspec/torchspec/train_entry.py", line 367, in <module>
    train_async_no_generation(args)
  File "/root/torchspec/torchspec/train_entry.py", line 333, in train_async_no_generation
    all_results = timer.wait("Actor initialization", train_init_refs + engine_init_refs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/torchspec/torchspec/train_entry.py", line 87, in wait
    result = ray.get(refs)
             ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 107, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2980, in get
    values, debugger_breakpoint = worker.get_objects(
                                  ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1023, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::SglEngine.init() (pid=12497, ip=10.90.91.146, actor_id=a40d25a9423bd0735314071b01000000, repr=<torchspec.inference.engine.sgl_engine.SglEngine object at 0x7f22ae0b13a0>)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/torchspec/torchspec/inference/engine/sgl_engine.py", line 292, in init
    self._engine = sgl.Engine(**engine_kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/utils.py", line 325, in __call__
    return module(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 152, in __init__
    server_args = self.server_args_class(**kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: ServerArgs.__init__() got an unexpected keyword argument 'spec_training_store_last_hidden_states'
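
The `TypeError` suggests that the `ServerArgs` class in this image does not define `spec_training_store_last_hidden_states`. A quick way to confirm which keyword arguments a constructor accepts is to inspect its signature. The sketch below uses a hypothetical `DemoArgs` dataclass as a stand-in so it runs anywhere; inside the container the same helper could be pointed at `sglang.srt.server_args.ServerArgs` (assuming that import path applies to the installed SGLang version).

```python
import inspect
from dataclasses import dataclass


def split_kwargs(target, kwargs):
    """Split kwargs into those target's signature accepts and those it rejects."""
    params = inspect.signature(target).parameters
    # A **kwargs parameter means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    accepted = {k: v for k, v in kwargs.items() if k in params}
    rejected = {k: v for k, v in kwargs.items() if k not in params}
    return accepted, rejected


@dataclass
class DemoArgs:
    """Hypothetical stand-in for ServerArgs, for illustration only."""

    model_path: str
    tp_size: int = 1


engine_kwargs = {
    "model_path": "some-model",
    "spec_training_store_last_hidden_states": True,
}
accepted, rejected = split_kwargs(DemoArgs, engine_kwargs)
print("accepted:", sorted(accepted))
print("rejected:", sorted(rejected))
```

This only reports the mismatch. Dropping the rejected keyword would likely disable the feature TorchSpec needs, so the useful output here is the list of rejected names for the image in question.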
