lemond model downloader hangs at 0 bytes on Xet-backed HF repos (no error, no progress) #2416

Description

@The-Monk

Summary

lemond's model downloader hangs indefinitely (0 bytes, no error) when pulling a model whose Hugging Face repo is stored on the Xet storage backend. The repository metadata is fetched and the file list is resolved, but the blob download itself never transfers a single byte and never errors out; it stalls until the client disconnects.

Environment

  • Lemonade: lemonade-server 10.8.1~24.04 (PPA lemonade-team/stable)
  • OS: Zorin OS 18.1 (Ubuntu 24.04 / noble)
  • GPU: 2× AMD Radeon AI PRO R9700 (gfx1201, RDNA4)
  • Backend: llamacpp (rocm); HF cache at /ml/huggingface-cache/huggingface/hub/

Reproduction

lemonade pull Qwen-AgentWorld-35B-A3B-GGUF-UD-Q4_K_XL

(repo: unsloth/Qwen-AgentWorld-35B-A3B-GGUF — public, not gated, Xet-backed)

Or directly against the server:

curl -X POST http://127.0.0.1:13305/api/v1/pull \
  -H 'Content-Type: application/json' \
  -d '{"model_name":"Qwen-AgentWorld-35B-A3B-GGUF-UD-Q4_K_XL"}'
# -> HTTP 000 after 20s (no response; nothing written to disk)

Server log (lemond.service)

(ModelManager) Downloading model: unsloth/Qwen-AgentWorld-35B-A3B-GGUF (variant: UD-Q4_K_XL)
(ModelManager) Fetching repository file list from Hugging Face...
(ModelManager) Using commit hash: 3a305abf5cfd119ee999dfe929c433746edd8d63
(ModelManager) Repository contains 25 files
(ModelManager) Identified files to download:
(ModelManager)   - ...:Qwen-AgentWorld-35B-A3B-UD-Q4_K_XL.gguf
(ModelManager)   - ...:mmproj-F16.gguf
(ModelManager) Created download manifest
(ModelManager) Downloading: Qwen-AgentWorld-35B-A3B-UD-Q4_K_XL.gguf...
        <-- hangs here forever, 0 bytes, no progress, no error -->
(Server) Client disconnected, cancelling download

Evidence it's Xet-specific (not a bad URL / network issue)

The plain HTTPS path works perfectly from the same host:

# resolve URL 302 -> us.aws.cdn.hf.co (xet-bridge), x-linked-size: 22324804864
curl -L -r 0-1048575 .../resolve/main/Qwen-AgentWorld-35B-A3B-UD-Q4_K_XL.gguf
# -> HTTP 206, 1048576 bytes in 0.44s

The repo carries Xet headers (x-xet-hash, and a link header with rel="xet-reconstruction-info" pointing at cas-server.xethub.hf.co). Non-Xet repos (e.g. older GGUFs) download fine through lemond. Downloading the same two files with huggingface_hub + HF_HUB_DISABLE_XET=1 succeeds over plain HTTPS into the same cache.
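For anyone who wants to check whether another repo is affected, here is a minimal Python sketch that classifies a response by the two header markers mentioned above. The function name `looks_xet_backed` and the sample header values are hypothetical; only the header names `x-xet-hash` and the `xet-reconstruction-info` link relation come from this report.

```python
from typing import Mapping


def looks_xet_backed(headers: Mapping[str, str]) -> bool:
    """Return True if the headers carry either Xet marker named in this report."""
    # HTTP header names are case-insensitive, so normalise before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-xet-hash" in lowered:
        return True
    # The second marker is a Link header whose relation is xet-reconstruction-info.
    return "xet-reconstruction-info" in lowered.get("link", "")


if __name__ == "__main__":
    # Hypothetical header values, for illustration only.
    sample = {
        "X-Xet-Hash": "0123abcd",
        "Link": '<https://example.invalid/reconstruct>; rel="xet-reconstruction-info"',
    }
    print(looks_xet_backed(sample))
    print(looks_xet_backed({"Content-Type": "application/octet-stream"}))
```

The headers would typically come from a request to the file's resolve URL made without following the redirect, since the markers are reported on the Hugging Face response rather than on the CDN one.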

What I tried

Setting HF_HUB_DISABLE_XET=1 and HF_XET_DISABLE=1 in the lemond service environment (/etc/lemonade/conf.d/) and restarting — no change, the downloader still stalls. So lemond's downloader does not appear to honor the standard HF Xet opt-out env vars.

Expected behavior

lemond should either (a) correctly complete Xet-backed downloads, or (b) fall back to the plain HTTPS CDN path when Xet reconstruction stalls, and in any case time out with an error instead of hanging silently at 0 bytes.
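To make the expected behavior concrete, here is a rough Python sketch of a stall watchdog with a fallback path. It is not lemond's code (lemond is C++), and every name in it (`DownloadStalled`, `iter_with_stall_timeout`, `download_with_fallback`) is hypothetical. It assumes each transport can be modelled as a callable returning an iterable of byte chunks, and that restarting from scratch on the fallback is acceptable.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator


class DownloadStalled(RuntimeError):
    """Raised when no chunk arrives within the stall window."""


_DONE = object()  # sentinel marking a cleanly finished source


def iter_with_stall_timeout(source: Iterable[bytes], stall_seconds: float) -> Iterator[bytes]:
    """Yield chunks from source; raise DownloadStalled if none arrives in time."""
    chunks: queue.Queue = queue.Queue(maxsize=8)

    def pump() -> None:
        # Runs in a worker thread so a blocked read cannot block the consumer.
        try:
            for chunk in source:
                chunks.put(chunk)
            chunks.put(_DONE)
        except Exception as exc:  # forward transport errors to the consumer
            chunks.put(exc)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            item = chunks.get(timeout=stall_seconds)
        except queue.Empty:
            raise DownloadStalled(f"no data received for {stall_seconds}s") from None
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item


def download_with_fallback(
    primary: Callable[[], Iterable[bytes]],
    fallback: Callable[[], Iterable[bytes]],
    stall_seconds: float,
) -> bytes:
    """Try the primary transport; on a stall, restart on the fallback transport."""
    try:
        return b"".join(iter_with_stall_timeout(primary(), stall_seconds))
    except DownloadStalled:
        # Partial data is discarded here; a real downloader would likely resume
        # with a Range request. A second stall propagates as an error.
        return b"".join(iter_with_stall_timeout(fallback(), stall_seconds))
```

The point of the sketch is only that a transfer which produces no bytes for a bounded time ends in either a retry on the plain HTTPS path or a visible error, never in a silent hang.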

Suggestions

  • Honor HF_HUB_DISABLE_XET / HF_XET_DISABLE as an escape hatch.
  • Add a stall/timeout on the blob transfer so a hung Xet reconstruction surfaces an error rather than hanging.
  • Given the growing number of Unsloth/HF repos migrating to Xet, this will affect many popular GGUF models.
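As an illustration of the first suggestion, a minimal Python sketch of the opt-out check might look like the following. The helper name `xet_disabled` and the set of accepted truthy values are assumptions, not something lemond or huggingface_hub is confirmed to use; the two variable names are the ones tried in this report.

```python
import os
from typing import Mapping, Optional

# Variable names taken from this report.
XET_OPT_OUT_VARS = ("HF_HUB_DISABLE_XET", "HF_XET_DISABLE")

# Assumption: which values count as "enabled" is a guess for this sketch.
_TRUTHY = {"1", "true", "yes", "on"}


def xet_disabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if any opt-out variable is set to a truthy value."""
    env = os.environ if env is None else env
    return any(
        env.get(name, "").strip().lower() in _TRUTHY for name in XET_OPT_OUT_VARS
    )


if __name__ == "__main__":
    # With either variable set, a downloader could skip Xet and use plain HTTPS.
    print(xet_disabled({"HF_HUB_DISABLE_XET": "1"}))
    print(xet_disabled({}))
```

Checking the variable once at download start and logging which path was chosen would also make reports like this one easier to diagnose.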

Happy to provide full debug logs or test a build on gfx1201 hardware.

Labels: area::cli (lemonade CLI client, src/cpp/cli); bug (Something isn't working)
