
KeeProMise commented on Dec 8, 2025

see: #7709

Problem

When loading model weights across multiple devices (tensor parallelism), buffers registered via register_buffer were either loaded only on device 0 or, in inference scenarios, not loaded at all.
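
For context, here is a minimal sketch of the kind of state involved. The RotaryCache module below is hypothetical, not taken from the DeepSpeed codebase; it only illustrates that register_buffer holds non-parameter state that every tensor-parallel rank needs a local copy of:

```python
import torch
import torch.nn as nn

class RotaryCache(nn.Module):
    """Hypothetical module; illustrates buffer state only."""

    def __init__(self, dim: int):
        super().__init__()
        inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
        # Buffers are saved in the checkpoint but are not Parameters,
        # so a loader that only walks named_parameters() skips them.
        self.register_buffer("inv_freq", inv_freq)
```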

Root Causes

  1. The load_buffer() function lacked device awareness (it had no mp_group parameter)
  2. The Inference Engine's load_model_with_checkpoint() ignored buffers entirely (sketched below)
  3. Buffer loading was inconsistent across different code paths
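
A hedged sketch of what root cause 2 looks like in practice (this is an assumed shape of the bug, not the actual DeepSpeed source): the checkpoint loader walks parameters only, so buffers silently keep their initialization values.

```python
import torch.nn as nn

def load_params_only(module: nn.Module, state_dict: dict, prefix: str = "") -> None:
    # Assumed pre-fix pattern: only named_parameters() is consulted.
    for name, param in module.named_parameters(recurse=False):
        key = prefix + name
        if key in state_dict:
            param.data.copy_(state_dict[key])
    # named_buffers() is never visited here, so anything registered
    # via register_buffer() is left at its initialization-time value.
```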

Solution

  • Enhanced load_buffer() to accept an mp_group parameter and handle device migration (see the sketch after this list)
  • Updated all load_buffer() call sites to pass the mp_group parameter
  • Added buffer-loading logic to the Inference Engine's load_model_with_checkpoint()
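
A minimal sketch of the enhanced loader, assuming the signature described above (the exact device-resolution logic in DeepSpeed may differ):

```python
import torch
import torch.nn as nn

def load_buffer(module: nn.Module, state_dict: dict, prefix: str = "",
                mp_group=None) -> None:
    # mp_group identifies the tensor-parallel process group; when it is
    # set, each rank resolves its own local CUDA device instead of
    # implicitly landing on device 0.
    if mp_group is not None and torch.cuda.is_available():
        device = torch.device("cuda", torch.cuda.current_device())
    else:
        device = torch.device("cpu")
    for name, _ in module.named_buffers(recurse=False):
        key = prefix + name
        if key in state_dict:
            # Migrate the checkpoint tensor to the rank-local device
            # before installing it as the module's buffer.
            module._buffers[name] = state_dict[key].to(device)
```

Call sites would then thread the group through, e.g. load_buffer(child, sd, prefix, mp_group=self.mp_group) (the self.mp_group attribute name is an assumption), and a buffer loop of this shape is what load_model_with_checkpoint() gains in the Inference Engine.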

Files Changed

  • deepspeed/module_inject/auto_tp.py
  • deepspeed/module_inject/replace_module.py
  • deepspeed/inference/engine.py
