fix: When there are tensors registered with register buffer in the weight file, the weights are only loaded on device 0 when loading weights across multiple devices. #7717
see: #7709
Problem
When loading model weights across multiple devices (tensor parallel), buffers registered via `register_buffer` were only being loaded on device 0, or not loaded at all in inference scenarios.

Root Causes

- `load_buffer()` function lacked device awareness (no `mp_group` parameter)
- `load_model_with_checkpoint()` completely ignored buffers

Solution

- Update `load_buffer()` to accept `mp_group` parameter and handle device migration
- Update `load_buffer()` call sites to pass `mp_group` parameter
- Load buffers in `load_model_with_checkpoint()`

Files Changed

- `deepspeed/module_inject/auto_tp.py`
- `deepspeed/module_inject/replace_module.py`
- `deepspeed/inference/engine.py`
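
For reviewers who want to see the idea in isolation, here is a minimal sketch of device-aware buffer loading in plain PyTorch. It is not the code in this PR: `RotaryCache`, `load_buffers`, and the `rope.` checkpoint prefix are hypothetical names made up for illustration, and the sketch takes an explicit `device` argument rather than deriving it from `mp_group`. It only shows that each rank should copy the checkpoint value onto its own device before replacing the buffer.

```python
import torch
import torch.nn as nn


class RotaryCache(nn.Module):
    """Toy module (hypothetical) with a single registered buffer."""

    def __init__(self, dim: int = 4):
        super().__init__()
        # Buffers appear in state_dict() but not in parameters().
        self.register_buffer("inv_freq", torch.zeros(dim))


def load_buffers(module: nn.Module, state_dict: dict, prefix: str,
                 device: torch.device) -> list:
    """Hypothetical helper: load the module's direct buffers onto `device`.

    Returns the names of the buffers found in `state_dict` and loaded.
    """
    loaded = []
    # list() so the buffer dict is not iterated lazily while it is updated.
    for name, buf in list(module.named_buffers(recurse=False)):
        key = prefix + name
        if key not in state_dict:
            continue
        # Move the checkpoint tensor to this rank's device, keeping the
        # buffer's dtype, then replace the registered buffer.
        setattr(module, name, state_dict[key].to(device=device, dtype=buf.dtype))
        loaded.append(name)
    return loaded


# Simulated checkpoint entry stored under the fully qualified buffer name.
checkpoint = {"rope.inv_freq": torch.arange(4, dtype=torch.float32)}

# Assumption: in a real tensor-parallel run each rank would pass its own
# device, e.g. torch.device("cuda", local_rank). CPU keeps this runnable.
device = torch.device("cpu")

layer = RotaryCache()
loaded_names = load_buffers(layer, checkpoint, prefix="rope.", device=device)
```

Calling the helper once per rank with that rank's device is what avoids the "device 0 only" behaviour described under Problem.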