
Expose CFG-parallel Transfer2.5 inference #233

Open
Glitchfix wants to merge 1 commit into nvidia-cosmos:main from Glitchfix:feat/195-cfg-parallel-inference

Conversation

@Glitchfix

Summary

  • Exposes the existing CFG-parallel Transfer2.5 inference path through the base inference setup arguments.
  • Validates, before model loading, that CFG parallelism runs with an even context-parallel size equal to WORLD_SIZE.
  • Documents the benchmark command and adds focused config tests for the base-only CLI surface.
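
For reviewers, the topology check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `validate_cfg_parallel` and its parameters are hypothetical, and the reason given for the even-size requirement (the conditional and unconditional branches each taking half of the ranks) is an assumption.

```python
import os
from typing import Optional


def validate_cfg_parallel(
    enable_cfg_parallel: bool,
    context_parallel_size: int,
    world_size: Optional[int] = None,
) -> None:
    """Raise ValueError early if the CFG-parallel topology is unusable.

    All names here are hypothetical; they only illustrate the rule that the
    context-parallel size must be even and must equal WORLD_SIZE.
    """
    if not enable_cfg_parallel:
        # Nothing to validate when CFG parallelism is disabled.
        return

    if world_size is None:
        # torchrun-style launchers export WORLD_SIZE; default to 1 otherwise.
        world_size = int(os.environ.get("WORLD_SIZE", "1"))

    if context_parallel_size != world_size:
        raise ValueError(
            f"CFG parallelism needs context_parallel_size == WORLD_SIZE, "
            f"got {context_parallel_size} and {world_size}."
        )

    if context_parallel_size % 2 != 0:
        # Assumption: the ranks are split evenly between the two CFG branches.
        raise ValueError(
            f"CFG parallelism needs an even context_parallel_size, "
            f"got {context_parallel_size}."
        )
```

Calling such a check before the model is loaded turns a misconfigured launch into an immediate error rather than a stalled distributed job.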

Addresses #195.

Validation

  • PYENV_VERSION=3.12.9 PYTHONPATH=packages/cosmos-cuda:packages/cosmos-oss python -m pytest -q --noconftest -o addopts="" tests/config_test.py
  • PYENV_VERSION=3.12.9 python -m ruff format --check cosmos_transfer2/config.py cosmos_transfer2/inference.py cosmos_transfer2/_src/transfer2/inference/inference_pipeline.py tests/config_test.py
  • PYENV_VERSION=3.12.9 python -m ruff check cosmos_transfer2/config.py cosmos_transfer2/inference.py cosmos_transfer2/_src/transfer2/inference/inference_pipeline.py tests/config_test.py
  • PYENV_VERSION=3.12.9 python -m py_compile cosmos_transfer2/config.py cosmos_transfer2/inference.py cosmos_transfer2/_src/transfer2/inference/inference_pipeline.py tests/config_test.py
  • git diff --check
  • uvx pre-commit run --files cosmos_transfer2/config.py cosmos_transfer2/inference.py cosmos_transfer2/_src/transfer2/inference/inference_pipeline.py docs/inference.md tests/config_test.py

Notes

  • Full multi-GPU inference benchmarking was not run locally because the available environment is missing a cuDNN shared library required by transformer_engine (libcudnn_graph.so.9).
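
To check whether an environment has the same gap, a quick probe along these lines may help. It is a sketch using only the standard-library `ctypes` module; the helper name `can_load_shared_library` is hypothetical and is not part of this PR or of transformer_engine.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # The loader could not find or open the library.
        return False
    return True


if __name__ == "__main__":
    lib = "libcudnn_graph.so.9"
    status = "available" if can_load_shared_library(lib) else "missing"
    print(f"{lib}: {status}")
```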

- Added a base-model setup flag for the existing CFG-parallel path so conditional and unconditional branches can run through context-parallel inference.

- Validated the required topology (an even context-parallel size spanning the full world) before model loading, so a misconfiguration fails fast instead of hanging on distributed sends.

- Documented the benchmark command and guarded the config surface so unrelated multiview setup classes stay unchanged.

Signed-off-by: Shivanjan Chakravorty <shivanjanc@nvidia.com>
@Glitchfix force-pushed the feat/195-cfg-parallel-inference branch from 53b7d04 to c8e146f on May 21, 2026 21:03
@Glitchfix marked this pull request as ready for review on May 21, 2026 21:05