[Inference] Add AdaSpa support for accelerating video generation #1158
lemonadenn wants to merge 2 commits into flagos-ai:main from
Conversation
Pull request overview
Adds AdaSpa-based sparse attention to accelerate Wan2.1 video diffusion inference, along with example configs and setup docs, and vendors the required block-sparse attention/CUTLASS third-party sources.
Changes:
- Introduce AdaSpa attention processor + handler to swap Diffusers attention processors to AdaSpa at inference time.
- Add/adjust Wan2.1 inference configs (including AdaSpa + combined transformations) and provide setup/usage documentation.
- Vendor block-sparse attention (CUDA/C++/Python) and CUTLASS third-party sources needed for compilation and benchmarking/tests.
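The processor-swap lifecycle described above (save each module's original attention processor, install a sparse wrapper, restore afterwards) can be sketched as follows. All class and attribute names here are hypothetical stand-ins for illustration, not the PR's actual `AdaSpaHandler`/processor classes:

```python
# Minimal sketch of the attention-processor swap pattern: the handler records
# each module's original processor, installs a sparse wrapper around it, and
# can restore the originals after inference.

class OriginalProcessor:
    def __call__(self, x):
        return x + 1  # stand-in for dense attention

class SparseProcessor:
    def __init__(self, base_processor):
        self.base_processor = base_processor  # kept so dense steps can fall back

    def __call__(self, x):
        return self.base_processor(x) * 2  # stand-in for sparse attention

class AttentionModule:
    def __init__(self):
        self.processor = OriginalProcessor()

class Handler:
    def __init__(self):
        self._originals = {}

    def apply(self, modules):
        for name, module in modules.items():
            self._originals[name] = module.processor
            module.processor = SparseProcessor(base_processor=module.processor)

    def restore(self, modules):
        for name, module in modules.items():
            module.processor = self._originals.pop(name)
```

Keeping the original processor on the wrapper (rather than constructing the wrapper bare) is what allows both fallback behavior and clean restoration, which is also the crux of two of the review comments below.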
Reviewed changes
Copilot reviewed 155 out of 5745 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/arch_2mma__sm60_8h__incl.md5 | Vendored CUTLASS generated docs artifact (md5). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/arch_2mma__sm60_8h__dep__incl.md5 | Vendored CUTLASS generated docs artifact (md5). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/arch_2mma__sm50_8h__incl.md5 | Vendored CUTLASS generated docs artifact (md5). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/arch_2mma__sm50_8h__dep__incl.md5 | Vendored CUTLASS generated docs artifact (md5). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/arch_2mma_8h__incl.md5 | Vendored CUTLASS generated docs artifact (md5). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/arch_2mma_8h__dep__incl.md5 | Vendored CUTLASS generated docs artifact (md5). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/arch_2mma_8h.html | Vendored CUTLASS generated docs (HTML). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/aligned__buffer_8h__incl.md5 | Vendored CUTLASS generated docs artifact (md5). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/aligned__buffer_8h__dep__incl.md5 | Vendored CUTLASS generated docs artifact (md5). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/aligned__buffer_8h.html | Vendored CUTLASS generated docs (HTML). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/_config.yml | Vendored CUTLASS docs site config. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cuDNN.cmake | Vendored CUTLASS cuDNN discovery CMake. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cuBLAS.cmake | Vendored CUTLASS cuBLAS discovery CMake. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cmake/version.h.in | Vendored CUTLASS version header template. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cmake/nop.cu | Vendored CUTLASS build smoke-test CU file. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cmake/googletest.cmake | Vendored CUTLASS test dependency fetch script. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cmake/NvidiaCutlassPackageConfig.cmake | Vendored CUTLASS packaging config. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cmake/NvidiaCutlassConfig.cmake | Vendored CUTLASS config module. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cmake/CTestTestfile.test.configure.cmake | Vendored CUTLASS CTest config template. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cmake/CTestTestfile.configure.cmake | Vendored CUTLASS CTest config template. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/bin2hex.cmake | Vendored CUTLASS helper CMake script. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/PUBLICATIONS.md | Vendored CUTLASS documentation. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/LICENSE.txt | Vendored CUTLASS license. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/CONTRIBUTORS.md | Vendored CUTLASS contributors list. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/CITATION.cff | Vendored CUTLASS citation metadata. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.gitignore | Vendored CUTLASS ignore file. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/workflows/stale.yml | Vendored CUTLASS workflow file (inert under non-root). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/workflows/new-issues-to-triage-projects.yml | Vendored CUTLASS workflow file (inert under non-root). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/workflows/labeler.yml | Vendored CUTLASS workflow file (inert under non-root). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/ISSUE_TEMPLATE/submit_question.md | Vendored CUTLASS issue template. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/ISSUE_TEMPLATE/feature_request.md | Vendored CUTLASS issue template. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/ISSUE_TEMPLATE/documentation_request.md | Vendored CUTLASS issue template. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/ISSUE_TEMPLATE/config.yml | Vendored CUTLASS issue template config. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/ISSUE_TEMPLATE/bug_report.md | Vendored CUTLASS issue template. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/static_switch.h | Adds compile-time dispatch macros for head-dim/type. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/philox.cuh | Adds Philox RNG utilities used by CUDA kernels. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/kernel_traits_sm90.h | Adds SM90 kernel trait definitions (FlashAttention-derived). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/generate_kernels.py | Adds generator for kernel instantiation translation units. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim96_fp16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim96_bf16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_bf16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim32_fp16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim32_bf16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim256_fp16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim256_bf16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim224_fp16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim224_bf16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim192_fp16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim192_bf16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim160_fp16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim160_bf16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim128_fp16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim128_bf16_sm80.cu | Adds generated fwd split-kv instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim96_fp16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim96_bf16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim64_fp16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim64_bf16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim32_fp16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim32_bf16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim256_fp16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim256_bf16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim224_fp16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim224_bf16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim192_fp16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim192_bf16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim160_fp16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim160_bf16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim128_fp16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim128_bf16_sm80.cu | Adds generated fwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_block_hdim64_fp16_sm80.cu | Adds generated fwd block-streaming instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_block_hdim64_bf16_sm80.cu | Adds generated fwd block-streaming instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_block_hdim32_fp16_sm80.cu | Adds generated fwd block-streaming instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_block_hdim32_bf16_sm80.cu | Adds generated fwd block-streaming instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_block_hdim128_fp16_sm80.cu | Adds generated fwd block-streaming instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_block_hdim128_bf16_sm80.cu | Adds generated fwd block-streaming instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim96_fp16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim96_bf16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim64_fp16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim64_bf16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim32_fp16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim32_bf16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim256_fp16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim256_bf16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim224_fp16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim224_bf16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim192_fp16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim192_bf16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim160_fp16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim160_bf16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim128_fp16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim128_bf16_sm80.cu | Adds generated bwd instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_block_hdim64_fp16_sm80.cu | Adds generated bwd block-streaming instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_block_hdim64_bf16_sm80.cu | Adds generated bwd block-streaming instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_block_hdim32_fp16_sm80.cu | Adds generated bwd block-streaming instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_block_hdim32_bf16_sm80.cu | Adds generated bwd block-streaming instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_block_hdim128_fp16_sm80.cu | Adds generated bwd block-streaming instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_block_hdim128_bf16_sm80.cu | Adds generated bwd block-streaming instantiation TU. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash.h | Core params + API declarations for flash/block-sparse kernels. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/block_info.h | Adds block info utilities for varlen kernels. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/alibi.h | Adds ALiBi bias application helper. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd_bwd/test_performance/utils.py | Benchmark/perf utilities for fwd+bwd testing. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd_bwd/test_performance/blocksparse.py | Block-sparse perf script for fwd+bwd. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd_bwd/test_performance/block_streaming.py | Block-streaming perf script for fwd+bwd. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd_bwd/test_correctness/full_test.py | Correctness tests for fwd+bwd block sparse. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd/test_performance/utils.py | Benchmark/perf utilities for fwd-only testing. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd/test_performance/token_streaming.py | Token-streaming perf script for fwd-only. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd/test_performance/blocksparse.py | Block-sparse perf script for fwd-only. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd/test_correctness/full_test.py | Correctness tests for fwd-only block sparse. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_attn/utils/benchmark.py | Benchmark utilities used by perf scripts/tests. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_attn/pyproject.toml | Formatting config for vendored Python package. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_attn/bert_padding.py | Input unpadding/padding helpers (FlashAttention derived). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_attn/__init__.py | Exposes Python APIs for block sparse attention package. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/README.md | Third-party package README for block sparse attention. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/Makefile | Packaging helper make targets (third-party). |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/MANIFEST.in | Packaging include rules for third-party sources. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/LICENSE | Third-party license for block sparse attention. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/.gitmodules | Third-party submodule metadata for CUTLASS. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/.gitignore | Third-party ignore file. |
| flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/.github/workflows/publish.yml | Third-party publishing workflow (inert under non-root). |
| flagscale/inference/core/diffusion/adaspa/processor.py | Adds Wan AdaSpa attention processor wrapper with SDPA patching. |
| flagscale/inference/core/diffusion/adaspa/adasparse_args.py | Adds runtime args plumbing for AdaSpa sparse attention. |
| flagscale/inference/core/diffusion/adaspa/adasparse/__init__.py | Exposes AdaSpa sparse attention API. |
| flagscale/inference/core/diffusion/adaspa/adaspa_handler.py | Adds generic handler to swap attention processors to AdaSpa versions. |
| flagscale/inference/core/diffusion/adaspa/__init__.py | Exposes AdaSpaHandler + registration for Wan processors. |
| examples/wan2_1/conf/inference/1.3b_combine.yaml | Adds combined AdaSpa + TaylorSeer example config. |
| examples/wan2_1/conf/inference/1.3b_adaspa.yaml | Adds AdaSpa-only example config for Wan2.1. |
| examples/wan2_1/conf/inference/1.3b.yaml | Updates base model path for Wan2.1 config. |
| examples/wan2_1/conf/inference.yaml | Adjusts runner setup command default. |
| examples/wan2_1/conf/README.md | Adds setup and usage instructions for Wan2.1 + AdaSpa. |
Code context (handler):

```python
self._original_processors[key] = original_processor

# Create sparse processor wrapper around original processor
sparse_processor = sparse_type()
```

`WanAdaSpaAttnProcessor` requires a `base_processor` instance (it raises at call time when `base_processor` is `None`), but the handler constructs it with no arguments. Construct the sparse processor with the original processor wired in (e.g., `sparse_type(base_processor=original_processor)`), and consider forwarding any handler kwargs through as well.

Suggested change:

```python
sparse_processor = sparse_type(base_processor=original_processor, **self.kwargs)
```
Code context (handler):

```python
# Store original processor
key = f"{module.__class__.__name__}.processor"
```

The key used to track original/sparse processors is not unique per `Attention` module (multiple `Attention` instances share the same class name), so later entries will overwrite earlier ones. Use a unique identifier (e.g., the module path name from `model.named_modules()`, or `id(module)`) to avoid collisions and make restoration/debugging reliable.

Suggested change:

```python
# Store original processor using a unique key per module instance
key = f"{module.__class__.__name__}-{id(module)}.processor"
```
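The collision and its fix are easy to demonstrate. The `Attention` class below is a hypothetical stand-in, not the diffusers class itself:

```python
# Sketch of the key-collision issue: class-name-based keys are identical for
# every instance of the same class, while id(module)-based keys are unique.

class Attention:  # stand-in for an attention module class
    pass

modules = [Attention(), Attention()]

# Class-name keys collide: both modules map to a single dict entry, so the
# second stored processor silently overwrites the first.
colliding = {f"{m.__class__.__name__}.processor" for m in modules}
assert len(colliding) == 1

# Keys that include id(module) are unique per live instance.
unique = {f"{m.__class__.__name__}-{id(m)}.processor" for m in modules}
assert len(unique) == len(modules)
```

A dotted path from `model.named_modules()` is arguably still better than `id(module)`: it is stable across runs and far easier to read in logs when debugging restoration.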
Code context (processor):

```python
if self.base_processor is None:
    raise RuntimeError("WanAdaSpaAttnProcessor requires a base_processor instance.")

original_sdpa = F.scaled_dot_product_attention
```

This monkey-patches `torch.nn.functional.scaled_dot_product_attention` at module scope for the duration of the call. Even with restoration in `finally`, concurrent inference (threads/async tasks) can observe the patched function and get unintended behavior. Prefer avoiding global patching (e.g., refactor to call AdaSpa explicitly at a single call site), or guard the patch with a process-wide lock to serialize SDPA patch sections.
Code context (processor):

```python
# (some diffusers versions import `scaled_dot_product_attention` directly)
patched_items: list[tuple[str, Any, Any]] = []

F.scaled_dot_product_attention = _patched_sdpa
```

Same concern as above: this monkey-patches `torch.nn.functional.scaled_dot_product_attention` at module scope for the duration of the call. Even with restoration in `finally`, concurrent inference (threads/async tasks) can observe the patched function and get unintended behavior. Prefer avoiding global patching (e.g., refactor to call AdaSpa explicitly at a single call site), or guard the patch with a process-wide lock to serialize SDPA patch sections.
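The lock-based mitigation suggested above can be sketched like this. It uses a stand-in namespace rather than torch's real `F.scaled_dot_product_attention`, and the function names are hypothetical:

```python
import threading
import types

# Stand-in module with a patchable attribute, mirroring how
# F.scaled_dot_product_attention gets swapped out at module scope.
F = types.SimpleNamespace(scaled_dot_product_attention=lambda q: "dense")

_sdpa_patch_lock = threading.Lock()

def run_with_sparse_sdpa(fn):
    """Run fn with the global SDPA patched to a sparse variant.

    The process-wide lock serializes patch sections so two callers never
    interleave their patch/restore steps, and the finally block guarantees
    the original function is restored even if fn raises.
    """
    with _sdpa_patch_lock:
        original = F.scaled_dot_product_attention
        F.scaled_dot_product_attention = lambda q: "sparse"
        try:
            return fn()
        finally:
            F.scaled_dot_product_attention = original
```

Note the lock only serializes code that also takes it; unrelated threads calling SDPA directly can still observe the patched function mid-section, which is why refactoring to an explicit call site remains the cleaner fix.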
Code context (philox.cuh):

```cpp
__device__ inline Philox(unsigned long long seed,
                         unsigned long long subsequence,
                         unsigned long long offset)
    : STATE(0)
    , seed_(seed)
    , offset_(offset)
    , key(reinterpret_cast<const uint2&>(seed)) {
```

The Philox RNG state ignores the constructor's `subsequence` argument and uses `offset_` for both subsequence and offset when generating values. This will change the RNG stream and can break dropout reproducibility/correctness. Store `subsequence` as a member and call `flash::philox(seed_, subsequence_, offset_)` (and ensure the counter/offset units match the intended 128-bit increment scheme).
Code context (static_switch.h):

```cpp
#define FWD_HEADDIM_SWITCH(HEADDIM, ...)        \
  [&] {                                         \
    if (HEADDIM <= 32) {                        \
      constexpr static int kHeadDim = 32;       \
      return __VA_ARGS__();                     \
    } else if (HEADDIM <= 64) {                 \
      constexpr static int kHeadDim = 64;       \
      return __VA_ARGS__();                     \
    } else if (HEADDIM <= 96) {                 \
      constexpr static int kHeadDim = 96;       \
      return __VA_ARGS__();                     \
    } else if (HEADDIM <= 128) {                \
      constexpr static int kHeadDim = 128;      \
      return __VA_ARGS__();                     \
    } else if (HEADDIM <= 160) {                \
      constexpr static int kHeadDim = 160;      \
      return __VA_ARGS__();                     \
    } else if (HEADDIM <= 192) {                \
      constexpr static int kHeadDim = 192;      \
      return __VA_ARGS__();                     \
    } else if (HEADDIM <= 224) {                \
      constexpr static int kHeadDim = 224;      \
      return __VA_ARGS__();                     \
    } else if (HEADDIM <= 256) {                \
      constexpr static int kHeadDim = 256;      \
      return __VA_ARGS__();                     \
    }                                           \
  }()
```

These switch macros have no fallback branch when `HEADDIM` exceeds the last threshold, so the lambda may not return a value on all control paths (a compile error or undefined behavior depending on usage). Add an `else` that triggers a compile-time failure (e.g., `static_assert`) or a runtime error path so misuse produces a clear diagnostic.
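The shape of the fix, illustrated in Python rather than the C++ macro itself (function and bucket names are illustrative): every path either dispatches to a bucket or fails with a clear error instead of silently falling off the end.

```python
# Sketch of head-dim bucket dispatch with an explicit fallback branch, the
# analogue of adding an `else { static_assert/runtime error }` to the macro.

_SUPPORTED_HEAD_DIMS = (32, 64, 96, 128, 160, 192, 224, 256)

def fwd_headdim_switch(head_dim, kernel):
    for bucket in _SUPPORTED_HEAD_DIMS:
        if head_dim <= bucket:
            # Dispatch to the kernel instantiated for this bucket size.
            return kernel(bucket)
    # The fallback the macro is missing: fail loudly on unsupported sizes.
    raise ValueError(
        f"unsupported head_dim {head_dim}; max supported is {_SUPPORTED_HEAD_DIMS[-1]}"
    )
```

For example, a head dim of 80 dispatches to the 96 bucket, while 300 raises instead of producing an undefined result.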
Code context (examples/wan2_1/conf/inference/1.3b.yaml):

```diff
 engine:
-  model: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
+  model: /workspace/models/Wan2.1-T2V-1.3B-Diffusers
```

This changes the model identifier from a portable Hub ID to a machine-specific absolute path. For a repo example config, prefer keeping the Hub ID (or using an env-var placeholder like `${oc.env:WAN_MODEL_PATH, Wan-AI/Wan2.1-T2V-1.3B-Diffusers}`) so the config works out of the box across environments while still allowing local overrides.

Suggested change:

```yaml
model: ${oc.env:WAN_MODEL_PATH, Wan-AI/Wan2.1-T2V-1.3B-Diffusers}
```
Code context (examples/wan2_1/conf/README.md):

```
defaults:
  - _self_
  - inference: 1.3b_adaspa (adaspa)
  : 1.3b (taylorseer)
```

The 'Config selection' snippet is labeled as `bash` but shows what looks like a Hydra YAML fragment, and the `- inference: 1.3b_adaspa (adaspa)` / `: 1.3b (taylorseer)` syntax is not valid YAML. Please replace this with a valid `defaults` example (and mark the code block as `yaml`) so users can copy-paste it successfully.

Suggested change:

```yaml
defaults:
  - _self_
  - inference: 1.3b_adaspa  # use AdaSpa-accelerated config
  # To use the standard Wan2.1 config instead, comment the line above
  # and uncomment the line below:
  # - inference: 1.3b  # use non-AdaSpa (TaylorSeer) config
```
Code context (end of a block-sparse perf script):

```python
write_to_excel(excel_label, excel_data, excel_dir_path, excel_file_name)

profile_blocksparse_fwd_bwd()
```

This performance script executes profiling unconditionally on import, which can cause unexpected long-running GPU work during test discovery or when the module is imported. Wrap the call in `if __name__ == "__main__":` so it only runs when invoked as a script.

Suggested change:

```python
if __name__ == "__main__":
    profile_blocksparse_fwd_bwd()
```
Code context (new vendored docs file):

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
```

A large set of generated CUTLASS documentation artifacts (HTML plus `*.md5`) is being committed under `csrc/cutlass/docs/`. These are typically build/generated outputs and significantly bloat the repo/PR diff without affecting AdaSpa runtime. Consider removing generated docs from the vendored tree (or excluding docs via vendoring rules) and keeping only the source needed to build the kernels.
PR Category
Inference
PR Types
Improvements
PR Description
This PR adds AdaSpa support for Wan2.1 inference acceleration and includes usage documentation.

Changed example files:
- examples/wan2_1/conf/inference/1.3b_adaspa.yaml
- examples/wan2_1/conf/inference/1.3b.yaml
- examples/wan2_1/conf/inference/1.3b_combine.yaml
- examples/wan2_1/conf/inference.yaml
- examples/wan2_1/conf/README.md

Validation