
[Inference] Add AdaSpa support for accelerating video generation#1158

Open
lemonadenn wants to merge 2 commits into flagos-ai:main from lemonadenn:feat/wan2-1-adaspa

Conversation

@lemonadenn

PR Category

Inference

PR Types

Improvements

PR Description

This PR adds AdaSpa support for Wan2.1 inference acceleration and includes usage documentation.

  • Integrate AdaSpa into the Wan2.1 diffusion inference flow (a usage sketch follows this list).
  • Add/update Wan2.1 inference configs:
    • examples/wan2_1/conf/inference/1.3b_adaspa.yaml
    • examples/wan2_1/conf/inference/1.3b.yaml
    • examples/wan2_1/conf/inference/1.3b_combine.yaml
    • examples/wan2_1/conf/inference.yaml
  • Add documentation for setup and usage:
    • examples/wan2_1/conf/README.md
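
To make the flow concrete, here is a minimal sketch of applying the handler before generation. AdaSpaHandler and the module path come from this PR's new files, but the constructor signature and the apply method shown here are assumptions, not the exact API:

```python
import torch
from diffusers import WanPipeline  # Wan2.1 T2V pipeline in recent diffusers releases

# AdaSpaHandler is exposed by the new adaspa package in this PR; the
# `apply` call below is an illustrative assumption, not the exact API.
from flagscale.inference.core.diffusion.adaspa import AdaSpaHandler

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

handler = AdaSpaHandler()        # swaps Diffusers attention processors
handler.apply(pipe.transformer)  # to their AdaSpa (sparse) counterparts

video = pipe(prompt="a cat surfing a wave", num_frames=33).frames[0]
```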

Validation

  • Verified the AdaSpa-related environment setup on a local machine.
  • Ran Wan2.1 inference with the AdaSpa config successfully.
  • Confirmed that inference completes and generates output files.

@lemonadenn lemonadenn requested a review from zhaoyinglia as a code owner March 20, 2026 10:04
Copilot AI review requested due to automatic review settings March 20, 2026 10:04
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Copilot AI left a comment


Pull request overview

Adds AdaSpa-based sparse attention to accelerate Wan2.1 video diffusion inference, along with example configs and setup docs, and vendors the required block-sparse attention/CUTLASS third-party sources.

Changes:

  • Introduce AdaSpa attention processor + handler to swap Diffusers attention processors to AdaSpa at inference time.
  • Add/adjust Wan2.1 inference configs (including AdaSpa + combined transformations) and provide setup/usage documentation.
  • Vendor block-sparse attention (CUDA/C++/Python) and CUTLASS third-party sources needed for compilation and benchmarking/tests.

Reviewed changes

Copilot reviewed 155 out of 5745 changed files in this pull request and generated 12 comments.

File Description
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/arch_2mma__sm60_8h__incl.md5 Vendored CUTLASS generated docs artifact (md5).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/arch_2mma__sm60_8h__dep__incl.md5 Vendored CUTLASS generated docs artifact (md5).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/arch_2mma__sm50_8h__incl.md5 Vendored CUTLASS generated docs artifact (md5).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/arch_2mma__sm50_8h__dep__incl.md5 Vendored CUTLASS generated docs artifact (md5).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/arch_2mma_8h__incl.md5 Vendored CUTLASS generated docs artifact (md5).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/arch_2mma_8h__dep__incl.md5 Vendored CUTLASS generated docs artifact (md5).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/arch_2mma_8h.html Vendored CUTLASS generated docs (HTML).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/aligned__buffer_8h__incl.md5 Vendored CUTLASS generated docs artifact (md5).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/aligned__buffer_8h__dep__incl.md5 Vendored CUTLASS generated docs artifact (md5).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/aligned__buffer_8h.html Vendored CUTLASS generated docs (HTML).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/docs/_config.yml Vendored CUTLASS docs site config.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cuDNN.cmake Vendored CUTLASS cuDNN discovery CMake.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cuBLAS.cmake Vendored CUTLASS cuBLAS discovery CMake.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cmake/version.h.in Vendored CUTLASS version header template.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cmake/nop.cu Vendored CUTLASS build smoke-test CU file.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cmake/googletest.cmake Vendored CUTLASS test dependency fetch script.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cmake/NvidiaCutlassPackageConfig.cmake Vendored CUTLASS packaging config.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cmake/NvidiaCutlassConfig.cmake Vendored CUTLASS config module.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cmake/CTestTestfile.test.configure.cmake Vendored CUTLASS CTest config template.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/cmake/CTestTestfile.configure.cmake Vendored CUTLASS CTest config template.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/bin2hex.cmake Vendored CUTLASS helper CMake script.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/PUBLICATIONS.md Vendored CUTLASS documentation.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/LICENSE.txt Vendored CUTLASS license.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/CONTRIBUTORS.md Vendored CUTLASS contributors list.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/CITATION.cff Vendored CUTLASS citation metadata.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.gitignore Vendored CUTLASS ignore file.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/workflows/stale.yml Vendored CUTLASS workflow file (inert under non-root).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/workflows/new-issues-to-triage-projects.yml Vendored CUTLASS workflow file (inert under non-root).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/workflows/labeler.yml Vendored CUTLASS workflow file (inert under non-root).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/ISSUE_TEMPLATE/submit_question.md Vendored CUTLASS issue template.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/ISSUE_TEMPLATE/feature_request.md Vendored CUTLASS issue template.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/ISSUE_TEMPLATE/documentation_request.md Vendored CUTLASS issue template.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/ISSUE_TEMPLATE/config.yml Vendored CUTLASS issue template config.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/cutlass/.github/ISSUE_TEMPLATE/bug_report.md Vendored CUTLASS issue template.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/static_switch.h Adds compile-time dispatch macros for head-dim/type.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/philox.cuh Adds Philox RNG utilities used by CUDA kernels.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/kernel_traits_sm90.h Adds SM90 kernel trait definitions (FlashAttention-derived).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/generate_kernels.py Adds generator for kernel instantiation translation units.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim96_fp16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim96_bf16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_bf16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim32_fp16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim32_bf16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim256_fp16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim256_bf16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim224_fp16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim224_bf16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim192_fp16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim192_bf16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim160_fp16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim160_bf16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim128_fp16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim128_bf16_sm80.cu Adds generated fwd split-kv instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim96_fp16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim96_bf16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim64_fp16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim64_bf16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim32_fp16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim32_bf16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim256_fp16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim256_bf16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim224_fp16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim224_bf16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim192_fp16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim192_bf16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim160_fp16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim160_bf16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim128_fp16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_hdim128_bf16_sm80.cu Adds generated fwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_block_hdim64_fp16_sm80.cu Adds generated fwd block-streaming instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_block_hdim64_bf16_sm80.cu Adds generated fwd block-streaming instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_block_hdim32_fp16_sm80.cu Adds generated fwd block-streaming instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_block_hdim32_bf16_sm80.cu Adds generated fwd block-streaming instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_block_hdim128_fp16_sm80.cu Adds generated fwd block-streaming instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_fwd_block_hdim128_bf16_sm80.cu Adds generated fwd block-streaming instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim96_fp16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim96_bf16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim64_fp16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim64_bf16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim32_fp16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim32_bf16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim256_fp16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim256_bf16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim224_fp16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim224_bf16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim192_fp16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim192_bf16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim160_fp16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim160_bf16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim128_fp16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_hdim128_bf16_sm80.cu Adds generated bwd instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_block_hdim64_fp16_sm80.cu Adds generated bwd block-streaming instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_block_hdim64_bf16_sm80.cu Adds generated bwd block-streaming instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_block_hdim32_fp16_sm80.cu Adds generated bwd block-streaming instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_block_hdim32_bf16_sm80.cu Adds generated bwd block-streaming instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_block_hdim128_fp16_sm80.cu Adds generated bwd block-streaming instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash_bwd_block_hdim128_bf16_sm80.cu Adds generated bwd block-streaming instantiation TU.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/flash.h Core params + API declarations for flash/block-sparse kernels.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/block_info.h Adds block info utilities for varlen kernels.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/csrc/block_sparse_attn/src/alibi.h Adds ALiBi bias application helper.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd_bwd/test_performance/utils.py Benchmark/perf utilities for fwd+bwd testing.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd_bwd/test_performance/blocksparse.py Block-sparse perf script for fwd+bwd.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd_bwd/test_performance/block_streaming.py Block-streaming perf script for fwd+bwd.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd_bwd/test_correctness/full_test.py Correctness tests for fwd+bwd block sparse.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd/test_performance/utils.py Benchmark/perf utilities for fwd-only testing.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd/test_performance/token_streaming.py Token-streaming perf script for fwd-only.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd/test_performance/blocksparse.py Block-sparse perf script for fwd-only.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_tests/fwd/test_correctness/full_test.py Correctness tests for fwd-only block sparse.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_attn/utils/benchmark.py Benchmark utilities used by perf scripts/tests.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_attn/pyproject.toml Formatting config for vendored Python package.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_attn/bert_padding.py Input unpadding/padding helpers (FlashAttention derived).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/block_sparse_attn/__init__.py Exposes Python APIs for the block sparse attention package.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/README.md Third-party package README for block sparse attention.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/Makefile Packaging helper make targets (third-party).
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/MANIFEST.in Packaging include rules for third-party sources.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/LICENSE Third-party license for block sparse attention.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/.gitmodules Third-party submodule metadata for CUTLASS.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/.gitignore Third-party ignore file.
flagscale/inference/core/diffusion/adaspa/third_party/block_sparse_attention/.github/workflows/publish.yml Third-party publishing workflow (inert under non-root).
flagscale/inference/core/diffusion/adaspa/processor.py Adds Wan AdaSpa attention processor wrapper with SDPA patching.
flagscale/inference/core/diffusion/adaspa/adasparse_args.py Adds runtime args plumbing for AdaSpa sparse attention.
flagscale/inference/core/diffusion/adaspa/adasparse/__init__.py Exposes AdaSpa sparse attention API.
flagscale/inference/core/diffusion/adaspa/adaspa_handler.py Adds generic handler to swap attention processors to AdaSpa versions.
flagscale/inference/core/diffusion/adaspa/__init__.py Exposes AdaSpaHandler + registration for Wan processors.
examples/wan2_1/conf/inference/1.3b_combine.yaml Adds combined AdaSpa + TaylorSeer example config.
examples/wan2_1/conf/inference/1.3b_adaspa.yaml Adds AdaSpa-only example config for Wan2.1.
examples/wan2_1/conf/inference/1.3b.yaml Updates base model path for Wan2.1 config.
examples/wan2_1/conf/inference.yaml Adjusts runner setup command default.
examples/wan2_1/conf/README.md Adds setup and usage instructions for Wan2.1 + AdaSpa.


self._original_processors[key] = original_processor

# Create sparse processor wrapper around original processor
sparse_processor = sparse_type()

Copilot AI Mar 20, 2026


WanAdaSpaAttnProcessor requires a base_processor instance (it raises at call time when base_processor is None), but the handler constructs it with no arguments. Construct the sparse processor with the original processor wired in (e.g., sparse_type(base_processor=original_processor)), and consider forwarding any handler kwargs through as well.

Suggested change
-sparse_processor = sparse_type()
+sparse_processor = sparse_type(base_processor=original_processor, **self.kwargs)
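
For context, a minimal sketch of the wrapper pattern this wiring implies, based on the base_processor check quoted in a later comment thread (the class body here is illustrative, not the PR's actual implementation):

```python
class WanAdaSpaAttnProcessor:
    """Sketch: wraps an existing Diffusers attention processor so the
    original dense path stays available as a fallback."""

    def __init__(self, base_processor=None, **kwargs):
        self.base_processor = base_processor
        self.kwargs = kwargs

    def __call__(self, attn, hidden_states, *args, **kwargs):
        if self.base_processor is None:
            raise RuntimeError(
                "WanAdaSpaAttnProcessor requires a base_processor instance."
            )
        # The real processor patches SDPA around this delegation so the
        # base processor's attention call hits the AdaSpa sparse kernel.
        return self.base_processor(attn, hidden_states, *args, **kwargs)
```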

Comment on lines +74 to +75
# Store original processor
key = f"{module.__class__.__name__}.processor"

Copilot AI Mar 20, 2026


The key used to track original/sparse processors is not unique per Attention module (multiple Attention instances share the same class name), so later entries will overwrite earlier ones. Use a unique identifier (e.g., module path name from model.named_modules(), or id(module)) to avoid collisions and make restoration/debugging reliable.

Suggested change
-# Store original processor
-key = f"{module.__class__.__name__}.processor"
+# Store original processor using a unique key per module instance
+key = f"{module.__class__.__name__}-{id(module)}.processor"

if self.base_processor is None:
raise RuntimeError("WanAdaSpaAttnProcessor requires a base_processor instance.")

original_sdpa = F.scaled_dot_product_attention

Copilot AI Mar 20, 2026


This monkey-patches torch.nn.functional.scaled_dot_product_attention at module scope for the duration of the call. Even with restoration in finally, concurrent inference (threads/async tasks) can observe the patched function and get unintended behavior. Prefer avoiding global patching (e.g., refactor to call AdaSpa explicitly at a single call site), or guard the patch with a process-wide lock to serialize SDPA patch sections.
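
A minimal sketch of the lock-guarded variant, assuming a process-wide threading.Lock (the context-manager name is hypothetical):

```python
import threading
from contextlib import contextmanager

import torch.nn.functional as F

_SDPA_PATCH_LOCK = threading.Lock()  # one lock per process

@contextmanager
def sdpa_patched(replacement):
    """Serialize SDPA patch sections: threads wait here instead of
    observing a globally patched scaled_dot_product_attention."""
    with _SDPA_PATCH_LOCK:
        original = F.scaled_dot_product_attention
        F.scaled_dot_product_attention = replacement
        try:
            yield
        finally:
            F.scaled_dot_product_attention = original
```

Holding the lock across the whole attention call serializes concurrent inference, trading throughput for safety; refactoring to invoke AdaSpa at a single explicit call site avoids the tradeoff entirely.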

# (some diffusers versions import `scaled_dot_product_attention` directly)
patched_items: list[tuple[str, Any, Any]] = []

F.scaled_dot_product_attention = _patched_sdpa

Copilot AI Mar 20, 2026


Same concern as the previous comment: this globally patches torch.nn.functional.scaled_dot_product_attention for the duration of the call, so concurrent threads/async tasks can observe the patched function. Serialize the patch sections or avoid global patching here as well.

Comment on lines +57 to +63
__device__ inline Philox(unsigned long long seed,
unsigned long long subsequence,
unsigned long long offset)
: STATE(0)
, seed_(seed)
, offset_(offset)
, key(reinterpret_cast<const uint2&>(seed)) {

Copilot AI Mar 20, 2026


The Philox RNG state ignores the constructor's subsequence argument and uses offset_ for both subsequence and offset when generating values. This will change the RNG stream and can break dropout reproducibility/correctness. Store subsequence as a member and call flash::philox(seed_, subsequence_, offset_) (and ensure the counter/offset units match the intended 128-bit increment scheme).

Comment on lines +39 to +66
#define FWD_HEADDIM_SWITCH(HEADDIM, ...) \
[&] { \
if (HEADDIM <= 32) { \
constexpr static int kHeadDim = 32; \
return __VA_ARGS__(); \
} else if (HEADDIM <= 64) { \
constexpr static int kHeadDim = 64; \
return __VA_ARGS__(); \
} else if (HEADDIM <= 96) { \
constexpr static int kHeadDim = 96; \
return __VA_ARGS__(); \
} else if (HEADDIM <= 128) { \
constexpr static int kHeadDim = 128; \
return __VA_ARGS__(); \
} else if (HEADDIM <= 160) { \
constexpr static int kHeadDim = 160; \
return __VA_ARGS__(); \
} else if (HEADDIM <= 192) { \
constexpr static int kHeadDim = 192; \
return __VA_ARGS__(); \
} else if (HEADDIM <= 224) { \
constexpr static int kHeadDim = 224; \
return __VA_ARGS__(); \
} else if (HEADDIM <= 256) { \
constexpr static int kHeadDim = 256; \
return __VA_ARGS__(); \
} \
}()

Copilot AI Mar 20, 2026


These switch macros have no fallback branch when HEADDIM exceeds the last threshold, meaning the lambda may not return a value on all control paths (compile error/UB depending on usage). Add an else that triggers a compile-time failure (e.g., static_assert) or a runtime error path so misuse produces a clear diagnostic.

@@ -1,5 +1,5 @@
 engine:
-  model: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
+  model: /workspace/models/Wan2.1-T2V-1.3B-Diffusers

Copilot AI Mar 20, 2026


This changes the model identifier from a portable Hub ID to a machine-specific absolute path. For a repo example config, prefer keeping the Hub ID (or using an env-var/placeholder like ${oc.env:WAN_MODEL_PATH, Wan-AI/Wan2.1-T2V-1.3B-Diffusers}) so the config works out-of-the-box across environments while still allowing local overrides.

Suggested change
-model: /workspace/models/Wan2.1-T2V-1.3B-Diffusers
+model: ${oc.env:WAN_MODEL_PATH, Wan-AI/Wan2.1-T2V-1.3B-Diffusers}
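
If useful, the oc.env fallback can be sanity-checked directly (assuming OmegaConf >= 2.1, where the default-after-comma syntax is supported):

```python
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    {"model": "${oc.env:WAN_MODEL_PATH, Wan-AI/Wan2.1-T2V-1.3B-Diffusers}"}
)
# Resolves to $WAN_MODEL_PATH when set, otherwise to the portable Hub ID.
print(OmegaConf.to_container(cfg, resolve=True)["model"])
```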

Comment on lines +30 to +35
```bash
defaults:
- _self_
- inference: 1.3b_adaspa (adaspa)
: 1.3b (taylorseer)


Copilot AI Mar 20, 2026


The 'Config selection' snippet is labeled as bash but shows what looks like a Hydra YAML fragment, and the - inference: 1.3b_adaspa (adaspa) / : 1.3b (taylorseer) syntax is not valid YAML. Please replace this with a valid defaults example (and mark the code block as yaml) so users can copy-paste it successfully.

Suggested change

```yaml
defaults:
  - _self_
  - inference: 1.3b_adaspa  # use AdaSpa-accelerated config
  # To use the standard Wan2.1 config instead, comment the line above and uncomment the line below:
  # - inference: 1.3b  # use non-AdaSpa config
```


write_to_excel(excel_label, excel_data, excel_dir_path, excel_file_name)

profile_blocksparse_fwd_bwd()

Copilot AI Mar 20, 2026


This performance script executes profiling unconditionally on import, which can cause unexpected long-running GPU work during test discovery or when the module is imported. Wrap the call in if __name__ == "__main__": so it only runs when invoked as a script.

Suggested change
-profile_blocksparse_fwd_bwd()
+if __name__ == "__main__":
+    profile_blocksparse_fwd_bwd()

@@ -0,0 +1,156 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Copilot AI Mar 20, 2026


A large set of generated CUTLASS documentation artifacts (HTML + *.md5) is being committed under csrc/cutlass/docs/. These are typically build/generated outputs and significantly bloat the repo/PR diff without affecting AdaSpa runtime. Consider removing generated docs from the vendored tree (or excluding docs via vendoring rules) and keeping only the source needed to build the kernels.
