Speeding up inference for Wan2.2-VACE-Fun-A14B: SageAttention and batch inference? #414

I have been running multi-GPU inference with Wan2.2-VACE-Fun-A14B on 512x512 videos with 121 frames.
20 diffusion steps take about 50 seconds on 8x H200 with this config:
```python
GPU_memory_mode     = "model_full_load"
ulysses_degree      = 4
ring_degree         = 2
fsdp_dit            = False
fsdp_text_encoder   = True
compile_dit         = True

enable_teacache     = True
teacache_threshold  = 0.10
num_skip_start_steps = 5
teacache_offload    = False

cfg_skip_ratio      = 0.1

# Riflex config
enable_riflex       = False
# Index of intrinsic frequency
riflex_k            = 6

# Config and model path
config_path         = "config/wan2.2/wan_civitai_t2v.yaml"
# model path
model_name          = "models/Diffusion_Transformer/Wan2.2-VACE-Fun-A14B"

# Choose the sampler in "Flow", "Flow_Unipc", "Flow_DPM++"
sampler_name        = "Flow"
# [NOTE]: Noise schedule shift parameter. Affects temporal dynamics. 
# Used when the sampler is in "Flow_Unipc", "Flow_DPM++".
shift               = 12.0

# Other params
sample_size         = [512, 512]
video_length        = 121
fps                 = 60

weight_dtype        = torch.bfloat16
control_video       = "outputs-1/input_videos/1.mp4"
start_image         = None
end_image           = None
# Use inpaint video instead of start image and end image.
inpaint_video       = None
inpaint_video_mask  = None
subject_ref_images  = None
vace_context_scale  = 1.00
padding_in_subject_ref_images   = True

guidance_scale          = 5.0
seed                    = 43
num_inference_steps     = 20
```
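
For reference, here is a rough sanity check of the parallel layout and the sequence length this config implies. It is only a sketch: it assumes the product of `ulysses_degree` and `ring_degree` has to equal the number of GPUs, and that the model uses 4x temporal / 8x spatial VAE compression with a (1, 2, 2) patch size. Those factors are my assumptions and are not read from the config file. Text tokens and any VACE context tokens are ignored.

```python
# Values taken from the config above.
ulysses_degree = 4
ring_degree = 2
num_gpus = 8
height, width, frames = 512, 512, 121
num_inference_steps = 20
total_seconds = 50  # observed wall time for the 20 steps

# ASSUMPTION: sequence parallelism splits one sample across
# ulysses_degree * ring_degree ranks, which should match the GPU count.
sp_world_size = ulysses_degree * ring_degree
assert sp_world_size == num_gpus

# ASSUMPTION: 4x temporal and 8x spatial VAE compression, (1, 2, 2) patches.
latent_frames = (frames - 1) // 4 + 1
latent_h, latent_w = height // 8, width // 8
tokens = latent_frames * (latent_h // 2) * (latent_w // 2)
tokens_per_gpu = tokens // sp_world_size

# Average over all steps, including any that TeaCache or CFG skipping shortens.
seconds_per_step = total_seconds / num_inference_steps

print(f"sequence-parallel ranks: {sp_world_size}")
print(f"video tokens: {tokens} total, {tokens_per_gpu} per GPU")
print(f"average time per step: {seconds_per_step:.2f} s")
```

If those assumptions hold, each GPU only attends over a few thousand of its own query tokens per layer, so part of the 2.5 s per step may be communication rather than attention compute.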

Is it possible to use SageAttention with this model?
I tried FlashAttention, but it gave no speedup over torch SDPA.
I also installed SageAttention, but the model does not seem to be using it.
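
To rule out an environment problem first, this is a minimal check of which attention packages are importable from the interpreter that launches inference. The helper name `installed_attention_backends` is made up for illustration, and I am assuming the usual import names `sageattention` and `flash_attn`. It only shows that a package is installed, not that the model actually dispatches to it.

```python
import importlib.util


def installed_attention_backends(candidates=("sageattention", "flash_attn", "torch")):
    """Return {module_name: True/False} for each top-level module name.

    find_spec only looks the module up; it does not import it, so this
    is safe to run on a machine without a GPU.
    """
    return {name: importlib.util.find_spec(name) is not None for name in candidates}


if __name__ == "__main__":
    for name, found in installed_attention_backends().items():
        print(f"{name:15s} {'installed' if found else 'missing'}")
```

If `sageattention` shows up as installed here and there is still no speed difference, my guess is that the attention call inside the transformer is hard-wired to another backend, which is what I would like to confirm.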

Is it possible to do batch inference, i.e. generate two videos from two different control videos on the same GPUs? Peak memory use is about 70 GB per H200, so there should be headroom.
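
To make the headroom question concrete, here is a back-of-the-envelope estimate. It assumes 141 GB of memory per H200 and, pessimistically, that a second sample costs the same 70 GB peak as the first (as if two independent processes shared the GPU). A true batch would share the weights, so the real cost of a second sample should be lower. The helper name and the `reserve_gb` margin are made up for illustration.

```python
def max_concurrent_samples(capacity_gb, peak_per_sample_gb, reserve_gb=0.0):
    """How many samples fit if each one needs the full observed peak.

    reserve_gb is memory held back for allocator fragmentation,
    CUDA context, communication buffers, and similar overhead.
    """
    usable = capacity_gb - reserve_gb
    return int(usable // peak_per_sample_gb)


H200_CAPACITY_GB = 141   # memory per H200 GPU
OBSERVED_PEAK_GB = 70    # peak utilization I measured for one video

print(max_concurrent_samples(H200_CAPACITY_GB, OBSERVED_PEAK_GB))
print(max_concurrent_samples(H200_CAPACITY_GB, OBSERVED_PEAK_GB, reserve_gb=4))
```

With no reserve, two samples just fit. With even a few GB held back, only one does, so running two copies side by side looks tight unless batching shares the weights.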
