
Some questions about Fig.5 and AST #13

Open
HayuZH opened this issue Jan 1, 2025 · 1 comment

Comments


HayuZH commented Jan 1, 2025

Many thanks to the authors for this outstanding work; it is very inspiring.
I ran into some questions while reading the paper and code on Attention Sharing across Timesteps (AST). Taking DiT-XL-2 in Fig.5 as an example, I noticed that most layers in t0-t9 use AST. Consider layer 20 at t9: if the cached result from t8 (or from an earlier timestep such as t0) is reused at that point, then the attention computations at layer 20 from t1 through t9 should be unnecessary, similar to Fig.3 in DeepCache, because the attention map used at layer 20 of t9 is effectively the same as the one used at layer 20 of t0. How does the compression method shown in Fig.5 know from which timestep the cached result should be taken? Is it the most recent one?
I would be very glad to get your answer!


@HayuZH HayuZH closed this as completed Jan 1, 2025
@HayuZH HayuZH reopened this Jan 1, 2025
Probe100 (Collaborator) commented Jan 7, 2025

@HayuZH Your understanding is correct. One thing to note is that for AST we cache the output of the attention, not the attention map. If AST is used for layer 20 from t0 to t9, the attention output of t0 is cached and reused in t1-t9 (the attention computation at t1-t9 is skipped).
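A minimal sketch of what this attention-output caching could look like, assuming a PyTorch-style transformer block. The names `CachedAttention`, `ToySelfAttention`, and `ast_plan` are hypothetical illustrations, not the repository's actual API; in the paper, the compression plan in Fig.5 decides per layer and per timestep whether the cached output is reused.

```python
import torch
import torch.nn as nn

class ToySelfAttention(nn.Module):
    """Toy self-attention, included only to make the sketch self-contained."""
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5, dim=-1)
        return self.proj(attn @ v)

class CachedAttention(nn.Module):
    """Wraps an attention module and reuses its *output* across timesteps (AST sketch)."""
    def __init__(self, attn: nn.Module):
        super().__init__()
        self.attn = attn
        self.cached_out = None  # attention output saved at the reference timestep (e.g. t0)

    def forward(self, x: torch.Tensor, reuse_cache: bool) -> torch.Tensor:
        if reuse_cache and self.cached_out is not None:
            # AST: skip the attention computation and return the earlier output.
            return self.cached_out
        out = self.attn(x)      # compute attention normally at the reference timestep
        self.cached_out = out   # cache the output (not the attention map)
        return out

# Usage: layer 20 computes attention only at t0 and reuses that output for t1-t9.
ast_plan = {t: (t != 0) for t in range(10)}   # hypothetical per-timestep AST flags
layer20 = CachedAttention(ToySelfAttention(dim=1152))
x = torch.randn(2, 256, 1152)                 # dummy (batch, tokens, dim) latent tokens
for t in range(10):
    out = layer20(x, reuse_cache=ast_plan[t])
```

In a real sampler the input tokens change at every timestep; the sketch keeps `x` fixed only to show which computations are skipped when AST is active.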
