Support for max_window_layers #157
Conversation
Looks super clean, thank you @bigximik!
@bigximik Why did you drop support for non-flash windowed attention? It should be supported.

```python
window_size = self._config.window_size
if (
    self._config.max_window_layers is not None
    and self._layer_index < self._config.max_window_layers
```
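The excerpt above cuts off mid-condition. Below is a self-contained sketch of how the gate plausibly behaves; the branch body, the helper name, and the threshold's meaning are assumptions (following Qwen2's documented convention that the first `max_window_layers` layers fall back to full attention), not the PR's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttentionConfig:
    window_size: Optional[int] = None
    max_window_layers: Optional[int] = None


def effective_window_size(config: AttentionConfig, layer_index: int) -> Optional[int]:
    """Hypothetical gate: disable the sliding window for early layers.

    Assumes a 0-based layer_index and Qwen2-style semantics, where the
    first max_window_layers layers use full attention and later layers
    use the configured sliding window.
    """
    window_size = config.window_size
    if (
        config.max_window_layers is not None
        and layer_index < config.max_window_layers
    ):
        window_size = None  # assumed branch body: full attention here
    return window_size
```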
I think this is incorrect because the layer index starts at 1 for some reason.
Hi @jlamypoirier, I strongly prefer that we don't merge unfinished, experimental features just to "play with them", especially when they're complex and introduce long-term maintenance overhead. Let me reiterate that:
Please let's keep focus on the roadmap and merge features when they're fully ready and actually needed. Thanks.
Description
Closes #147
Also, added an assert for `window_size` usage with non-flash attention.

Type of change
Select all that apply:
Changes
List the key changes introduced in this PR:
- `max_window_layers`: a threshold for which layers use sliding window attention
- Added an assert for `window_size` usage with non-flash attention

Checklist
Make sure the following tasks are completed before submitting the PR:
General
Testing