Flash Transformers modeling backend support #2913

Cyrilvallez · 2025-01-15T19:50:22Z

What does this PR do?

Supersedes #2858 for convenience (CI), as requested, since I cannot change the source branch of the old PR.

Narsil · 2025-01-20T09:53:01Z

server/text_generation_server/utils/logits_process.py

@@ -219,7 +218,7 @@ def filter(self, indices):
        return None


-class HeterogeneousTopPLogitsWarper(LogitsWarper):
+class HeterogeneousTopPLogitsWarper(LogitsProcessor):


Why these changes?

It looks like these are Warpers, not Processors... no?

Recent Transformers versions deprecated LogitsWarper in favor of LogitsProcessor. They share the same logic and interface.

LogitsWarper has been completely removed, so it needs to change here.
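For illustration, here is a minimal, dependency-free sketch of what a heterogeneous top-p processor does under the shared interface: each batch row gets its own top_p, and tokens outside that row's nucleus are masked to -inf. The LogitsProcessor class below is a hypothetical stand-in that only mirrors the transformers calling convention `__call__(input_ids, scores) -> scores`. The real classes operate on torch tensors, and this is not the TGI implementation.

```python
import math
from typing import List, Optional, Sequence

Scores = List[List[float]]
TokenIds = Optional[List[List[int]]]


class LogitsProcessor:
    """Hypothetical stand-in mirroring the transformers calling convention:
    __call__(input_ids, scores) -> scores."""

    def __call__(self, input_ids: TokenIds, scores: Scores) -> Scores:
        raise NotImplementedError


class HeterogeneousTopP(LogitsProcessor):
    """Sketch of per-row nucleus (top-p) filtering over plain Python lists."""

    def __init__(self, top_p: Sequence[float]):
        self.top_p = list(top_p)  # one threshold per batch row

    def __call__(self, input_ids: TokenIds, scores: Scores) -> Scores:
        filtered = []
        for row, p in zip(scores, self.top_p):
            # Numerically stable softmax over this row's logits
            peak = max(row)
            exps = [math.exp(s - peak) for s in row]
            total = sum(exps)
            probs = [e / total for e in exps]
            # Visit token indices from most to least probable
            order = sorted(range(len(row)), key=lambda i: probs[i], reverse=True)
            keep, cumulative = set(), 0.0
            for i in order:
                keep.add(i)
                cumulative += probs[i]
                if cumulative >= p:  # smallest set whose mass reaches p
                    break
            # Mask everything outside the nucleus with -inf
            filtered.append([s if i in keep else -math.inf for i, s in enumerate(row)])
        return filtered
```

With `top_p=[0.5, 1.0]` and two identical rows of logits `[2.0, 1.0, 0.0]`, the first row keeps only its most likely token (softmax probability about 0.67), while the second row is left unchanged.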

Narsil · 2025-01-20T10:02:27Z

server/text_generation_server/models/transformers_flash_causal_lm.py

+        return logits
+
+
+    def forward(


To save rewriting all of this, could we monkey patch model.forward instead?

It's not great, but it still feels better than this (which duplicates quite complex logic that we're likely to forget to update).

In __init__: self.model.forward = monkey_patch_forward (which does the simple unsqueeze/squeeze).

WDYT?

The other option would be to introduce an indirection in FlashCausalLM: an inner method that wraps the model.forward call, which we then override. I'm not a fan of adding that level of indirection either; in my book it's just as ugly as monkey patching, because the reader still has no way of knowing that the call is being modified somewhere else.
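The indirection alternative can be sketched roughly as follows. The names (FlashCausalLMSketch, TransformersFlashCausalLMSketch, _call_model) are hypothetical and do not come from TGI: the base class routes its model call through an overridable hook, and the Transformers subclass overrides only that hook. Plain Python lists stand in for tensors, so wrapping in a list and indexing `[0]` play the role of `unsqueeze(0)` and `squeeze(0)`.

```python
from typing import Callable, List

Tokens = List[int]


class FlashCausalLMSketch:
    """Hypothetical base class: forward() routes the model call through a hook."""

    def __init__(self, model: Callable):
        self.model = model

    def _call_model(self, input_ids: Tokens) -> list:
        # Default hook: the flash model consumes the flat token list as-is
        return self.model(input_ids)

    def forward(self, input_ids: Tokens) -> list:
        # ...the shared, complex batching/caching logic would live here...
        return self._call_model(input_ids)


class TransformersFlashCausalLMSketch(FlashCausalLMSketch):
    def _call_model(self, input_ids: Tokens) -> list:
        # Override only the hook: add a batch dimension, call, then drop it
        return self.model([input_ids])[0]
```

The drawback raised in the comment shows up here too: reading `FlashCausalLMSketch.forward` alone gives no hint that a subclass changes how the model is actually called.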

Yes, I thought about it but was not sure you guys would like it 🙃 I'll update with the monkey patch.
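A minimal sketch of the monkey-patching approach, assuming a hypothetical ToyModel whose forward expects a batch dimension while the caller passes a single flat sequence of token ids. Lists stand in for tensors, so wrapping in a list and indexing `[0]` play the role of `unsqueeze(0)` and `squeeze(0)`. This illustrates the idea only; it is not the code that was merged.

```python
import functools
from typing import List


class ToyModel:
    """Hypothetical model whose forward() expects a batch: a list of sequences."""

    def forward(self, input_ids: List[List[int]]) -> List[List[int]]:
        # Toy "logits": one value per token, per sequence
        return [[token * 2 for token in seq] for seq in input_ids]


def patch_forward(model):
    # Capture the original bound method first; otherwise the wrapper
    # would end up calling itself after the reassignment below.
    original_forward = model.forward

    @functools.wraps(original_forward)
    def forward(input_ids: List[int]) -> List[int]:
        batched = original_forward([input_ids])  # like unsqueeze(0)
        return batched[0]                        # like squeeze(0)

    # The instance attribute shadows ToyModel.forward for this object only
    model.forward = forward
    return model
```

For example, `patch_forward(ToyModel()).forward([1, 2, 3])` returns `[2, 4, 6]`, while other ToyModel instances keep the original batched signature.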

Narsil

LGTM. Thanks for this!

Cyrilvallez and others added 24 commits December 10, 2024 16:46

add transformers_flash (ade0f44)
inits (da22290)
switch version to make it work (b3b0747)
Update Makefile-flash-att-v2 (738f0b0)
Update Makefile-flash-att-v2 (a84ecf2)
Update Makefile-flash-att-v2 (372799a)
Update Makefile-flash-att-v2 (a0035e6)
Update Makefile-flash-att-v2 (e69a384)
Update Makefile-flash-att-v2 (3a636ed)
runnable version (649cb1f)
working (490ca0e)
push change (f843b62)
fix high dim (715b2d1)
init (e93ab92)
default (f4c60ca)
latest transformers changes (2e2631e)
revert (44b3679)
simplify check (266377b)
remove flag (32488c1)
improve type hints + required args (ac62bd1)
Update based on transformers PR (b03d7ae)
small fix (b40c889)
Remove Warpers for Processor (42ae6de)
fix compatibility version issue (f01014d)

Narsil reviewed Jan 20, 2025

server/text_generation_server/models/transformers_flash_causal_lm.py (outdated, resolved)

Narsil reviewed Jan 20, 2025

Cyrilvallez added 3 commits January 20, 2025 11:29

raise error if needed (2659b59)
Simplify with monkey patch (a2fe842)
revert + style + minor improvements (6e0f37c)
update comment (52afdcc)

Cyrilvallez changed the title from "Transformers backend" to "Flash Transformers modeling backend support" Jan 20, 2025

Cyrilvallez added 2 commits January 20, 2025 15:55

device check (9af3ea4)
move the import to avoid device issue (6d9c011)

Narsil previously approved these changes Jan 20, 2025

Update __init__.py (2ef3002)

Cyrilvallez dismissed Narsil’s stale review via 2ef3002 January 20, 2025 15:37

Cyrilvallez added 2 commits January 20, 2025 18:01

check for non-native models (70ada57)
oupsi (0d9ec75)

Narsil approved these changes Jan 21, 2025

Narsil merged commit b980848 into main Jan 21, 2025
13 of 14 checks passed

Narsil deleted the transformers-backend branch January 21, 2025 09:01
