Support xccl distributed backend #3034
base: main
Conversation
Starting from `torch>=2.7`, the XCCL distributed backend is available for XPU devices (requires torch built with `USE_XCCL=1`). This commit is verified on Intel Data Center GPU Max with Bloom:

```shell
text-generation-launcher --sharded true --num-shard 2 \
    --model-id bigscience/bloom-560m
```

Signed-off-by: Dmitry Rogozhkin <[email protected]>
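As an aside (not part of the PR), the way `--num-shard 2` maps to torch.distributed ranks can be sketched as below. The `build_shard_env` helper and its defaults are hypothetical, invented only to illustrate the per-shard environment a launcher would hand to each worker process:

```python
# Hypothetical sketch: how a launcher might derive per-shard
# environments for `--num-shard 2`. These helper names are NOT
# from TGI; they only illustrate the shard-index -> rank mapping
# that torch.distributed rendezvous expects.

def build_shard_env(num_shard: int, master_addr: str = "127.0.0.1",
                    master_port: int = 29500) -> list[dict]:
    envs = []
    for rank in range(num_shard):
        envs.append({
            "RANK": str(rank),             # this shard's global rank
            "WORLD_SIZE": str(num_shard),  # total number of shards
            "MASTER_ADDR": master_addr,    # rendezvous address
            "MASTER_PORT": str(master_port),
        })
    return envs

envs = build_shard_env(2)  # one env dict per shard
```

Each worker then calls `torch.distributed.init_process_group()` with the chosen backend (here `"xccl"`), reading these variables from its environment.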
What's the benefit over the IPEX backend? If this allows suboptimal deployments compared to the IPEX image, I think we'd rather not merge this at all (and error out instead with instructions on how to get the better image). Not having flash attention is kind of a no-go nowadays (we still maintain the old paths, but only because they existed at some point; we're not adding any more).
Initially, IPEX is an external plugin for pytorch which brings in a few things that can essentially be grouped in two.

At the moment we are in the process of bringing Intel GPU support right into stock pytorch through the dedicated device backend called XPU. The distributed backend falls into the first group: it is one of the features being upstreamed to pytorch. The plan is that the IPEX "ccl" backend will be dropped going forward and IPEX will rely on the "xccl" backend exposed directly by pytorch. That process will take time. The XCCL distributed backend will first be available in PT 2.7 and will require manual pytorch compilation with `USE_XCCL=1`.

The change I propose in this PR is made with the above background in mind. It introduces "xccl" distributed support into TGI, which can be tried out if someone builds TGI against stock pytorch (without IPEX). As you correctly notice, such a build has limited value due to lacking flash attention support. That is basically the reason why I don't propose to expose such a configuration at a higher level in TGI (via docker and documentation covering such an environment). At the same time, such a build is interesting for development: it helps to identify issues earlier and builds a foundation for the future switch of the IPEX environment, which will ultimately reuse the code path I introduce now for stock pytorch.

Alternatively, we can postpone adding "xccl" distributed support until IPEX is ready to use it. Having "xccl" support now, however, even if it requires stock pytorch, will help me and other developers prepare things in advance.

I hope the above helps to clarify the story and make a decision.
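To make the backend split concrete, here is a minimal, hypothetical sketch of the selection logic described above: CUDA picks "nccl", an IPEX build on XPU keeps its custom "ccl" backend, stock `torch>=2.7` on XPU picks the upstream "xccl" backend, and "gloo" is the CPU fallback. The function and its boolean flags are illustrative only, not TGI's actual code:

```python
# Hypothetical sketch of distributed-backend selection. The
# booleans stand in for runtime checks (e.g. torch.cuda / torch.xpu
# availability, IPEX presence); the returned strings are the real
# backend identifiers discussed in this thread.

def select_backend(has_cuda: bool, has_xpu: bool, has_ipex: bool) -> str:
    if has_cuda:
        return "nccl"            # NVIDIA GPUs
    if has_xpu:
        # IPEX builds still ship their own "ccl" backend; stock
        # torch>=2.7 (built with USE_XCCL=1) exposes "xccl" directly.
        return "ccl" if has_ipex else "xccl"
    return "gloo"                # CPU fallback
```

Once IPEX itself switches to "xccl", the `has_ipex` branch collapses and both XPU paths share the same code, which is the eventual state this PR prepares for.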
Very clear explanation, thanks. If you're ok with that, we can keep the current PR open until there's enough support within torch so that we can modify the IPEX backend directly (for instance by using the XCCL backend). In TGI our external surface is only our docker images; everything internal to the docker image (packages, compile options, etc.) is considered internal and we can modify it at will (without breaking the surface, of course). The caveat with merging this directly is that sometimes users deploy a docker image on an improper node, which causes issues in the flash loading code, so it uses the fallback. Two years ago it was ok to downgrade to a non-flash implementation, but today it's increasingly better to just error out and let users fix their environment (since otherwise it's just a silent, extremely slow deployment).
Yes, sure. Let's wait for the IPEX part of the story.
This commit does not impact IPEX, which currently keeps using its custom distributed backend.
CC: @Narsil