feat: initial IFRT integration #764
Changes are being made to the C API, specifically to how PjRt and IFRT should interact (following the discussion in #738, where I was adding the Julia API). See #751, in particular this comment: #751 (comment).
EDIT: I just remembered that we agreed with @wsmoses to start with only the IFRT-PjRt backend for testing. We must be careful, though, since this PR conflicts with #751 (where I'm changing the C API) and with #738 (where I introduce the Julia bindings, which is blocked on #751).
I am not adding any of the PjRt-IFRT interaction code here (the Held* functions in that PR), except for the ifrt::Client constructed from the PjRtClient. We need to use direct IFRT calls to construct arrays from an HloSharding, since that path doesn't take PjRtBuffers (at least not without multiple round trips).
The CPU setup for distributed runs (and presumably the TPU setup) now works.
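The commit list for this PR includes OMPI cluster detection as part of the distributed setup. As a rough illustration of what such detection involves (a sketch, not the code in this PR), the function below reads the environment variables that Open MPI's `mpirun` exports to each launched process. The function name `detect_ompi_cluster` is hypothetical, and how the resulting rank and world size are passed to the distributed runtime is left out as an assumption.

```python
import os
from typing import Mapping, Optional, Tuple


def detect_ompi_cluster(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[int, int, int]]:
    """Return (rank, world_size, local_rank) if launched by Open MPI, else None.

    Hypothetical helper: only illustrates reading Open MPI's launcher variables.
    """
    # Open MPI's launcher sets these variables in every process it starts.
    keys = (
        "OMPI_COMM_WORLD_RANK",
        "OMPI_COMM_WORLD_SIZE",
        "OMPI_COMM_WORLD_LOCAL_RANK",
    )
    if not all(k in env for k in keys):
        # Not running under mpirun: fall back to single-process setup.
        return None
    rank, size, local_rank = (int(env[k]) for k in keys)
    if not 0 <= rank < size:
        raise ValueError(f"inconsistent Open MPI environment: rank={rank}, size={size}")
    return rank, size, local_rank


fake_env = {
    "OMPI_COMM_WORLD_RANK": "1",
    "OMPI_COMM_WORLD_SIZE": "4",
    "OMPI_COMM_WORLD_LOCAL_RANK": "1",
}
print(detect_ompi_cluster(fake_env))  # (1, 4, 1)
```

Passing the environment as a mapping keeps the detection testable without actually launching under `mpirun`.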
- refactor: rework how OpSharding works
- feat: generate_device_list
- feat: add placeholder code to simplify future sharding logic
- fixup
- fix: store results as HloSharding
- docs: fix duplicate docs
- feat: compile with logical device ids
- fix: use correct global device ids
- feat: use a global state to setup pjrt distributed runtime
- fix: devices are not necessarily from 0 to N-1
- fix: initialize clients on first use rather than on init
- fix: make device selection consistent with clients
- feat: add OMPI cluster detection
- fix: correctly set kv_store
- refactor: Distributed setup is not PJRT specific
- refactor: OMPI detection doesn't need to be in an extension
- feat: initial low-level IFRT API
- fix: ifrt HloSharding
- refactor: split up into IFRT/PJRT
- feat: IFRT Client APIs
- feat: IFRT Device API
- fix: remove global_ordinals
- feat: add devices list abstraction
- feat: wrap memory and memory kinds
- feat: ifrt::HloSharding now working
- fix: use new ABI
- chore: run formatter
- fix: no finalizer
- feat: initial draft of IFRT.Array interface (#774)
  * feat: initial draft of IFRT.Array interface
  * feat: Base.Array to ifrt::Array
  * feat: buffer to host
- chore: run formatter
- fix: bad rebase
- feat: more proxy servers
- feat: add ConcreteIFRTArray
- feat: add ConcreteIFRTNumber
- refactor: rename ConcreteRNumber to ConcretePJRTNumber
- revert: concreteifrtarray implementation
- chore: run formatter
- feat: ifrt loaded executable
- feat: construct IFRT clients with distributed options
- refactor: remove BasicDevicesList
- fix: use global device ids
- feat: sharding annotations across nodes now working
- fix: Array construction from SingleShards
- feat: support to_host for distributed cases
- feat: add Gloo/MPI collectives for distributed CPU client
- feat: low level compile API
- feat: low-level IFRT compile + execute working
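Several of these commits deal with device numbering: compiling with logical device ids, using the correct global device ids, and the fact that devices are not necessarily numbered 0 to N-1. A minimal sketch of the underlying idea, assuming that sorting the global ids is an acceptable way to get a stable logical order on every process; the names `logical_device_ids` and `to_global` are hypothetical, and this is not the PR's implementation.

```python
from typing import Dict, List, Sequence


def logical_device_ids(global_ids: Sequence[int]) -> Dict[int, int]:
    """Map each (possibly non-contiguous) global device id to a dense logical id 0..N-1."""
    if len(set(global_ids)) != len(global_ids):
        raise ValueError("duplicate global device ids")
    # Sorting makes every process derive the same assignment from the same device set.
    return {gid: logical for logical, gid in enumerate(sorted(global_ids))}


def to_global(mapping: Dict[int, int], logical_ids: Sequence[int]) -> List[int]:
    """Translate logical ids (e.g. from a compiled device assignment) back to global ids."""
    inverse = {logical: gid for gid, logical in mapping.items()}
    return [inverse[i] for i in logical_ids]


# Two nodes exposing global ids 4, 5 and 12, 13 (not 0..3).
mapping = logical_device_ids([12, 4, 13, 5])
print(to_global(mapping, [0, 3]))  # [4, 13]
```

Keeping the translation in one place means the compiler only ever sees dense logical ids, while buffers and executables are placed using the real global ids.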
I will split this PR into multiple smaller PRs once it is ready.