-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCT: Memory window #10332
base: master
Are you sure you want to change the base?
UCT: Memory window #10332
Conversation
src/uct/api/uct.h
Outdated
* original memh cannot be destroyed until all its memory windows are | ||
* destroyed. | ||
*/ | ||
UCT_MD_MEM_WINDOW = UCS_BIT(12), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe call it
UCT_MD_MEM_WINDOW = UCS_BIT(12), | |
UCT_MD_MEM_DERIVED = UCS_BIT(12), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/uct/api/uct.h
Outdated
* Register a memory window: a shallow copy of some existing UCT memory | ||
* handle, which can be used to access the same memory region. When created, | ||
* the memory window inherits the original memh access flags and state, and | ||
* takes ownership of the indirect keys of the original memory handle. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indirect key is IB specific we should not mention it in UCT API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
src/uct/api/v2/uct_v2.h
Outdated
|
||
/** | ||
* Represents a pointer to the existing memory handle. | ||
* Used to create a memory window is (with flag @ref UCT_MD_MEM_WINDOW) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Used to create a memory window is (with flag @ref UCT_MD_MEM_WINDOW) | |
* Used to create a derived memory handle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
src/uct/ib/mlx5/dv/ib_mlx5dv_md.c
Outdated
ucs_assertv(!(base->super.flags & UCT_IB_MEM_FLAG_MEM_WINDOW), | ||
"memh=%p is already a memory window", base); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should mention this limitation in the API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
src/uct/ib/mlx5/dv/ib_mlx5dv_md.c
Outdated
base->atomic_dvmr = NULL; | ||
base->atomic_rkey = UCT_IB_INVALID_MKEY; | ||
base->indirect_dvmr = NULL; | ||
base->indirect_rkey = UCT_IB_INVALID_MKEY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to invalidate basic memh keys? I'd expect that this will break (at least performance) of some (at least atomics) operations on the base key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should not hinder performance, because base memh rkeys will be recreated on next mkey_pack, like is done in my test.
But you are right, strictly speaking it's NOT necessary to transfer rkeys ownership to the derived memh. It will work even without this transfer. I will remove this ownership transfer logic
The reason why I added this transfer is the following:
Currently it's UCT layer that decides whether to create rkeys (atomic + indirect), and conditions for them are different:
- atomic rkey is created if user requested it (with
UCT_MD_MEM_ACCESS_REMOTE_ATOMIC
), and hardware supports it. So it might be created no matter if invalidation is needed - indirect rkey is created only when invalidation is needed
Now from UCP layer we assume that we will create a memory window when invalidation if needed from EP config. By this time atomic rkeys on base memh might be already created and I thought it's a good idea to move it to the memory window, to avoid re-creating this key per MW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed ownership transfer for now, let's consider this later
test/gtest/uct/ib/test_ib_md.cc
Outdated
|
||
/* Test case 4: base memh can still be used to pack mkeys */ | ||
std::vector<uint8_t> base_rkey2 = mkey_pack(base, flags); | ||
EXPECT_NE(base_rkey1, base_rkey2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i do not think it is correct. base memh should not be modified once derived memh is created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, fixed this
test/gtest/uct/ib/test_ib_md.cc
Outdated
uct_mem_h mw1 = reg_mem_window(base); | ||
|
||
/* Test case 2: creating MW from memh after mkey_pack */ | ||
std::vector<uint8_t> base_rkey1 = mkey_pack(base, flags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need invalidation on first mkey pack of base?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need test with packing without invalidation first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of UCT invalidation would mean that we destroy derived memh. Currently I don't have tests with invalidation, actually it's a good point to add them.
Or I misunderstood your comment
What?
This PR is a composite part of the new memory invalidation design based on the concept of memory windows.
Here we introduce changes on the UCT side.
On UCT side memory window represents a shallow copy of some existing UCT memory handle, which can be used to access the same memory region. When created, the memory window inherits the original memh access flags and state, and takes ownership of the indirect keys of the original memory handle. The lifetime of the memory window is bound to the original memh, and the original memh cannot be destroyed until all its memory windows are destroyed.
Design doc
Why?
The overall idea is to employ a lightweight invalidation: when failure happens we invalidate only the exposed indirect rkeys, but don't deregister the entire memory region, which is an expensive operation. Lightweight invalidation enables invalidation workflow for RNDV pipeline protocol.
How?
uct_md_mem_reg_params_t
:UCT_MD_MEM_WINDOW
. Existingmem_reg
call will create a memory window based on existing UCT memh