-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize K-Truss #4742
base: branch-24.12
Are you sure you want to change the base?
Optimize K-Truss #4742
Conversation
cpp/src/community/k_truss_impl.cuh
Outdated
auto has_edge = thrust::binary_search( | ||
thrust::seq, edgelist_first, weak_edgelist_first, edge_p_r); | ||
|
||
if (!has_edge) { // FIXME: Do binary search instead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this FIXME still relevant? It looks like you're doing binary search
cpp/src/community/k_truss_impl.cuh
Outdated
|
||
|
||
template <typename vertex_t, typename edge_t> | ||
struct extract_edges { // FIXME: ******************************Remove this functor. For testing purposes only******************* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be left in here?
cpp/src/community/k_truss_impl.cuh
Outdated
std::optional<rmm::device_uvector<weight_t>> edgelist_wgts{std::nullopt}; | ||
|
||
|
||
//#if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete the commented out lines in this section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right I still had a lot of debug prints which are now removed
…2_optimize-ktruss
…2_optimize-ktruss
@jnke2016 Is this PR ready for review? |
cpp/src/community/k_truss_impl.cuh
Outdated
|
||
{ | ||
// FIXME: Use global comm for debugging purposes | ||
// then replace it by minor comm once the accuracy is verified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't you verified this? Using global_comm vs minor_comm has a pretty significant scalability impact in large scale (even if you're sending to/receiving from the same set of ranks). All-to-all on a subcommunicator performs better than P2P on a subset of ranks using a global communicator.
cpp/src/community/k_truss_impl.cuh
Outdated
std::vector<size_t> h_tx_counts(d_tx_counts.size()); | ||
|
||
raft::update_host( | ||
h_tx_counts.data(), d_tx_counts.data(), d_tx_counts.size(), handle.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handle.sync_stream() is necessary before using h_tx_counts.
No guarantee that h_tx_counts will hold valid values before handle.sync_stream().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to address this.
cpp/src/community/k_truss_impl.cuh
Outdated
std::tie(dsts, std::ignore) = | ||
shuffle_values(handle.get_comms(), edgelist_dsts.begin(), h_tx_counts, handle.get_stream()); | ||
|
||
// rmm::device_uvector<bool> edge_exists(0, handle.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete.
cpp/src/community/k_truss_impl.cuh
Outdated
shuffle_values(handle.get_comms(), edge_exists.begin(), rx_counts, handle.get_stream()); | ||
|
||
// The 'edge_exists' array is ordered based on 'cp_edgelist_srcs' where the edges where group, | ||
// hoever it needs to match 'edgelist_srcs', hence re-order 'edge_exists' accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true? I guess edge_exists
is ordered based on edgelist_srcs
not cp_edgelist_srcs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes other way around. It is cp_edgelist_srcs
that is supposed to be groupby_and_count. The documentation is correct but made a mistake in the group_by_and_count call. I don't want to mess up original edges in the triangles when grouping that's why I make a copy.
void order_edge_based_on_dodg(raft::handle_t const& handle, | ||
graph_view_t<vertex_t, edge_t, false, multi_gpu>& graph_view, | ||
raft::device_span<vertex_t> edgelist_srcs, | ||
raft::device_span<vertex_t> edgelist_dsts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are you doing in this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially this function is ordering the edges obtained from nbr_intersection (On the symmetric graph) based on the DODG edges
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we identify weak edges, the next step is to retrieve the triangles incident to those weak edges and this is done with nbr_intersection (on the symmetric graph). Once we have the three endpoints of the triangle, the next step is to find the direction that matches the edges in the DODG and this is what that function does
cpp/src/community/k_truss_impl.cuh
Outdated
std::vector<size_t> h_tx_counts(d_tx_counts.size()); | ||
|
||
raft::update_host( | ||
h_tx_counts.data(), d_tx_counts.data(), d_tx_counts.size(), handle.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to address this.
std::optional<rmm::device_uvector<vertex_t>> cp_edgelist_dsts{std::nullopt}; | ||
|
||
// FIXME: Minor comm is not working for all cases so I believe some edges a beyong | ||
// the partitioning range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a beyong
here sounds like a typo.
And have you confirmed that some edge sources/destinations are outside the expected range? Or maybe the code to use minor_comm has a bug? Whichever is the case, we need to really dig into this and make sure everything is working as expected. This optimization is for large scale K-Truss benchmarking. Any testing/debugging/tuning gets way more difficult in larger scales. We need to verify everything we can in small scales before going to a larger scale. If we have doubts in our code even in this scale, things will get way worse in larger scales.
cpp/src/community/k_truss_impl.cuh
Outdated
|
||
std::vector<size_t> h_tx_counts(d_tx_counts.size()); | ||
|
||
handle.sync_stream(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be after raft::update_host.
cpp/src/community/k_truss_impl.cuh
Outdated
h_tx_counts.data(), d_tx_counts.data(), d_tx_counts.size(), handle.get_stream()); | ||
|
||
std::tie(srcs, rx_counts) = shuffle_values( | ||
handle.get_comms(), cp_edgelist_srcs->begin(), h_tx_counts, handle.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have already created an alias for handle.get_comms()
in line 66. Why use handle.get_comms()
instead of just comm
?
cpp/src/community/k_truss_impl.cuh
Outdated
handle.get_comms(), cp_edgelist_srcs->begin(), h_tx_counts, handle.get_stream()); | ||
|
||
std::tie(dsts, std::ignore) = shuffle_values( | ||
handle.get_comms(), cp_edgelist_dsts->begin(), h_tx_counts, handle.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
raft::device_span<vertex_t> edgelist_srcs, | ||
raft::device_span<vertex_t> edgelist_dsts) | ||
|
||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, here, what we are actually doing is this.
For (src, dst) pairs in edgelist_srcs & edgelist_dsts, return (src, dst) if (src, dst) exists in graph_view, and return (dst, src) if not.
I think this code is overly complex.
If single-GPU, this code looks fine.
In multi-GPU, you first need to find unique (src, dst) pairs, shuffle, call has_edge, shuffle_back. Now you have (src, dst, exist) triplets. Sort them for binary search. Then, you can iterate over the input pairs and flip or not based on whether the edge exists or not.
In the current code, you are finding unique edges after shuffling. But better do this at the beginning to reduce computation/communication. And for temporary variables, try to minimize their scopes. Some variables' scopes are too large and this makes code harder to understand.
cpp/src/community/k_truss_impl.cuh
Outdated
auto src = thrust::get<0>(edgelist_first[idx]); | ||
auto dst = thrust::get<1>(edgelist_first[idx]); | ||
|
||
auto itr_pair = thrust::find( // FIXME: replace by lower bound |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a simple fix. Don't defer simple fixes for later updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am reviewing the entire K-Truss code; more reviews coming.
@@ -214,121 +409,501 @@ k_truss(raft::handle_t const& handle, | |||
|
|||
// 3. Keep only the edges from a low-degree vertex to a high-degree vertex. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto [srcs, dsts, wgts] = k_core(handle,
cur_graph_view,
edge_weight_view,
k - 1,
std::make_optional(k_core_degree_type_t::OUT),
std::make_optional(core_number_span));
I guess this code won't work if cur_graph_view != graph_view; edge_weight_view is invalid if cur_graph_view == modified_graph_view.
@@ -121,6 +314,8 @@ k_truss(raft::handle_t const& handle, | |||
edge_weight{std::nullopt}; | |||
std::optional<rmm::device_uvector<weight_t>> wgts{std::nullopt}; | |||
|
|||
cugraph::edge_bucket_t<vertex_t, void, true, multi_gpu, true> edgelist_dodg(handle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better minimize the scope of edgelist_dodg
. This is used only for the transform_e
in line 460. No reason to define this variable way ahead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I think maintaining both cur_graph_view
and modified_graph_view
is redundant.
Won't
auto cur_graph_view = modified_graph ? modified_graph.view() : graph_view
be enough?
} | ||
renumber_map = std::move(tmp_renumber_map); | ||
} | ||
edgelist_dodg.clear(); | ||
|
||
// 4. Compute triangle count using nbr_intersection and unroll weak edges | ||
|
||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this curly bracket? This closes at the end of this function definition and doesn't help in reducing the scope of any variable.
@@ -214,121 +409,501 @@ k_truss(raft::handle_t const& handle, | |||
|
|||
// 3. Keep only the edges from a low-degree vertex to a high-degree vertex. | |||
|
|||
{ | |||
auto cur_graph_view = modified_graph_view ? *modified_graph_view : graph_view; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may place this DODG extraction part inside { ... }
. We just need to define dodg_mask
outside the bracket and I assume all the other variables aren't used besides extracting a DODG graph.
@@ -121,6 +314,8 @@ k_truss(raft::handle_t const& handle, | |||
edge_weight{std::nullopt}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maintain edge_weight
from here? We can treat the graph as an unweighted graph till we finish finding K-Truss edge sources & destinations. Once we have K-Truss edge sources & destinations, we can extract edge (src, dst, weight) triplets from the input graph.
Moving this to 25.02... hopefully merging very early. |
raft::device_span<vertex_t const>(intersection_indices.data(), | ||
intersection_indices.size()), | ||
raft::device_span<vertex_t const>(weak_edgelist_srcs.data(), weak_edgelist_srcs.size()), | ||
raft::device_span<vertex_t const>(weak_edgelist_dsts.data(), weak_edgelist_dsts.size())}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of storing all three edges in each triangle, what happens
- if we store only three triangle end points (a, b, c where out-degree(a) < out-degree(b) < out-degree(c) using the same tie breaking mechanism to create a DODG graph),
- Shuffle based on edge partitioning using (a, b)
- Remove duplicates
- Re-create an edge list (a, b) (a, c), (b, c) from each (a, b, c)
- Shuffle the result edge list
- Decrease edge counts.
Won't this work? This sounds simpler & cheaper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we haven't implemented truss decomposition, right?
K-Truss should be implemented as truss decomposition with min_k = max_k = k
return true; | ||
}, | ||
dodg_mask.mutable_view(), | ||
false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we really extract edgelist_dodg
and call transform_e
?
Can't we combine the extract_transform_e
and transform_e
calls?
Call transform_e
with edge_src_out_degrees.view()
& edge_dst_out_degrees.view()
?
This PR introduces several optimization to speed up K-Truss. In fact, our K-Truss implementation computes the intersection of all edges regardless they are weak or not which can be very expensive if only few edges need to be invalidated. By running
nbr_intersection
on the weak edges, this considerably improves the runtime.