Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize K-Truss #4742

Open
wants to merge 17 commits into
base: branch-24.12
Choose a base branch
from

Conversation

jnke2016
Copy link
Contributor

@jnke2016 jnke2016 commented Nov 1, 2024

This PR introduces several optimization to speed up K-Truss. In fact, our K-Truss implementation computes the intersection of all edges regardless they are weak or not which can be very expensive if only few edges need to be invalidated. By running nbr_intersection on the weak edges, this considerably improves the runtime.

@jnke2016 jnke2016 self-assigned this Nov 15, 2024
@jnke2016 jnke2016 added this to the 24.12 milestone Nov 15, 2024
@jnke2016 jnke2016 marked this pull request as ready for review November 18, 2024 15:32
@jnke2016 jnke2016 requested a review from a team as a code owner November 18, 2024 15:32
auto has_edge = thrust::binary_search(
thrust::seq, edgelist_first, weak_edgelist_first, edge_p_r);

if (!has_edge) { // FIXME: Do binary search instead
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this FIXME still relevant? It looks like you're doing binary search



template <typename vertex_t, typename edge_t>
struct extract_edges { // FIXME: ******************************Remove this functor. For testing purposes only*******************
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be left in here?

std::optional<rmm::device_uvector<weight_t>> edgelist_wgts{std::nullopt};


//#if 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete the commented out lines in this section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right I still had a lot of debug prints which are now removed

@seunghwak
Copy link
Contributor

@jnke2016 Is this PR ready for review?


{
// FIXME: Use global comm for debugging purposes
// then replace it by minor comm once the accuracy is verified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't you verified this? Using global_comm vs minor_comm has a pretty significant scalability impact in large scale (even if you're sending to/receiving from the same set of ranks). All-to-all on a subcommunicator performs better than P2P on a subset of ranks using a global communicator.

std::vector<size_t> h_tx_counts(d_tx_counts.size());

raft::update_host(
h_tx_counts.data(), d_tx_counts.data(), d_tx_counts.size(), handle.get_stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle.sync_stream() is necessary before using h_tx_counts.

No guarantee that h_tx_counts will hold valid values before handle.sync_stream().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to address this.

std::tie(dsts, std::ignore) =
shuffle_values(handle.get_comms(), edgelist_dsts.begin(), h_tx_counts, handle.get_stream());

// rmm::device_uvector<bool> edge_exists(0, handle.get_stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete.

shuffle_values(handle.get_comms(), edge_exists.begin(), rx_counts, handle.get_stream());

// The 'edge_exists' array is ordered based on 'cp_edgelist_srcs' where the edges where group,
// hoever it needs to match 'edgelist_srcs', hence re-order 'edge_exists' accordingly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true? I guess edge_exists is ordered based on edgelist_srcs not cp_edgelist_srcs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes other way around. It is cp_edgelist_srcs that is supposed to be groupby_and_count. The documentation is correct but made a mistake in the group_by_and_count call. I don't want to mess up original edges in the triangles when grouping that's why I make a copy.

void order_edge_based_on_dodg(raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, false, multi_gpu>& graph_view,
raft::device_span<vertex_t> edgelist_srcs,
raft::device_span<vertex_t> edgelist_dsts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you doing in this function?

Copy link
Contributor Author

@jnke2016 jnke2016 Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially this function is ordering the edges obtained from nbr_intersection (On the symmetric graph) based on the DODG edges

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we identify weak edges, the next step is to retrieve the triangles incident to those weak edges and this is done with nbr_intersection (on the symmetric graph). Once we have the three endpoints of the triangle, the next step is to find the direction that matches the edges in the DODG and this is what that function does

std::vector<size_t> h_tx_counts(d_tx_counts.size());

raft::update_host(
h_tx_counts.data(), d_tx_counts.data(), d_tx_counts.size(), handle.get_stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to address this.

std::optional<rmm::device_uvector<vertex_t>> cp_edgelist_dsts{std::nullopt};

// FIXME: Minor comm is not working for all cases so I believe some edges a beyong
// the partitioning range
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a beyong here sounds like a typo.

And have you confirmed that some edge sources/destinations are outside the expected range? Or maybe the code to use minor_comm has a bug? Whichever is the case, we need to really dig into this and make sure everything is working as expected. This optimization is for large scale K-Truss benchmarking. Any testing/debugging/tuning gets way more difficult in larger scales. We need to verify everything we can in small scales before going to a larger scale. If we have doubts in our code even in this scale, things will get way worse in larger scales.


std::vector<size_t> h_tx_counts(d_tx_counts.size());

handle.sync_stream();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be after raft::update_host.

h_tx_counts.data(), d_tx_counts.data(), d_tx_counts.size(), handle.get_stream());

std::tie(srcs, rx_counts) = shuffle_values(
handle.get_comms(), cp_edgelist_srcs->begin(), h_tx_counts, handle.get_stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have already created an alias for handle.get_comms() in line 66. Why use handle.get_comms() instead of just comm?

handle.get_comms(), cp_edgelist_srcs->begin(), h_tx_counts, handle.get_stream());

std::tie(dsts, std::ignore) = shuffle_values(
handle.get_comms(), cp_edgelist_dsts->begin(), h_tx_counts, handle.get_stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

raft::device_span<vertex_t> edgelist_srcs,
raft::device_span<vertex_t> edgelist_dsts)

{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, here, what we are actually doing is this.

For (src, dst) pairs in edgelist_srcs & edgelist_dsts, return (src, dst) if (src, dst) exists in graph_view, and return (dst, src) if not.

I think this code is overly complex.

If single-GPU, this code looks fine.

In multi-GPU, you first need to find unique (src, dst) pairs, shuffle, call has_edge, shuffle_back. Now you have (src, dst, exist) triplets. Sort them for binary search. Then, you can iterate over the input pairs and flip or not based on whether the edge exists or not.

In the current code, you are finding unique edges after shuffling. But better do this at the beginning to reduce computation/communication. And for temporary variables, try to minimize their scopes. Some variables' scopes are too large and this makes code harder to understand.

auto src = thrust::get<0>(edgelist_first[idx]);
auto dst = thrust::get<1>(edgelist_first[idx]);

auto itr_pair = thrust::find( // FIXME: replace by lower bound
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a simple fix. Don't defer simple fixes for later updates.

Copy link
Contributor

@seunghwak seunghwak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am reviewing the entire K-Truss code; more reviews coming.

@@ -214,121 +409,501 @@ k_truss(raft::handle_t const& handle,

// 3. Keep only the edges from a low-degree vertex to a high-degree vertex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    auto [srcs, dsts, wgts] = k_core(handle,
                                     cur_graph_view,
                                     edge_weight_view,
                                     k - 1,
                                     std::make_optional(k_core_degree_type_t::OUT),
                                     std::make_optional(core_number_span));

I guess this code won't work if cur_graph_view != graph_view; edge_weight_view is invalid if cur_graph_view == modified_graph_view.

@@ -121,6 +314,8 @@ k_truss(raft::handle_t const& handle,
edge_weight{std::nullopt};
std::optional<rmm::device_uvector<weight_t>> wgts{std::nullopt};

cugraph::edge_bucket_t<vertex_t, void, true, multi_gpu, true> edgelist_dodg(handle);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better minimize the scope of edgelist_dodg. This is used only for the transform_e in line 460. No reason to define this variable way ahead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I think maintaining both cur_graph_view and modified_graph_view is redundant.

Won't
auto cur_graph_view = modified_graph ? modified_graph.view() : graph_view be enough?

}
renumber_map = std::move(tmp_renumber_map);
}
edgelist_dodg.clear();

// 4. Compute triangle count using nbr_intersection and unroll weak edges

{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this curly bracket? This closes at the end of this function definition and doesn't help in reducing the scope of any variable.

@@ -214,121 +409,501 @@ k_truss(raft::handle_t const& handle,

// 3. Keep only the edges from a low-degree vertex to a high-degree vertex.

{
auto cur_graph_view = modified_graph_view ? *modified_graph_view : graph_view;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may place this DODG extraction part inside { ... }. We just need to define dodg_mask outside the bracket and I assume all the other variables aren't used besides extracting a DODG graph.

@@ -121,6 +314,8 @@ k_truss(raft::handle_t const& handle,
edge_weight{std::nullopt};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maintain edge_weight from here? We can treat the graph as an unweighted graph till we finish finding K-Truss edge sources & destinations. Once we have K-Truss edge sources & destinations, we can extract edge (src, dst, weight) triplets from the input graph.

@ChuckHastings
Copy link
Collaborator

Moving this to 25.02... hopefully merging very early.

@ChuckHastings ChuckHastings modified the milestones: 24.12, 25.02 Dec 4, 2024
raft::device_span<vertex_t const>(intersection_indices.data(),
intersection_indices.size()),
raft::device_span<vertex_t const>(weak_edgelist_srcs.data(), weak_edgelist_srcs.size()),
raft::device_span<vertex_t const>(weak_edgelist_dsts.data(), weak_edgelist_dsts.size())});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of storing all three edges in each triangle, what happens

  1. if we store only three triangle end points (a, b, c where out-degree(a) < out-degree(b) < out-degree(c) using the same tie breaking mechanism to create a DODG graph),
  2. Shuffle based on edge partitioning using (a, b)
  3. Remove duplicates
  4. Re-create an edge list (a, b) (a, c), (b, c) from each (a, b, c)
  5. Shuffle the result edge list
  6. Decrease edge counts.

Won't this work? This sounds simpler & cheaper.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we haven't implemented truss decomposition, right?

K-Truss should be implemented as truss decomposition with min_k = max_k = k

return true;
},
dodg_mask.mutable_view(),
false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we really extract edgelist_dodg and call transform_e?

Can't we combine the extract_transform_e and transform_e calls?

Call transform_e with edge_src_out_degrees.view() & edge_dst_out_degrees.view()?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants