Optimize K-Truss #4742

jnke2016 · 2024-11-01T04:23:31Z

This PR introduces several optimizations to speed up K-Truss. Our current K-Truss implementation computes the intersection for all edges, regardless of whether they are weak, which can be very expensive when only a few edges need to be invalidated. Running nbr_intersection only on the weak edges considerably improves the runtime.
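A minimal single-threaded C++ sketch of the idea, for illustration only. It assumes an undirected simple graph held as per-vertex std::set adjacency, and the function name k_truss_peel is hypothetical; this is not cuGraph's implementation, which works on a DODG with nbr_intersection on the GPU. Edge supports (triangle counts) are computed once. After that, neighbor intersections are recomputed only for the weak edges being peeled, and only the other two edges of each affected triangle have their supports decremented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <initializer_list>
#include <iterator>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int32_t, int32_t>;

// Canonical key for an undirected edge: smaller endpoint first.
inline Edge make_key(int32_t u, int32_t v) { return u < v ? Edge{u, v} : Edge{v, u}; }

// Hypothetical helper: returns the k-truss edges, each as (min, max).
std::vector<Edge> k_truss_peel(int32_t num_vertices, std::vector<Edge> const& edges, int32_t k)
{
  std::vector<std::set<int32_t>> adj(num_vertices);
  for (auto [u, v] : edges) {
    if (u == v) continue;  // ignore self-loops
    adj[u].insert(v);
    adj[v].insert(u);
  }

  // Common neighbors of u and v in the *current* graph.
  auto common = [&](int32_t u, int32_t v) {
    std::vector<int32_t> out;
    std::set_intersection(adj[u].begin(), adj[u].end(), adj[v].begin(), adj[v].end(),
                          std::back_inserter(out));
    return out;
  };

  // Initial support of every edge, computed once.
  std::map<Edge, int32_t> support;
  for (int32_t u = 0; u < num_vertices; ++u)
    for (int32_t v : adj[u])
      if (u < v) support[Edge{u, v}] = static_cast<int32_t>(common(u, v).size());

  // Seed the work queue with the weak edges only (support < k - 2).
  std::deque<Edge> weak;
  std::set<Edge> queued;
  for (auto const& [e, s] : support)
    if (s < k - 2) {
      weak.push_back(e);
      queued.insert(e);
    }

  // Peel: intersections are recomputed only for weak edges being removed.
  while (!weak.empty()) {
    auto [u, v] = weak.front();
    weak.pop_front();
    for (int32_t w : common(u, v)) {
      for (Edge e : {make_key(u, w), make_key(v, w)}) {
        if (--support[e] < k - 2 && queued.insert(e).second) weak.push_back(e);
      }
    }
    adj[u].erase(v);
    adj[v].erase(u);
    support.erase(Edge{u, v});
  }

  std::vector<Edge> result;
  for (auto const& [e, s] : support) result.push_back(e);
  return result;
}
```

For example, on K4 plus one pendant edge, k = 3 keeps the six K4 edges and drops the pendant edge, and the pendant edge is the only one whose intersection is recomputed after the initial pass.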


ChuckHastings · 2024-11-21T21:22:02Z

cpp/src/community/k_truss_impl.cuh

+    auto has_edge = thrust::binary_search(
+          thrust::seq, edgelist_first, weak_edgelist_first, edge_p_r);
+
+    if (!has_edge) { // FIXME: Do binary search instead


Is this FIXME still relevant? It looks like you're already doing a binary search.

ChuckHastings · 2024-11-21T21:24:39Z

cpp/src/community/k_truss_impl.cuh

+
+
+template <typename vertex_t, typename edge_t>
+struct extract_edges { // FIXME:  ******************************Remove this functor. For testing purposes only*******************


Should these be left in here?

ChuckHastings · 2024-11-21T21:27:49Z

cpp/src/community/k_truss_impl.cuh

    std::optional<rmm::device_uvector<weight_t>> edgelist_wgts{std::nullopt};
-
+
+    //#if 0


Delete the commented-out lines in this section.

Right, I still had a lot of debug prints; they are now removed.


seunghwak · 2024-11-22T17:49:20Z

@jnke2016 Is this PR ready for review?

seunghwak · 2024-11-22T18:42:10Z

cpp/src/community/k_truss_impl.cuh

+
+{
+  // FIXME: Use global comm for debugging purposes
+  // then replace it by minor comm once the accuracy is verified


Haven't you verified this? Using global_comm vs minor_comm has a pretty significant scalability impact in large scale (even if you're sending to/receiving from the same set of ranks). All-to-all on a subcommunicator performs better than P2P on a subset of ranks using a global communicator.

seunghwak · 2024-11-22T18:54:17Z

cpp/src/community/k_truss_impl.cuh

+  std::vector<size_t> h_tx_counts(d_tx_counts.size());
+
+  raft::update_host(
+    h_tx_counts.data(), d_tx_counts.data(), d_tx_counts.size(), handle.get_stream());


handle.sync_stream() is necessary before using h_tx_counts.

There is no guarantee that h_tx_counts holds valid values before handle.sync_stream().

Don't forget to address this.

seunghwak · 2024-11-22T18:54:59Z

cpp/src/community/k_truss_impl.cuh

+  std::tie(dsts, std::ignore) =
+    shuffle_values(handle.get_comms(), edgelist_dsts.begin(), h_tx_counts, handle.get_stream());
+
+  // rmm::device_uvector<bool> edge_exists(0, handle.get_stream());


seunghwak · 2024-11-22T19:09:25Z

cpp/src/community/k_truss_impl.cuh

+    shuffle_values(handle.get_comms(), edge_exists.begin(), rx_counts, handle.get_stream());
+
+  // The 'edge_exists' array is ordered based on 'cp_edgelist_srcs' where the edges where group,
+  // hoever it needs to match 'edgelist_srcs', hence re-order 'edge_exists' accordingly.


Is this true? I guess edge_exists is ordered based on edgelist_srcs not cp_edgelist_srcs.

Yes, it's the other way around. It is cp_edgelist_srcs that is supposed to be grouped with groupby_and_count. The comment is correct, but I made a mistake in the groupby_and_count call. I don't want to mess up the original edges in the triangles when grouping; that's why I make a copy.

seunghwak · 2024-11-22T19:11:27Z

cpp/src/community/k_truss_impl.cuh

+void order_edge_based_on_dodg(raft::handle_t const& handle,
+                              graph_view_t<vertex_t, edge_t, false, multi_gpu>& graph_view,
+                              raft::device_span<vertex_t> edgelist_srcs,
+                              raft::device_span<vertex_t> edgelist_dsts)


What are you doing in this function?

Essentially, this function orders the edges obtained from nbr_intersection (on the symmetric graph) so that they match the direction of the DODG edges.

Once we identify the weak edges, the next step is to retrieve the triangles incident to those weak edges, which is done with nbr_intersection on the symmetric graph. Once we have the three endpoints of each triangle, the next step is to find the edge directions that match the edges in the DODG, and this is what that function does.


seunghwak · 2024-11-27T00:06:43Z

cpp/src/community/k_truss_impl.cuh

+  std::vector<size_t> h_tx_counts(d_tx_counts.size());
+
+  raft::update_host(
+    h_tx_counts.data(), d_tx_counts.data(), d_tx_counts.size(), handle.get_stream());


Don't forget to address this.

seunghwak · 2024-11-27T03:30:38Z

cpp/src/community/k_truss_impl.cuh

+  std::optional<rmm::device_uvector<vertex_t>> cp_edgelist_dsts{std::nullopt};
+
+  // FIXME: Minor comm is not working for all cases so I believe some edges a beyong
+  // the partitioning range


"a beyong" here sounds like a typo.

And have you confirmed that some edge sources/destinations are outside the expected range? Or does the code that uses minor_comm have a bug? Whichever is the case, we need to really dig into this and make sure everything works as expected. This optimization is for large-scale K-Truss benchmarking, and any testing, debugging, or tuning gets far more difficult at larger scales. We need to verify everything we can at small scales before going to a larger scale. If we have doubts about our code even at this scale, things will get much worse at larger scales.

seunghwak · 2024-11-27T03:35:50Z

cpp/src/community/k_truss_impl.cuh

+
+    std::vector<size_t> h_tx_counts(d_tx_counts.size());
+
+    handle.sync_stream();


This should be after raft::update_host.

seunghwak · 2024-11-27T03:37:05Z

cpp/src/community/k_truss_impl.cuh

+      h_tx_counts.data(), d_tx_counts.data(), d_tx_counts.size(), handle.get_stream());
+
+    std::tie(srcs, rx_counts) = shuffle_values(
+      handle.get_comms(), cp_edgelist_srcs->begin(), h_tx_counts, handle.get_stream());


You have already created an alias for handle.get_comms() in line 66. Why use handle.get_comms() instead of just comm?

seunghwak · 2024-11-27T03:37:14Z

cpp/src/community/k_truss_impl.cuh

+      handle.get_comms(), cp_edgelist_srcs->begin(), h_tx_counts, handle.get_stream());
+
+    std::tie(dsts, std::ignore) = shuffle_values(
+      handle.get_comms(), cp_edgelist_dsts->begin(), h_tx_counts, handle.get_stream());


seunghwak · 2024-11-27T04:34:34Z

cpp/src/community/k_truss_impl.cuh

+                              raft::device_span<vertex_t> edgelist_srcs,
+                              raft::device_span<vertex_t> edgelist_dsts)
+
+{


So, here, what we are actually doing is this.

For (src, dst) pairs in edgelist_srcs & edgelist_dsts, return (src, dst) if (src, dst) exists in graph_view, and return (dst, src) if not.

I think this code is overly complex.

For single-GPU, this code looks fine.

For multi-GPU, you first need to find the unique (src, dst) pairs, shuffle, call has_edge, and shuffle back. Now you have (src, dst, exist) triplets. Sort them for binary search. Then you can iterate over the input pairs and flip each one or not, based on whether the edge exists.

In the current code, you find unique edges after shuffling, but it is better to do this at the beginning to reduce computation and communication. Also, try to minimize the scopes of temporary variables. Some variables' scopes are too large, and this makes the code harder to understand.
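A single-process C++ sketch of the suggested flow, under stated assumptions: dodg_edges, a sorted list of DODG edges, stands in for graph_view plus has_edge, and order_edges_based_on_dodg is a hypothetical name. The multi-GPU shuffle and shuffle-back steps are only marked in comments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Edge = std::pair<int32_t, int32_t>;

// Flip (src, dst) to (dst, src) whenever (src, dst) is not a DODG edge.
void order_edges_based_on_dodg(std::vector<Edge> const& dodg_edges,  // sorted
                               std::vector<int32_t>& srcs,
                               std::vector<int32_t>& dsts)
{
  // 1. Unique (src, dst) pairs first, to cut the work (and, in MG, communication) that follows.
  std::vector<Edge> uniq;
  uniq.reserve(srcs.size());
  for (std::size_t i = 0; i < srcs.size(); ++i) uniq.emplace_back(srcs[i], dsts[i]);
  std::sort(uniq.begin(), uniq.end());
  uniq.erase(std::unique(uniq.begin(), uniq.end()), uniq.end());

  // 2. Query existence once per unique pair.
  //    (MG: shuffle uniq to the owning ranks, call has_edge, shuffle back, then re-sort.)
  std::vector<bool> exists(uniq.size());
  for (std::size_t i = 0; i < uniq.size(); ++i)
    exists[i] = std::binary_search(dodg_edges.begin(), dodg_edges.end(), uniq[i]);

  // 3. Binary-search each input pair among the sorted unique pairs and flip it if needed.
  for (std::size_t i = 0; i < srcs.size(); ++i) {
    auto it = std::lower_bound(uniq.begin(), uniq.end(), Edge{srcs[i], dsts[i]});
    if (!exists[static_cast<std::size_t>(it - uniq.begin())]) std::swap(srcs[i], dsts[i]);
  }
}
```

Deduplicating before the existence query means a pair that appears in many triangles is looked up (and, in multi-GPU, sent over the network) only once.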

seunghwak · 2024-11-27T04:35:38Z

cpp/src/community/k_truss_impl.cuh

+        auto src = thrust::get<0>(edgelist_first[idx]);
+        auto dst = thrust::get<1>(edgelist_first[idx]);
+
+        auto itr_pair = thrust::find(  // FIXME: replace by lower bound


This is a simple fix. Don't defer simple fixes for later updates.
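If the searched range is sorted by (src, dst), the linear thrust::find can become a lower_bound followed by an equality check (thrust::lower_bound accepts the same thrust::seq policy). A host-side sketch with std::lower_bound; find_edge_index is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using Edge = std::pair<int32_t, int32_t>;

// O(log n) replacement for a linear find on a sorted edge list:
// returns the index of (src, dst), or std::nullopt if the edge is absent.
std::optional<std::size_t> find_edge_index(std::vector<Edge> const& sorted_edges,
                                           int32_t src,
                                           int32_t dst)
{
  Edge key{src, dst};
  auto it = std::lower_bound(sorted_edges.begin(), sorted_edges.end(), key);
  if (it == sorted_edges.end() || *it != key) return std::nullopt;  // not present
  return static_cast<std::size_t>(it - sorted_edges.begin());
}
```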


seunghwak

I am reviewing the entire K-Truss code; more reviews coming.

seunghwak · 2024-12-04T19:41:31Z

cpp/src/community/k_truss_impl.cuh

@@ -214,121 +409,501 @@ k_truss(raft::handle_t const& handle,

  // 3. Keep only the edges from a low-degree vertex to a high-degree vertex.



auto [srcs, dsts, wgts] = k_core(handle, cur_graph_view, edge_weight_view, k - 1, std::make_optional(k_core_degree_type_t::OUT), std::make_optional(core_number_span));

I guess this code won't work if cur_graph_view != graph_view; edge_weight_view is invalid if cur_graph_view == modified_graph_view.

seunghwak · 2024-12-04T19:47:34Z

cpp/src/community/k_truss_impl.cuh

@@ -121,6 +314,8 @@ k_truss(raft::handle_t const& handle,
    edge_weight{std::nullopt};
  std::optional<rmm::device_uvector<weight_t>> wgts{std::nullopt};

+  cugraph::edge_bucket_t<vertex_t, void, true, multi_gpu, true> edgelist_dodg(handle);


Better to minimize the scope of edgelist_dodg. It is used only for the transform_e call at line 460, so there is no reason to define this variable so far ahead.

And I think maintaining both cur_graph_view and modified_graph_view is redundant.

Won't auto cur_graph_view = modified_graph ? modified_graph.view() : graph_view be enough?
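A small sketch of the single-view approach, using hypothetical stand-ins for graph_t and graph_view_t. It assumes modified_graph is held in a std::optional, in which case the member access is modified_graph->view() rather than modified_graph.view().

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins for graph_t / graph_view_t; only the shape matters here.
struct graph_view_t {
  std::vector<int> const* offsets;  // non-owning pointer into a graph's storage
};

struct graph_t {
  std::vector<int> offsets;
  graph_view_t view() const { return graph_view_t{&offsets}; }
};

// Derive the current view from the optional modified graph instead of
// carrying a separate modified_graph_view alongside cur_graph_view.
graph_view_t current_view(std::optional<graph_t> const& modified_graph, graph_view_t graph_view)
{
  return modified_graph ? modified_graph->view() : graph_view;
}
```

In this sketch the view points into storage owned by modified_graph, so modified_graph has to outlive the view, and reassigning modified_graph leaves any previously obtained view stale.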

seunghwak · 2024-12-04T19:55:17Z

cpp/src/community/k_truss_impl.cuh

-    }
-    renumber_map = std::move(tmp_renumber_map);
-  }
+  edgelist_dodg.clear();

  // 4. Compute triangle count using nbr_intersection and unroll weak edges

  {


What's the purpose of this curly bracket? This closes at the end of this function definition and doesn't help in reducing the scope of any variable.

seunghwak · 2024-12-04T19:57:07Z

cpp/src/community/k_truss_impl.cuh

@@ -214,121 +409,501 @@ k_truss(raft::handle_t const& handle,

  // 3. Keep only the edges from a low-degree vertex to a high-degree vertex.

-  {
-    auto cur_graph_view = modified_graph_view ? *modified_graph_view : graph_view;


We could place this DODG extraction inside a { ... } block. We just need to define dodg_mask outside the block, and I assume none of the other variables are used beyond extracting the DODG.

seunghwak · 2024-12-04T20:11:16Z

cpp/src/community/k_truss_impl.cuh

@@ -121,6 +314,8 @@ k_truss(raft::handle_t const& handle,
    edge_weight{std::nullopt};


Should we maintain edge_weight from here? We can treat the graph as an unweighted graph till we finish finding K-Truss edge sources & destinations. Once we have K-Truss edge sources & destinations, we can extract edge (src, dst, weight) triplets from the input graph.
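A host-side sketch of attaching weights only at the end, assuming the input edge list is available sorted by (src, dst) and that every K-Truss edge comes from it. WeightedEdge and attach_weights are hypothetical names, not cuGraph APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <utility>
#include <vector>

struct WeightedEdge {
  int32_t src;
  int32_t dst;
  float wgt;
};

// Attach weights after the (unweighted) K-Truss computation has produced its edge list.
// `input` must be sorted by (src, dst); every truss edge is assumed to exist in it.
std::vector<WeightedEdge> attach_weights(std::vector<WeightedEdge> const& input,
                                         std::vector<std::pair<int32_t, int32_t>> const& truss_edges)
{
  auto less = [](WeightedEdge const& e, std::pair<int32_t, int32_t> const& key) {
    return std::tie(e.src, e.dst) < std::tie(key.first, key.second);
  };
  std::vector<WeightedEdge> out;
  out.reserve(truss_edges.size());
  for (auto const& key : truss_edges) {
    auto it = std::lower_bound(input.begin(), input.end(), key, less);
    assert(it != input.end() && it->src == key.first && it->dst == key.second);
    out.push_back(*it);  // copies (src, dst, weight) from the input graph
  }
  return out;
}
```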

ChuckHastings · 2024-12-04T21:01:10Z

Moving this to 25.02... hopefully merging very early.

seunghwak · 2024-12-05T00:42:24Z

cpp/src/community/k_truss_impl.cuh

+          raft::device_span<vertex_t const>(intersection_indices.data(),
+                                            intersection_indices.size()),
+          raft::device_span<vertex_t const>(weak_edgelist_srcs.data(), weak_edgelist_srcs.size()),
+          raft::device_span<vertex_t const>(weak_edgelist_dsts.data(), weak_edgelist_dsts.size())});


Instead of storing all three edges of each triangle, what happens if we store only the three triangle endpoints (a, b, c), where out-degree(a) < out-degree(b) < out-degree(c) using the same tie-breaking mechanism used to create the DODG, and then:

1. Shuffle based on edge partitioning using (a, b).
2. Remove duplicates.
3. Re-create the edge list (a, b), (a, c), (b, c) from each (a, b, c).
4. Shuffle the resulting edge list.
5. Decrease the edge counts.

Won't this work? This sounds simpler and cheaper.

And we haven't implemented truss decomposition, right?

K-Truss should be implemented as truss decomposition with min_k = max_k = k.
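A single-process sketch of the triangle-endpoint approach. It assumes rank is a hypothetical precomputed array with distinct values encoding the DODG ordering (out-degree plus tie-break), and that edge_counts holds the current triangle count of each DODG edge; decrement_triangle_edges is a hypothetical name, and the shuffles are only marked in comments. Deduplicating triangles before regenerating edges keeps a triangle found from two different weak edges from being subtracted twice.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <map>
#include <utility>
#include <vector>

using Triangle = std::array<int32_t, 3>;
using Edge     = std::pair<int32_t, int32_t>;

void decrement_triangle_edges(std::vector<Triangle> triangles,
                              std::vector<int64_t> const& rank,  // distinct values assumed
                              std::map<Edge, int32_t>& edge_counts)
{
  // Canonicalize each triangle as (a, b, c) with rank(a) < rank(b) < rank(c).
  for (auto& t : triangles)
    std::sort(t.begin(), t.end(), [&](int32_t x, int32_t y) { return rank[x] < rank[y]; });

  // Remove duplicates (MG: after shuffling on (a, b)).
  std::sort(triangles.begin(), triangles.end());
  triangles.erase(std::unique(triangles.begin(), triangles.end()), triangles.end());

  // Re-create (a, b), (a, c), (b, c) and decrement each DODG edge once per triangle
  // (MG: shuffle these edges to their owners first).
  for (auto const& [a, b, c] : triangles) {
    for (Edge e : {Edge{a, b}, Edge{a, c}, Edge{b, c}}) --edge_counts.at(e);
  }
}
```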

seunghwak · 2024-12-05T00:45:24Z

cpp/src/community/k_truss_impl.cuh

+      return true;
+    },
+    dodg_mask.mutable_view(),
+    false);


Should we really extract edgelist_dodg and call transform_e?

Can't we combine the extract_transform_e and transform_e calls?

Call transform_e with edge_src_out_degrees.view() & edge_dst_out_degrees.view()?
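The DODG mask only needs the out-degrees of both endpoints, so it can be computed in a single pass over the edges, which is presumably what a transform_e call with edge_src_out_degrees.view() and edge_dst_out_degrees.view() would provide. A host-side sketch; dodg_mask is a hypothetical name, and breaking ties by vertex ID is an assumption about the tie-break rule.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One pass over the edge list: keep (src, dst) iff it points from the lower- to the
// higher-out-degree endpoint, breaking ties by vertex ID (assumed tie-break rule).
std::vector<bool> dodg_mask(std::vector<int32_t> const& srcs,
                            std::vector<int32_t> const& dsts,
                            std::vector<int32_t> const& out_degrees)
{
  std::vector<bool> mask(srcs.size());
  for (std::size_t i = 0; i < srcs.size(); ++i) {
    auto ds = out_degrees[srcs[i]];
    auto dd = out_degrees[dsts[i]];
    mask[i] = (ds < dd) || (ds == dd && srcs[i] < dsts[i]);
  }
  return mask;
}
```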

jnke2016 added 3 commits October 31, 2024 19:30

optimize ktruss

e935857

benchmark k-truss

e61a580

add benchmark print

0c83b54

github-actions bot added the cuGraph label Nov 1, 2024

add weights support

2e0ef1a

jnke2016 self-assigned this Nov 15, 2024

jnke2016 added this to the 24.12 milestone Nov 15, 2024

jnke2016 marked this pull request as ready for review November 18, 2024 15:32

jnke2016 requested a review from a team as a code owner November 18, 2024 15:32

jnke2016 added 2 commits November 21, 2024 05:30

Merge remote-tracking branch 'upstream/branch-24.12' into branch-24.12_optimize-ktruss

89597c5

update SG implementation of K-Truss and add MG

1259d41

ChuckHastings reviewed Nov 21, 2024


jnke2016 added 5 commits November 21, 2024 22:01

add function to reorder the edges based on the DODG

7f48a91

Merge remote-tracking branch 'upstream/branch-24.12' into branch-24.12_optimize-ktruss

13a70cb

remove debug print statement

9ad6bfd

fix style

b515cd9

Merge remote-tracking branch 'upstream/branch-24.12' into branch-24.12_optimize-ktruss

895cb0e

ChuckHastings approved these changes Nov 22, 2024


seunghwak reviewed Nov 22, 2024


jnke2016 added 3 commits November 26, 2024 16:36

add sync call and fix typos

3dfae34

fix style

8fa9cd9

Merge remote-tracking branch 'upstream/branch-24.12' into branch-24.12_optimize-ktruss

7b8999d

seunghwak reviewed Nov 27, 2024


jnke2016 added 2 commits December 3, 2024 17:24

reduce before shuffling

106afb3

Merge remote-tracking branch 'upstream/branch-24.12' into branch-24.12_optimize-ktruss

9cabbd2

seunghwak reviewed Dec 4, 2024


ChuckHastings modified the milestones: 24.12, 25.02 Dec 4, 2024

seunghwak reviewed Dec 5, 2024


Merge branch 'branch-24.12' into branch-24.12_optimize-ktruss

3fcc41e
