Commit 403cc54
Multi-GPU Batched KMeans (#2017)
Closes #1989.
Adds multi-GPU support to KMeans fit for host-resident data, with two modes:
- **OpenMP (cuVS SNMG)**: A single process drives all local GPUs via OMP threads and raw NCCL. Activated automatically when the handle is a `device_resources_snmg`.
- **RAFT comms (Ray / Dask / MPI)**: Each rank is a separate process that calls fit with its own data shard and an initialized RAFT communicator. Coordination uses the RAFT comms.
Both modes share the same core Lloyd's loop, batched streaming of host data, NCCL/comms allreduce of centroid sums and counts, and synchronized convergence. Supports sample weights, n_init best-of-N restarts, KMeansPlusPlus initialization, and float/double. Falls back to single-GPU when neither multi-GPU resources nor comms are present.
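The shared loop described above can be illustrated with a small host-only sketch. This is not the cuVS implementation: the names `Partial`, `accumulate_shard`, `allreduce_sum`, and `lloyd_step` are hypothetical, the NCCL/RAFT allreduce is simulated by an in-process element-wise sum over ranks, and sample weights, restarts, and GPU kernels are omitted. It only shows why reducing per-rank centroid sums and counts gives every rank the same updated centroids and the same convergence value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-rank partial result of one Lloyd iteration.
struct Partial {
  std::vector<double> sums;    // k * dim, row-major per-centroid sums
  std::vector<double> counts;  // k, number of samples per centroid
};

// Stream one rank's host-resident shard in batches, assign each sample to
// its nearest centroid, and accumulate per-centroid sums and counts.
Partial accumulate_shard(const std::vector<double>& shard,      // n * dim
                         const std::vector<double>& centroids,  // k * dim
                         std::size_t dim, std::size_t batch_size)
{
  assert(dim > 0 && batch_size > 0 && shard.size() % dim == 0);
  const std::size_t n = shard.size() / dim;
  const std::size_t k = centroids.size() / dim;
  Partial p{std::vector<double>(k * dim, 0.0), std::vector<double>(k, 0.0)};
  for (std::size_t start = 0; start < n; start += batch_size) {
    const std::size_t stop = std::min(n, start + batch_size);
    for (std::size_t i = start; i < stop; ++i) {
      std::size_t best = 0;
      double best_dist = std::numeric_limits<double>::max();
      for (std::size_t c = 0; c < k; ++c) {
        double dist = 0.0;
        for (std::size_t d = 0; d < dim; ++d) {
          const double diff = shard[i * dim + d] - centroids[c * dim + d];
          dist += diff * diff;
        }
        if (dist < best_dist) { best_dist = dist; best = c; }
      }
      for (std::size_t d = 0; d < dim; ++d) p.sums[best * dim + d] += shard[i * dim + d];
      p.counts[best] += 1.0;
    }
  }
  return p;
}

// Stand-in for an allreduce(sum): element-wise sum of all ranks' partials.
Partial allreduce_sum(const std::vector<Partial>& per_rank)
{
  assert(!per_rank.empty());
  Partial total = per_rank.front();
  for (std::size_t r = 1; r < per_rank.size(); ++r) {
    for (std::size_t i = 0; i < total.sums.size(); ++i) total.sums[i] += per_rank[r].sums[i];
    for (std::size_t i = 0; i < total.counts.size(); ++i) total.counts[i] += per_rank[r].counts[i];
  }
  return total;
}

// One synchronized Lloyd step over all shards. Updates centroids in place
// and returns the total squared centroid shift used to test convergence.
double lloyd_step(const std::vector<std::vector<double>>& shards,
                  std::vector<double>& centroids,
                  std::size_t dim, std::size_t batch_size)
{
  std::vector<Partial> per_rank;
  for (const auto& shard : shards) {
    per_rank.push_back(accumulate_shard(shard, centroids, dim, batch_size));
  }
  const Partial total = allreduce_sum(per_rank);
  double shift = 0.0;
  for (std::size_t c = 0; c < total.counts.size(); ++c) {
    if (total.counts[c] == 0.0) continue;  // leave an empty cluster's centroid unchanged
    for (std::size_t d = 0; d < dim; ++d) {
      const double updated = total.sums[c * dim + d] / total.counts[c];
      const double diff = updated - centroids[c * dim + d];
      shift += diff * diff;
      centroids[c * dim + d] = updated;
    }
  }
  return shift;
}
```

Calling `lloyd_step` repeatedly until the returned shift drops below a tolerance mimics the synchronized convergence check: because every rank sees the same reduced sums and counts, all ranks would stop on the same iteration.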
Authors:
- Victor Lafargue (https://github.com/viclafargue)
- Tarang Jain (https://github.com/tarang-jain)
Approvers:
- Tarang Jain (https://github.com/tarang-jain)
- Micka (https://github.com/lowener)
- Dante Gama Dessavre (https://github.com/dantegd)
URL: #2017
1 parent 547a413, commit 403cc54
15 files changed
Lines changed: 1667 additions & 92 deletions
File tree
- cpp
  - include/cuvs/cluster
  - src
    - cluster
      - detail
    - core
  - tests
    - cluster