Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing

Introduction

Clementi is a multi-FPGA graph processing framework designed to achieve near-linear scalability. By overlapping communication with computation, Clementi optimizes end-to-end performance. Leveraging a custom FPGA hardware architecture, we propose an architecture-oriented performance model and a workload scheduling method that minimize execution-time discrepancies among FPGAs. Our experimental results demonstrate that Clementi significantly outperforms existing multi-FPGA frameworks, achieving speedups of 1.86× to 8.75×, and exhibits near-linear scalability as the number of FPGAs increases.

Update Notice

Please note that the documentation is actively being updated. If you have any questions, please contact [email protected].

Prerequisites

Clementi development utilizes the Xilinx Vitis toolset. Key components include:

  • Xilinx XRT version 2.14.354
  • Vitis v++ at v2021.2 (64-bit)
  • OpenMPI at 4.1.4

The framework runs on a public FPGA cluster, the HACC cluster at NUS; for detailed information and access, please refer to the HACC_NUS website. Each FPGA in this cluster is paired with a virtual CPU node, and Open MPI 4.1.4 is used for distributed execution. The cluster provides four Xilinx U250 FPGAs.
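
As a quick, optional sanity check (these are standard commands of the tools above, not scripts from this repository), the installed versions can be verified from the command line:

# Optional: verify tool versions (expected values per the prerequisites above)
xbutil --version      # XRT, expected 2.14.354
v++ --version         # Vitis, expected v2021.2 (64-bit); requires settings64.sh to be sourced first
mpirun --version      # Open MPI, expected 4.1.4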

System Overview

(Figure: Clementi overview)

Clementi uses a three-phase approach to process large graphs on multi-FPGA platforms connected in a ring topology:

  1. Graph Partitioning: The input edge list is first partitioned into subgraphs using a 2D partitioning method that combines interval-shard partitioning with input-aware partitioning.
  2. Subgraph Assignment: A performance model predicts the execution time of each subgraph, and a greedy scheduling algorithm uses these predictions to balance the workload across FPGAs (a toy sketch of such a greedy assignment follows this list).
  3. Concurrent Processing: Each FPGA processes its assigned subgraphs concurrently, overlapping the gather-scatter and global-apply stages to optimize performance.
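
The snippet below is a minimal, illustrative sketch of greedy workload assignment, not Clementi's actual scheduler: given predicted execution times per subgraph, each subgraph is assigned, largest first, to the FPGA with the smallest accumulated predicted time. The input file format and the four-FPGA count are assumptions made for illustration.

## Toy greedy assignment (illustration only; not Clementi's actual scheduler).
## Assumed input file: one "<subgraph_id> <predicted_time_ms>" pair per line.
sort -k2,2 -nr predicted_times.txt | awk '
{
    # Assign the current subgraph to the FPGA with the smallest accumulated load
    best = 1
    for (f = 2; f <= 4; f++) if (load[f] < load[best]) best = f
    load[best] += $2
    printf "subgraph %s -> FPGA %d\n", $1, best
}
END {
    for (f = 1; f <= 4; f++) printf "FPGA %d predicted total: %.1f ms\n", f, load[f]
}'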

Initialization

To generate this design, you will need a valid UltraScale+ Integrated 100G Ethernet Subsystem license installed in Vivado.

To begin working with Clementi, clone this repository.
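
For example, assuming the repository is hosted under the Xtra-Computing organization on GitHub:

git clone https://github.com/Xtra-Computing/Clementi.git
cd Clementi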

Build Hardware

# Load the Xilinx Vitis and XRT settings
source /opt/Xilinx/Vitis/2021.2/settings64.sh
source /opt/xilinx/xrt/setup.sh

# Install build dependencies
sudo apt install libgraphviz-dev faketime
pip3 install graphviz

# Build all hardware components for the Clementi application.
# Make with the default configuration (as in the app makefile), or with a specified target:
make TARGET=hw all APP=clementi TYPE=pr
# or 
make TARGET=hw all APP=gather_scatter TYPE=pr
# or
make TARGET=hw all APP=global_apply
# or
make TARGET=hw all APP=single_gas TYPE=pr

Build Software

# If you only need to build the host software, use one of the following commands:
make host APP=clementi
# or 
make host APP=gather_scatter
# or
make host APP=global_apply

Test

Run the scripts below to test the hardware modules and the full system. Ensure that the OpenMPI settings are configured correctly before running the multi-FPGA demo:

## To test the gather_scatter module:
./script/run_gather_scatter.sh
## To test the global_apply module:
./script/run_global_apply.sh
## To test Clementi:
./script/run_clementi.sh R25 30 ## dataset = R25, and superstep = 30.

## To test single_gas, first log in to a U250 FPGA node, then:
./single_gas.app -d R25 -s 30 ## dataset = R25, and superstep = 30.

[Note] Please check that the graph dataset directory is set correctly.
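
Before the multi-FPGA run, it can help to sanity-check the MPI setup. The hostfile name below is an assumption for illustration; use the actual file under the host_file directory:

## Hypothetical MPI sanity check (hostfile name is an assumption; see the host_file directory):
cat ./host_file/hosts                                # list the FPGA node hostnames
mpirun --hostfile ./host_file/hosts -np 4 hostname   # each node should print its hostname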

Repository Structure

  • app - Contains applications used within the project.
  • host - Host code files that use the XRT driver.
  • images - Images used in the README documentation.
  • mk - Directory containing makefiles.
  • partition - Includes the input-aware partition method and performance model.
  • src - Hardware files for each block in the Clementi framework.
  • test - Contains test files and scripts.
  • host_file - MPI hostfiles used for distributed execution.
  • script - Bash scripts used for compilation and for running tests.

Licenses

Ethernet/cmac License: BSD 3-Clause License

NetLayers/100G-fpga-network-stack-core License: BSD 3-Clause License

License Installation

Here is a simple way to install the cmac license from the Linux command line:

## Step 1. Source the Vitis settings script:
source /path/to/Vitis/settings64.sh
## Step 2. Set the license file environment variable:
export XILINXD_LICENSE_FILE=/path/to/your/license.lic
## Step 3. To check the IP status in your project:
## a. Start Vivado in TCL mode:
vivado -mode tcl
## b. Open your project:
open_project /path/to/your/project.xpr
## c. Generate a report of all IP statuses and save it to a file:
report_ip_status -file /path/to/ip_status_report.txt
## d. Close the project and exit the Vivado Tcl shell:
close_project
exit

📄 Related Publication

If you find this project helpful, please consider citing our related work:

Feng Yu, Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, and Weng-Fai Wong.
Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing.
Proceedings of the ACM on Management of Data (PACMMOD), Vol. 3, No. 3 (SIGMOD), Article 138, June 2025.
https://doi.org/10.1145/3725275

BibTeX

@article{yu2025clementi,
  title     = {Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing},
  author    = {Feng Yu and Hongshi Tan and Xinyu Chen and Yao Chen and Bingsheng He and Weng-Fai Wong},
  journal   = {Proceedings of the ACM on Management of Data (PACMMOD)},
  volume    = {3},
  number    = {3 (SIGMOD)},
  article   = {138},
  year      = {2025},
  month     = {June},
  doi       = {10.1145/3725275},
  publisher = {ACM}
}
