Clementi is a multi-FPGA-based graph processing framework designed to achieve near-linear scalability. By overlapping communication with computation, Clementi optimizes end-to-end performance. In addition, building on a custom hardware architecture on the FPGAs, we propose an architecture-oriented performance model and a workload scheduling method that minimize execution time discrepancies among FPGAs. Our experimental results demonstrate that Clementi significantly outperforms existing multi-FPGA frameworks, achieving speedups ranging from 1.86× to 8.75×, and exhibits near-linear scalability as the number of FPGAs increases.
Please note that the documentation is actively being updated. If you have any questions, send them to [email protected].
Clementi development utilizes the Xilinx Vitis toolset. Key components include:
- Xilinx XRT version 2.14.354
- Vitis v++ at v2021.2 (64-bit)
- OpenMPI at 4.1.4
The framework runs on a public FPGA cluster, the HACC cluster at NUS. For detailed information and access to this cluster, please refer to the HACC_NUS website. Each FPGA on this cluster is paired with a virtual CPU node, and Open MPI 4.1.4 is used for distributed execution. The system incorporates four Xilinx U250 FPGAs.
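Before building, it can help to confirm that the tools listed above are visible on the `PATH`. The following is a minimal sketch and not part of the repository: `check_tools` is a hypothetical helper, and the executable names `v++`, `xbutil`, and `mpirun` are assumptions about how Vitis, XRT, and Open MPI are exposed on your system.

```shell
# Hypothetical helper (not part of Clementi): report which required tools are missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        # `command -v` succeeds only if the name resolves to something runnable.
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one tool was not found.
    return "$missing"
}

# Assumed executable names; run this after sourcing the Vitis and XRT settings scripts.
check_tools v++ xbutil mpirun || echo "some tools are missing from PATH"
```

The helper only checks presence, not versions; compare the versions reported by each tool against the list above by hand.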
Clementi utilizes a three-phase approach to process large graphs on multi-FPGA platforms with a ring topology:
- Graph Partitioning: The input edge list is initially partitioned into subgraphs using a 2D partitioning method that integrates interval-shard and input-aware partition.
- Subgraph Assignment: A performance model predicts execution times for each subgraph, which informs a greedy-based scheduling algorithm to ensure balanced workloads across the FPGAs.
- Concurrent Processing: Each FPGA concurrently processes its assigned subgraphs, overlapping gather-scatter and global apply stages to optimize performance.
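To make the subgraph assignment phase more concrete, here is a minimal sketch of one common greedy heuristic: sort subgraphs by predicted execution time, longest first, and give each one to the currently least-loaded FPGA. This is an illustration under stated assumptions, not Clementi's actual scheduler: the function name `greedy_assign`, the input format (one `subgraph_id predicted_time` pair per line), and the sample times are all hypothetical.

```shell
# Hypothetical sketch (not Clementi's scheduler): longest-predicted-time-first
# greedy assignment of subgraphs to FPGAs.
# Usage: greedy_assign NUM_FPGAS < lines of "subgraph_id predicted_time"
greedy_assign() {
    num_fpgas=$1
    # Sort by predicted time (field 2), numerically, largest first.
    sort -k2,2nr | awk -v n="$num_fpgas" '
        BEGIN { for (i = 0; i < n; i++) load[i] = 0 }
        {
            # Pick the FPGA with the smallest accumulated predicted time.
            best = 0
            for (i = 1; i < n; i++) if (load[i] < load[best]) best = i
            load[best] += $2
            print $1 " -> fpga " best
        }
        END { for (i = 0; i < n; i++) print "fpga " i " load " load[i] }
    '
}

# Made-up predicted times for five subgraphs, scheduled onto two FPGAs.
printf '%s\n' 'sg0 7' 'sg1 5' 'sg2 4' 'sg3 3' 'sg4 1' | greedy_assign 2
```

With the sample input above and two FPGAs, both devices end up with a predicted load of 10. In the real framework the predicted times would come from the performance model rather than from hand-written numbers.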
To generate this design, you need a valid UltraScale+ Integrated 100G Ethernet Subsystem license installed in Vivado.
To begin working with Clementi, clone this repository.
# Load the Xilinx Vitis and XRT settings
source /opt/Xilinx/Vitis/2021.2/settings64.sh
source /opt/xilinx/xrt/setup.sh
# Build all components for the Clementi application
# dependencies:
sudo apt install libgraphviz-dev faketime
pip3 install graphviz
# make with the default configuration (as in app makefile)
make TARGET=hw all APP=clementi TYPE=pr # make with specified target
# or
make TARGET=hw all APP=gather_scatter TYPE=pr
# or
make TARGET=hw all APP=global_apply
# or
make TARGET=hw all APP=single_gas TYPE=pr
# If you only need to build the software code, use one of the following commands:
make host APP=clementi
# or
make host APP=gather_scatter
# or
make host APP=global_apply
Run the hardware single graph processor script to test the system. Before running the multi-FPGA demo, make sure that the correct OpenMPI settings are configured:
## To test the gather_scatter module:
./script/run_gather_scatter.sh
## To test the global_apply module:
./script/run_global_apply.sh
## To test Clementi:
./script/run_clementi.sh R25 30 ## dataset = R25, superstep = 30.
## To test single_gas, first log in to a U250 FPGA node, then:
./single_gas.app -d R25 -s 30 ## dataset = R25, superstep = 30.
[Note] Check that the graph dataset directory is correct for your setup before running.
- app: Contains applications used within the project.
- host: Host files for the XRT driver.
- images: Images used in the README documentation.
- mk: Directory containing makefiles.
- partition: Includes the input-aware partition method and performance model.
- src: Hardware files for each block in the Clementi framework.
- test: Contains test files and scripts.
- host_file: Host files used in MPI code.
- script: Bash scripts in compilation.
Ethernet/cmac License: BSD 3-Clause License
NetLayers/100G-fpga-network-stack-core License: BSD 3-Clause License
Here is a simple way to install the CMAC license from the Linux command line:
## Step 1. Source the Vitis settings script:
source /path/to/Vitis/settings64.sh
## Step 2. Set the license file environment variable:
export XILINXD_LICENSE_FILE=/path/to/your/license.lic
## Step 3. To check the IP status in your project:
## a. Start Vivado in TCL mode:
vivado -mode tcl
## b. Open your project:
open_project /path/to/your/project.xpr
## c. Generate a report for all IP statuses and save it to a file:
report_ip_status -all > /path/to/ip_status_report.txt
## d. Close the project and exit the Vivado TCL shell:
close_project
exit
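If the IP status report shows license problems, a quick first check is whether the license file environment variable from Step 2 points at a readable file. The following is a minimal sketch and not part of the repository: `check_license_file` is a hypothetical helper, and it assumes `XILINXD_LICENSE_FILE` holds a single file path rather than a license server address or a list of paths.

```shell
# Hypothetical helper (not part of Clementi): verify that a license path is usable.
# Checks the path given as the first argument, or XILINXD_LICENSE_FILE if none is given.
# Assumption: the value is a single file path, not a server address or a path list.
check_license_file() {
    lic=${1:-${XILINXD_LICENSE_FILE:-}}
    if [ -z "$lic" ]; then
        echo "XILINXD_LICENSE_FILE is not set"
        return 1
    fi
    if [ ! -r "$lic" ]; then
        echo "license file not readable: $lic"
        return 1
    fi
    echo "license file found: $lic"
}
```

Call `check_license_file` with no arguments after the `export` in Step 2, or pass a path explicitly, for example `check_license_file /path/to/your/license.lic`. It only confirms that the file exists and is readable; it does not validate the license contents.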
If you find this project helpful, please consider citing our related work:
Feng Yu, Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, and Weng-Fai Wong.
Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing.
Proceedings of the ACM on Management of Data (PACMMOD), Vol. 3, No. 3 (SIGMOD), Article 138, June 2025.
https://doi.org/10.1145/3725275
@article{yu2025clementi,
title = {Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing},
author = {Feng Yu and Hongshi Tan and Xinyu Chen and Yao Chen and Bingsheng He and Weng-Fai Wong},
journal = {Proceedings of the ACM on Management of Data (PACMMOD)},
volume = {3},
number = {3 (SIGMOD)},
article = {138},
year = {2025},
month = {June},
doi = {10.1145/3725275},
publisher = {ACM}
}