Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing

Introduction

Clementi is a multi-FPGA graph processing framework designed to achieve near-linear scalability. By overlapping communication with computation, Clementi optimizes end-to-end performance. Leveraging a custom FPGA hardware architecture, we propose an architecture-oriented performance model and a workload scheduling method that minimize execution-time discrepancies among FPGAs. Our experimental results demonstrate that Clementi significantly outperforms existing multi-FPGA frameworks, achieving speedups of 1.86× to 8.75×, and exhibits near-linear scalability as the number of FPGAs increases.

Update Notice

Please note that the documentation is actively being updated. If you have any questions, please contact [email protected].

Prerequisites

Clementi development utilizes the Xilinx Vitis toolset. Key components include:

  • Xilinx XRT version 2.14.354
  • Vitis v++ at v2021.2 (64-bit)
  • OpenMPI at 4.1.4

The framework runs on a public FPGA cluster, the HACC cluster at NUS; for detailed information and access, please refer to the HACC_NUS website. Each FPGA in this cluster is paired with a virtual CPU node, and Open MPI 4.1.4 is used for distributed execution. The cluster provides four Xilinx U250 FPGAs.
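
As a quick, optional sanity check (these are standard commands of the tools above, not scripts from this repository), the installed versions can be verified from the command line:

# Optional: verify tool versions (expected values per the prerequisites above)
xbutil --version      # XRT, expected 2.14.354
v++ --version         # Vitis, expected v2021.2 (64-bit); requires settings64.sh to be sourced first
mpirun --version      # Open MPI, expected 4.1.4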

System Overview

(Figure: Clementi overview)

Clementi uses a three-phase approach to process large graphs on multi-FPGA platforms connected in a ring topology:

  1. Graph Partitioning: The input edge list is first partitioned into subgraphs using a 2D partitioning method that combines interval-shard partitioning with input-aware partitioning.
  2. Subgraph Assignment: A performance model predicts the execution time of each subgraph, and a greedy scheduling algorithm uses these predictions to balance the workload across FPGAs (a toy sketch of such a greedy assignment follows this list).
  3. Concurrent Processing: Each FPGA processes its assigned subgraphs concurrently, overlapping the gather-scatter and global-apply stages to optimize performance.
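
The snippet below is a minimal, illustrative sketch of greedy workload assignment, not Clementi's actual scheduler: given predicted execution times per subgraph, each subgraph is assigned, largest first, to the FPGA with the smallest accumulated predicted time. The input file format and the four-FPGA count are assumptions made for illustration.

## Toy greedy assignment (illustration only; not Clementi's actual scheduler).
## Assumed input file: one "<subgraph_id> <predicted_time_ms>" pair per line.
sort -k2,2 -nr predicted_times.txt | awk '
{
    # Assign the current subgraph to the FPGA with the smallest accumulated load
    best = 1
    for (f = 2; f <= 4; f++) if (load[f] < load[best]) best = f
    load[best] += $2
    printf "subgraph %s -> FPGA %d\n", $1, best
}
END {
    for (f = 1; f <= 4; f++) printf "FPGA %d predicted total: %.1f ms\n", f, load[f]
}'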

Initialization

To generate this design, you will need a valid UltraScale+ Integrated 100G Ethernet Subsystem license installed in Vivado.

To begin working with Clementi, clone this repository.
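
For example, assuming the repository is hosted under the Xtra-Computing organization on GitHub:

git clone https://github.com/Xtra-Computing/Clementi.git
cd Clementi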

Build Hardware

# Load the Xilinx Vitis and XRT settings
source /opt/Xilinx/Vitis/2021.2/settings64.sh
source /opt/xilinx/xrt/setup.sh

# Install build dependencies
sudo apt install libgraphviz-dev faketime
pip3 install graphviz

# Build all hardware components for the Clementi application.
# Make with the default configuration (as in the app makefile), or with a specified target:
make TARGET=hw all APP=clementi TYPE=pr
# or 
make TARGET=hw all APP=gather_scatter TYPE=pr
# or
make TARGET=hw all APP=global_apply
# or
make TARGET=hw all APP=single_gas TYPE=pr

Build Software

# If you only need to build the host software, use one of the following commands:
make host APP=clementi
# or 
make host APP=gather_scatter
# or
make host APP=global_apply

Test

Run the scripts below to test the hardware modules and the full system. Ensure that the OpenMPI settings are configured correctly before running the multi-FPGA demo:

## To test the gather_scatter module:
./script/run_gather_scatter.sh
## To test the global_apply module:
./script/run_global_apply.sh
## To test Clementi:
./script/run_clementi.sh R25 30 ## dataset = R25, and superstep = 30.

## To test single_gas, first log in to a U250 FPGA node, then:
./single_gas.app -d R25 -s 30 ## dataset = R25, and superstep = 30.

[Note] Please check that the graph dataset directory is set correctly.
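
Before the multi-FPGA run, it can help to sanity-check the MPI setup. The hostfile name below is an assumption for illustration; use the actual file under the host_file directory:

## Hypothetical MPI sanity check (hostfile name is an assumption; see the host_file directory):
cat ./host_file/hosts                                # list the FPGA node hostnames
mpirun --hostfile ./host_file/hosts -np 4 hostname   # each node should print its hostname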

Repository Structure

  • app - Contains applications used within the project.
  • host - Host code files that use the XRT driver.
  • images - Images used in the README documentation.
  • mk - Directory containing makefiles.
  • partition - Includes the input-aware partition method and performance model.
  • src - Hardware files for each block in the Clementi framework.
  • test - Contains test files and scripts.
  • host_file - MPI hostfiles used for distributed execution.
  • script - Bash scripts used for compilation and for running tests.

Licenses

Ethernet/cmac License: BSD 3-Clause License

NetLayers/100G-fpga-network-stack-core License: BSD 3-Clause License

License Installation

Here is a simple way to install the cmac license from the Linux command line:

## Step 1. Source the Vitis settings script:
source /path/to/Vitis/settings64.sh
## Step 2. Set the license file environment variable:
export XILINXD_LICENSE_FILE=/path/to/your/license.lic
## Step 3. To check the IP status in your project:
## a. Start Vivado in TCL mode:
vivado -mode tcl
## b. Open your project:
open_project /path/to/your/project.xpr
## c. Generate a report of all IP statuses and save it to a file:
report_ip_status -file /path/to/ip_status_report.txt
## d. Close the project and exit the Vivado Tcl shell:
close_project
exit

📄 Related Publication

If you find this project helpful, please consider citing our related work:

Feng Yu, Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, and Weng-Fai Wong.
Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing.
Proceedings of the ACM on Management of Data (PACMMOD), Vol. 3, No. 3 (SIGMOD), Article 138, June 2025.
https://doi.org/10.1145/3725275

BibTeX

@article{yu2025clementi,
  title     = {Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing},
  author    = {Feng Yu and Hongshi Tan and Xinyu Chen and Yao Chen and Bingsheng He and Weng-Fai Wong},
  journal   = {Proceedings of the ACM on Management of Data (PACMMOD)},
  volume    = {3},
  number    = {3 (SIGMOD)},
  article   = {138},
  year      = {2025},
  month     = {June},
  doi       = {10.1145/3725275},
  publisher = {ACM}
}
