Systolic CNN AcceLErator Simulator (SCALE Sim) v3

SCALE-Sim is a simulator for systolic-array-based accelerators running convolution, feed-forward, and any other layers that use GEMMs. This version is a refresh of the simulator, with feature enhancements, code restructured to ease feature additions, and easier distribution.

The previous version of the simulator can be found here.

Features

SCALE-Sim v3 includes several advanced features:

Sparsity Support: Layer-wise and row-wise sparsity support for efficient neural network execution
Ramulator Integration: Detailed memory model integration for evaluating DRAM performance
Accelergy Integration: Energy and power estimation capabilities
Layout Support: Advanced memory layout configurations
Multi-core Support: Support for multi-core simulations

Getting started in 30 seconds

Installing the package

Getting started is simple! SCALE-Sim is written entirely in Python and can be installed from source.

You can install SCALE-Sim in your environment using the following command:

$ pip3 install <path-to-scalesim-v3-folder>

If you are a developer who will modify SCALE-Sim while using it, install it with the -e flag. This creates a symbolic link to the source instead of copying scalesim into your environment, so changes to the SCALE-Sim code take effect immediately:

$ pip3 install -e <path-to-scalesim-v3-folder>

Launching a run

After installing SCALE-Sim, run it by invoking the scalesim.scale module with the paths to the architecture configuration file and the topology descriptor CSV file.

$ python3 -m scalesim.scale -c <path_to_config_file> -t <path_to_topology_file> -p <path_to_output_log_dir>
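When driving many runs from a script, the same invocation can be assembled in Python. The following is a minimal sketch that only uses the -c, -t, and -p switches shown above; the helper names `build_scalesim_command` and `run_scalesim` are hypothetical and not part of SCALE-Sim.

```python
import subprocess
import sys


def build_scalesim_command(config_path, topology_path, log_dir=None):
    """Assemble the argument list for `python3 -m scalesim.scale`."""
    cmd = [sys.executable, "-m", "scalesim.scale",
           "-c", config_path, "-t", topology_path]
    if log_dir is not None:
        # -p selects the output log directory.
        cmd += ["-p", log_dir]
    return cmd


def run_scalesim(config_path, topology_path, log_dir=None):
    """Launch the simulator and raise CalledProcessError on a non-zero exit."""
    cmd = build_scalesim_command(config_path, topology_path, log_dir)
    return subprocess.run(cmd, check=True)
```

Calling `run_scalesim(...)` with your own config, topology, and output paths launches one simulation; wrapping it in a loop gives a simple parameter sweep.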

Running from source

The above method uses the installed package for running the simulator. In cases where you would like to run directly from the source, the following command should be used instead.

$ PYTHONPATH=$PYTHONPATH:<scale_sim_repo_root> python3 <scale_sim_repo_root>/scalesim/scale.py -c <path_to_config_file> -t <path_to_topology_file>

If you are running from sources for the first time and do not have all the dependencies installed, please install them first using the following command.

$ pip3 install -r <scale_sim_repo_root>/requirements.txt

Tool inputs

SCALE-Sim uses two input files to run, a configuration file and a topology file.

Configuration file

The configuration file is used to specify the architecture and run parameters for the simulations. The following shows a sample config file:

The config file has three sections. The "general" section specifies the run name, which is chosen by the user. The "architecture_presets" section describes the parameters of the systolic array hardware to simulate. The "run_preset" section specifies whether the simulator should run with a user-specified bandwidth or calculate the optimal bandwidth for stall-free execution.

The detailed documentation for the config file can be found here (TBD).
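To make the three-section structure concrete, here is a small sketch that parses an INI-style config with Python's standard configparser. The section names follow the description above; every key and value in `SAMPLE_CONFIG` is an illustrative assumption, so check the files shipped with the simulator for the exact names.

```python
import configparser

# Hypothetical config text: section names follow the README description,
# but all keys and values below are illustrative assumptions.
SAMPLE_CONFIG = """
[general]
run_name = example_run

[architecture_presets]
ArrayHeight = 32
ArrayWidth = 32

[run_preset]
InterfaceBandwidth = CALC
"""


def load_config(text):
    """Parse INI-style text into a {section: {key: value}} dictionary."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # preserve the case of the keys
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


config = load_config(SAMPLE_CONFIG)
for section, params in config.items():
    print(section, params)
```

All values are returned as strings; convert them (for example with `int(...)`) before doing arithmetic on array dimensions.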

Topology file

The topology file is a CSV file that describes the layers of the workload. The layers are typically described as convolution layer parameters, as shown in the example below.

For other layer types, SCALE-Sim also accepts the workload description in the M, N, K format of the equivalent GEMM operation, as shown in the example below.

By default, however, the tool expects its inputs in the convolution format. When using the MNK format, specify it with the -i gemm switch, as shown in the example below.

$ python3 <scale_sim_repo_root>/scalesim/scale.py -c <path_to_config_file> -t <path_to_mnk_topology_file> -i gemm
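The relationship between the two formats can be sketched with the standard im2col view of a convolution as a GEMM. The function below is an illustration, not SCALE-Sim code: it assumes no padding, takes M as the number of output pixels, N as the number of filters, and K as the reduction dimension, and the simulator's own conventions may differ.

```python
def conv_to_gemm(ifmap_h, ifmap_w, filter_h, filter_w, channels, num_filters, stride=1):
    """Return (M, N, K) of the GEMM equivalent to an unpadded convolution."""
    # Output feature-map size for a "valid" convolution (assumed, no padding).
    out_h = (ifmap_h - filter_h) // stride + 1
    out_w = (ifmap_w - filter_w) // stride + 1
    m = out_h * out_w                      # one GEMM row per output pixel
    k = filter_h * filter_w * channels     # elements in one convolution window
    n = num_filters                        # one GEMM column per filter
    return m, n, k


# A 224x224x3 input with 64 filters of size 3x3 and stride 1.
print(conv_to_gemm(224, 224, 3, 3, 3, 64))  # (49284, 64, 27)
```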

Output

Here is an example output dumped to stdout when running Yolo Tiny (whose configuration is in yolo_tiny.csv):

The simulator also generates read/write traces and summary logs at <run_dir>/../scalesim_outputs/. The user can provide a custom location using -p <custom_output_directory> when launching scale.py. There are three summary logs:

COMPUTE_REPORT.csv: Layer-wise logs for compute cycles, stalls, utilization percentages, etc.
BANDWIDTH_REPORT.csv: Layer-wise information about average and maximum bandwidths for each operand when accessing SRAM and DRAM.
DETAILED_ACCESS_REPORT.csv: Layer-wise information about the number of accesses and access cycles for each operand for SRAM and DRAM.

In addition, cycle-accurate SRAM/DRAM access logs are also dumped and can be accessed at <outputs_dir>/<run_name>/, e.g. <run_dir>/../scalesim_outputs/<run_name>.
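Since the summary logs are CSV files, they can be post-processed with the Python standard library. The sketch below makes no assumption about the column names; it reads whatever header the report contains. The helper name `read_report` is hypothetical.

```python
import csv


def read_report(path):
    """Read a summary CSV and return (column names, list of row dicts).

    Whitespace around headers and cells is stripped; blank lines are skipped.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = [name.strip() for name in next(reader)]
        rows = [
            dict(zip(header, (cell.strip() for cell in row)))
            for row in reader
            if row
        ]
    return header, rows
```

Calling `read_report` on the path of a generated COMPUTE_REPORT.csv returns one dictionary per layer, keyed by the column names found in the file.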

Advanced Features

Using Multi-core feature

SCALE-Sim v3 introduces multi-core simulation capabilities to address the limitations of its predecessor, SCALE-Sim v2, which could only model single-core systolic arrays. This feature allows comprehensive modeling of modern AI accelerators equipped with multiple tensor cores, enabling researchers to simulate advanced workloads and optimize performance. For detailed setup and usage instructions, refer to the multi-core/README.md file.

Using Sparsity feature

SCALE-Sim v3 introduces advanced support for layer-wise and row-wise sparsity. For detailed information about sparsity features and usage, refer to the README_Sparsity.md file.

Key features include:

Layer-wise sparsity with customizable configurations
Row-wise sparsity with N:M ratio support
Support for different sparse representations (CSR, CSC, Blocked ELLPACK)
Detailed sparsity reports and metrics
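As an illustration of the general idea behind N:M row-wise sparsity, the sketch below keeps the N largest-magnitude values in every group of M consecutive elements of a row and zeroes the rest. This is a generic example of N:M structured pruning under that assumed convention, not SCALE-Sim's implementation; `prune_n_m` is a hypothetical name.

```python
def prune_n_m(row, n, m):
    """Keep the n largest-magnitude values in each group of m; zero the rest."""
    pruned = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Indices of the n largest-magnitude entries within this group.
        order = sorted(range(len(group)), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(v if i in keep else 0 for i, v in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.8]
print(prune_n_m(row, 2, 4))  # [0.9, 0, 0.4, 0, -0.7, 0, 0, -0.8]
```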

Using Ramulator feature

SCALE-Sim v3 integrates a detailed memory model with the systolic array computation. Users can evaluate:

Stall cycles due to data load from memory
Bank conflicts
Different memory types (DDR3, DDR4, etc.)
Various memory configurations (channels, rows, etc.)

For detailed setup and usage instructions, refer to the README_ramulator.md file.

Using Accelergy feature

SCALE-Sim v3 integrates with Accelergy for energy and power estimation. This feature allows:

Energy estimation of systolic array architectures
Power analysis
Integration with CACTI and Aladdin plugins for accurate estimation

For setup and usage instructions, refer to the README_accelergy.md file.

Using Layout feature

SCALE-Sim v3 supports advanced memory layout configurations for on-chip buffers. The layout feature enables:

Custom Data Organization: Specify different data layouts for ifmap, filter, and ofmap tensors
Bank Conflict Evaluation: Model realistic memory access patterns and bank conflicts
Multi-bank Support: Configure number of memory banks and ports per bank
Layout Specification: Define layouts through three key parameters:
- intraline_factor: Specifies elements per line for each dimension
- intraline_order: Controls dimension ordering within a line
- interline_order: Controls dimension ordering across lines

Layout configurations can be specified in the architecture configuration file using parameters like:

OnChipMemoryBanks: Total number of on-chip memory banks
OnChipMemoryBankPorts: Number of ports per bank
IfmapCustomLayout/FilterCustomLayout: Enable custom layouts for tensors

For detailed information about layout features and usage, refer to the documentation in the README_layout.md file.
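To give an intuition for why the number of banks and ports matters, the following toy model counts the cycles needed to serve a set of simultaneous accesses. It assumes addresses are interleaved across banks by a simple modulo and that each bank serves `ports_per_bank` requests per cycle; both are assumptions made for illustration and do not describe SCALE-Sim's actual layout model.

```python
from collections import Counter
from math import ceil


def access_cycles(addresses, num_banks, ports_per_bank):
    """Cycles needed to serve all requests under modulo bank interleaving."""
    # Count how many requests land in each bank (assumed mapping: addr % banks).
    per_bank = Counter(addr % num_banks for addr in addresses)
    # The most heavily loaded bank determines the total latency.
    return max((ceil(count / ports_per_bank) for count in per_bank.values()), default=0)


print(access_cycles([0, 1, 2, 3], num_banks=4, ports_per_bank=1))   # 1: no conflicts
print(access_cycles([0, 4, 8, 12], num_banks=4, ports_per_bank=1))  # 4: all hit bank 0
print(access_cycles([0, 4, 8, 12], num_banks=4, ports_per_bank=2))  # 2: two ports halve it
```

The second and third calls show how the same access pattern becomes cheaper with more ports, and the first shows how a layout that spreads accesses across banks avoids conflicts altogether.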

Detailed Documentation

Detailed documentation about the tool can be found here (TBD). You can refer to the SCALE-Sim v3 paper (to be presented at ISPASS'25):

Raj, R., Banerjee, S., Chandra, N., Wan, Z., Tong, J., Samajdar, A., & Krishna, T.; "SCALE-Sim v3: A modular cycle-accurate systolic accelerator simulator for end-to-end system analysis." arXiv preprint arXiv:2504.15377 (2025) [pdf]

We also recommend referring to the following papers for insights on SCALE-Sim's potential.

[1] Samajdar, A., Zhu, Y., Whatmough, P., Mattina, M., & Krishna, T.; "Scale-sim: Systolic cnn accelerator simulator." arXiv preprint arXiv:1811.02883 (2018). [pdf]

[2] Samajdar, A., Joseph, J. M., Zhu, Y., Whatmough, P., Mattina, M., & Krishna, T.; "A systematic methodology for characterizing scalability of DNN accelerators using SCALE-sim". In 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). [pdf]

Citing this work

If you found this tool useful, please cite us using the following BibTeX entry:

@inproceedings{raj2025scale,
  title={SCALE-Sim v3: A modular cycle-accurate systolic accelerator simulator for end-to-end system analysis},
  author={Raj, Ritik and Banerjee, Sarbartha and Chandra, Nikhil and Wan, Zishen and Tong, Jianming and Samajdar, Ananda and Krishna, Tushar},
  booktitle={2025 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)},
  pages={186--200},
  year={2025},
  organization={IEEE}
}

Contributing to the project

We welcome contributions and would love to merge new features into our stable codebase. To ensure continuity within the project, please consider the following workflow.

When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change.

Pull Request Process

Ensure any install or build dependencies are removed before the end of the layer when doing a build. Please do not commit temporary files to the repo.
Update the documentation in the documentation/ folder with details of changes to the interface. This includes new environment variables, exposed ports, useful file locations, and container parameters.
Add a tutorial on how to use your new feature, in the form of a Jupyter notebook, to the documentation as well. This makes sure that others can use your code!
Add test cases to our unit test system for your contribution.
Increase the version numbers in any example's files and the README.md to the new version that this Pull Request would represent. The versioning scheme we use is SemVer. Add your changes to the CHANGELOG.md. Address the issue numbers that you are solving.
You may merge the Pull Request in once you have the sign-off of two other developers, or if you do not have permission to do that, you may request the second reviewer to merge it for you.

Developers

Dev and maintainers:

Ritik Raj - Lead developer (@ritikraj7)
Sarbartha Banerjee - Ramulator feature (@iamsarbartha)
Nikhil Chandra - Sparsity feature (@NikhilChandraNcbs)
Zishen Wan - Accelergy feature (@zishenwan)
Jianming Tong - SRAM Layout feature (@JianmingTONG)

Advisors

Ananda Samajdar
Tushar Krishna
