simple_ans

A Python package that provides lossless compression of integer datasets through Asymmetric Numeral Systems (ANS), implemented in C++ with pybind11 bindings.

I used the following to guide the implementation:

While there are certainly many ANS implementations that are parts of other packages, this one strives to be as simple as possible, with the C++ implementation being just a small amount of code in a single file. The Python interface is also simple and easy to use. At the same time it attempts to be as efficient as possible both in terms of compression ratio and encoding/decoding speed.

Important: This implementation is designed for data with approximately 2 to 5000 distinct values. Performance may degrade significantly with datasets containing more unique values.

Technical overview of ANS and Streaming ANS

Installation

simple_ans is available on PyPI:

pip install simple-ans

Developers may want to clone the repository and do an editable install:

git clone https://github.com/flatironinstitute/simple_ans.git
cd simple_ans
pip install -e .

For developers who want automatic rebuilds of the compiled extension:

pip install "scikit-build-core>=0.5.0" "pybind11>=2.11.1" "pip>=24" ninja
pip install -e . -Ceditable.rebuild=true --no-build-isolation

Usage

This package is designed for compressing quantized numerical data.

import numpy as np
from simple_ans import ans_encode, ans_decode

# Example: Compressing quantized Gaussian data
# Generate sample data following normal distribution
n_samples = 10000
# Generate Gaussian data, scale by 4, and quantize to integers
signal = np.round(np.random.normal(0, 1, n_samples) * 4).astype(np.int32)

# Encode (automatically determines optimal symbol counts)
encoded = ans_encode(signal)

# Decode
decoded = ans_decode(encoded)

# Verify
assert np.all(decoded == signal)

# Get compression stats
original_size = signal.nbytes
compressed_size = encoded.size()  # in bytes
compression_ratio = original_size / compressed_size
print(f"Compression ratio: {compression_ratio:.2f}x")

Tests

To run the tests, install with the test extra and run pytest:

pip install "simple-ans[test]"
pytest tests/

Simple benchmark

You can run a very simple benchmark that compares simple_ans with zlib, zstandard, lzma, and blosc2 at various compression levels for a toy dataset of quantized Gaussian noise. See devel/benchmark.py and devel/benchmark_ans_only.py.

The benchmark.py also runs in a CI environment and produces the following graph:

We see that for this example, the ANS-based compression ratio is higher than the other methods, almost reaching the theoretical ideal. The encode rate in MB/s is also faster than all but blosc. The decode rate is faster than Zlib and lzma but slower than Zstandard or blosc. I think in principle, we should be able to speed up the decoding. Let me know if you have ideas for this.

To install the benchmark dependencies, use:

pip install .[benchmark]

Extended benchmarks

A more comprehensive benchmark (devel/benchmark2.py) tests the compression performance across different types of distributions:

Bernoulli distributions with varying probabilities (p = 0.1 to 0.5)
Quantized Gaussian distributions with different quantization steps
Poisson distributions with various lambda parameters

The benchmark compares simple_ans against zstd-22, zlib-9, and blosc (using bitshuffle, zstd-1, and 2 MiB blocks), measuring compression ratios and processing speeds:

The results show that simple_ans achieves the overall highest compression ratios—close to the theoretical ideal across all distributions. The encode speed is faster than all but blosc. blosc typically achieves the highest encode and decode speeds and the second-highest compression ratios.

Authors

Jeremy Magland, Center for Computational Mathematics, Flatiron Institute

Robert Blackwell, Scientific Computing Core, Flatiron Institute

Name		Name	Last commit message	Last commit date
Latest commit History 104 Commits
.github/workflows		.github/workflows
.vscode		.vscode
devel		devel
doc		doc
simple_ans		simple_ans
tests		tests
.clang-format		.clang-format
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
pyproject.toml		pyproject.toml
simple_ans_bind.cpp		simple_ans_bind.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

simple_ans

Installation

Usage

Tests

Simple benchmark

Extended benchmarks

Authors

About

Releases

Packages

Contributors 4

Languages

License

flatironinstitute/simple_ans

Folders and files

Latest commit

History

Repository files navigation

simple_ans

Installation

Usage

Tests

Simple benchmark

Extended benchmarks

Authors

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages