# SparseCompress

Efficient sparse tensor compression and inference library with bit-packed storage and sliding-window computation.

SparseCompress implements a novel approach to storing and computing with sparse neural networks:

- **Compact storage**: non-zero weights stored as a flat value array plus a bit-packed position mask
- **Memory efficiency**: up to 24x compression for 99%-sparse tensors
- **Sliding-window inference**: minimizes peak memory usage during the forward pass
- **Zero reconstruction error**: exact numerical accuracy
## Installation

```bash
pip install -r requirements.txt
```

## Quick Start

```python
from sparse_compress import SparseCompress
import numpy as np

# Initialize compressor
compressor = SparseCompress(sparsity_threshold=1e-6)

# Create a sparse weight matrix
weight = np.random.randn(1024, 512).astype(np.float32)
weight[np.random.random((1024, 512)) < 0.9] = 0  # 90% sparsity

# Compress the weight matrix
compressed = compressor.compress(weight)
print(f"Compression ratio: {compressor.compression_ratio(weight, compressed):.2f}x")

# Perform efficient matrix multiplication with a sliding window
input_batch = np.random.randn(32, 512).astype(np.float32)
output = compressor.sliding_window_matmul(compressed, input_batch, window_size=128)
```

## How It Works

### Bit-Packed Storage

- Non-zero values stored in a compact array
- Positions encoded as bit-packed mask (1 bit per element)
- Significant memory savings for sparse tensors (>80% sparsity)
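The storage scheme above can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the library's internals: `compress_sketch` and `decompress_sketch` are hypothetical names, and the real `SparseCompress` may lay out its data differently.

```python
import numpy as np

def compress_sketch(tensor, threshold=1e-6):
    """Illustrative compression: non-zero values + bit-packed position mask."""
    flat = tensor.ravel()
    keep = np.abs(flat) > threshold      # boolean mask, 1 bit per element
    values = flat[keep]                  # compact array of non-zeros
    mask_bytes = np.packbits(keep)       # 8 mask bits per stored byte
    return values, mask_bytes, tensor.shape

def decompress_sketch(values, mask_bytes, shape):
    """Reconstruct the dense tensor; exact for values above the threshold."""
    n = int(np.prod(shape))
    keep = np.unpackbits(mask_bytes)[:n].astype(bool)
    flat = np.zeros(n, dtype=values.dtype)
    flat[keep] = values
    return flat.reshape(shape)

np.random.seed(0)  # for a reproducible demo
w = np.random.randn(64, 32).astype(np.float32)
w[np.random.random(w.shape) < 0.9] = 0
vals, bits, shape = compress_sketch(w)
restored = decompress_sketch(vals, bits, shape)
assert np.allclose(w, restored, atol=1e-6)  # lossless above the threshold
```

Note the trade-off in the mask: it costs 1 bit per element regardless of sparsity, which is what caps the best-case compression ratio.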
### Sliding-Window Computation

- Processes large matrices in configurable chunks
- Reduces peak memory usage during computation
- Maintains exact numerical precision
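The chunked computation can be sketched as follows. This is a hedged illustration of the idea, not the library's `sliding_window_matmul`: only `window_size` rows of the weight matrix are decompressed at a time, so peak extra memory is `window_size * n_cols` floats rather than the full dense matrix.

```python
import numpy as np

def sliding_window_matmul_sketch(values, mask_bytes, shape, x, window_size=128):
    """Compute x @ W.T from the packed form, decompressing W in row windows."""
    n_rows, n_cols = shape
    mask = np.unpackbits(mask_bytes)[: n_rows * n_cols].astype(bool).reshape(shape)
    # Row offsets into `values`: where each row's non-zeros start
    row_offsets = np.concatenate([[0], np.cumsum(mask.sum(axis=1))])
    out = np.empty((x.shape[0], n_rows), dtype=x.dtype)
    for start in range(0, n_rows, window_size):
        stop = min(start + window_size, n_rows)
        # Densify only the current window of rows
        window = np.zeros((stop - start, n_cols), dtype=values.dtype)
        window[mask[start:stop]] = values[row_offsets[start]:row_offsets[stop]]
        out[:, start:stop] = x @ window.T
    return out

# Usage: pack a small sparse matrix by hand and compare with dense matmul
np.random.seed(1)
W = np.random.randn(256, 64).astype(np.float32)
W[np.random.random(W.shape) < 0.9] = 0
flat_keep = W.ravel() != 0
vals, bits = W.ravel()[flat_keep], np.packbits(flat_keep)
x = np.random.randn(8, 64).astype(np.float32)
y = sliding_window_matmul_sketch(vals, bits, W.shape, x, window_size=64)
assert np.allclose(y, x @ W.T, atol=1e-4)
```

The "exact numerical precision" claim holds because each window is reconstructed bit-for-bit before the multiply; chunking changes memory use, not the values.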
## Performance

- Compression: 4-24x depending on sparsity level
- Memory: up to 16x reduction with sliding window
- Accuracy: zero reconstruction error
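The 4-24x range follows directly from the values-plus-mask layout. A back-of-envelope calculation (assuming float32 weights and no per-tensor header overhead, which is a simplification):

```python
# dense bytes  = N * 4
# sparse bytes = nnz * 4 + N / 8   (non-zero values + 1-bit-per-element mask)
N = 1024 * 512
for sparsity in (0.80, 0.90, 0.99):
    nnz = int(N * (1 - sparsity))
    ratio = (N * 4) / (nnz * 4 + N / 8)
    print(f"{sparsity:.0%} sparse -> {ratio:.1f}x")
# 80% sparse -> 4.3x
# 90% sparse -> 7.6x
# 99% sparse -> 24.2x
```

The mask's fixed cost of N/8 bytes is why the ratio saturates near 24x even as sparsity approaches 100%.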
## Examples

Run the examples to see SparseCompress in action:

```bash
# Run comprehensive examples and benchmarks
python example_usage.py

# Run tests
python test_sparse_compress.py
```

## Use Cases

- Training and inference with sparse neural networks
- Memory-constrained edge deployment
- Large-scale sparse matrix operations
- Efficient storage of pruned models
## API Reference

### `SparseCompress`

Main class for compression and computation operations.

- `compress(tensor)`: convert a dense tensor to the sparse format
- `decompress(sparse_tensor)`: reconstruct the full dense tensor
- `sliding_window_matmul(sparse_weight, input_tensor, window_size)`: memory-efficient matrix multiplication
- `get_sparsity_ratio(tensor)`: calculate the fraction of zero elements
- `compression_ratio(original, compressed)`: calculate space savings
### Compressed Representation

Data structure for the compressed representation.

- `values`: array of non-zero values
- `mask_bytes`: bit-packed position mask
- `shape`: original tensor dimensions
- `memory_size`: total memory usage in bytes
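A minimal sketch of such a container, assuming the fields listed above; `CompressedTensor` is a hypothetical name and the library's actual class may differ in name and layout:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CompressedTensor:  # hypothetical name for illustration
    values: np.ndarray       # non-zero values in row-major order
    mask_bytes: np.ndarray   # uint8 array, 1 bit per original element
    shape: tuple             # original tensor dimensions

    @property
    def memory_size(self) -> int:
        """Total memory usage in bytes (value array + packed mask)."""
        return self.values.nbytes + self.mask_bytes.nbytes
```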
## CUDA Acceleration

For GPU-accelerated sparse operations, use the CUDA implementation:

```python
from sparse_compress_cuda import SparseCompressCUDA
import numpy as np

# Initialize CUDA compressor
compressor = SparseCompressCUDA(sparsity_threshold=1e-6)

# Compress on GPU
weight = np.random.randn(1024, 512).astype(np.float32)
weight[np.random.random((1024, 512)) < 0.95] = 0  # 95% sparsity
compressed = compressor.compress(weight)

# Prepare for matrix multiplication (computes row offsets)
compressed = compressor.prepare_for_matmul(compressed)

# Fast sparse matmul on GPU
input_batch = np.random.randn(128, 512).astype(np.float32)
output = compressor.sparse_matmul(compressed, input_batch)

# Or use a sliding window for memory efficiency
output = compressor.sliding_window_matmul(compressed, input_batch, window_size=128)
```

### Requirements

- CUDA Toolkit 11.0+
- Numba with CUDA support:

  ```bash
  pip install numba
  ```
### Building the Raw CUDA Kernels

The raw CUDA kernels in `cuda/sparse_kernels.cu` can be compiled for maximum performance:

```bash
cd cuda
chmod +x build.sh
CUDA_ARCH=sm_80 ./build.sh  # Adjust for your GPU architecture
```

Key optimizations in the CUDA implementation:
- Parallel prefix sums using warp-level shuffle intrinsics
- Coalesced memory access patterns
- Shared memory tiling for input reuse in matmul
- Atomic operations for concurrent mask bit setting
## License

MIT