difflib-rs

A high-performance Rust implementation of Python's difflib.unified_diff function with PyO3 bindings.

Overview

This package provides a Rust-based implementation of the unified diff algorithm, offering significant performance improvements over Python's built-in difflib module while maintaining API compatibility.

Features

  • 🚀 3-5x Faster: Typically 3-5x faster than Python's difflib on the benchmarked file sizes and change patterns; measured speedups range from 1.29x (completely different inputs) to 5.41x (see the Performance section for detailed benchmarks)
  • 100% Compatible: Drop-in replacement for difflib.unified_diff with identical output
  • Thoroughly Tested: Comprehensive test suite ensuring byte-for-byte compatibility with Python's implementation
  • Easy to use: Simple Python API with PyO3 bindings

Installation

From pip

pip install difflib-rs

Build from source

# Clone the repository
git clone https://github.com/sweepai/difflib-rs.git
cd difflib-rs

# Set up virtual environment
python -m venv venv
source venv/bin/activate

# Install build dependencies
pip install maturin pytest

# Build and install
maturin develop --release

Usage

This is a drop-in replacement for Python's difflib.unified_diff. Simply replace your import:

- from difflib import unified_diff
+ from difflib_rs import unified_diff

# Compare two sequences of lines
# Each line keeps its trailing newline so the printed diff stays one entry per line
a = ['line1\n', 'line2\n', 'line3\n']
b = ['line1\n', 'modified\n', 'line3\n']

diff = unified_diff(
    a, b,
    fromfile='original.txt',
    tofile='modified.txt',
    fromfiledate='2023-01-01',
    tofiledate='2023-01-02'
)

for line in diff:
    print(line, end='')
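
Because the call signature matches the standard library, one way to adopt the package gradually is an import with a fallback. The following is a minimal sketch, not part of the project itself: it assumes `difflib_rs` may or may not be installed and falls back to `difflib` when it is missing. The equality check reflects the compatibility claim made above.

```python
import difflib

try:
    # Prefer the Rust implementation when the package is installed.
    from difflib_rs import unified_diff
except ImportError:
    # Fall back to the standard library, which accepts the same arguments.
    from difflib import unified_diff

a = ['line1\n', 'line2\n', 'line3\n']
b = ['line1\n', 'modified\n', 'line3\n']

# list() works whether the function returns a generator or a list.
diff_lines = list(unified_diff(a, b, fromfile='original.txt', tofile='modified.txt'))

# Whichever import succeeded, the result should match the standard library.
expected = list(difflib.unified_diff(a, b, fromfile='original.txt', tofile='modified.txt'))
assert diff_lines == expected

print(''.join(diff_lines))
```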

Note: Currently, only unified_diff is supported. Other difflib functions are not implemented, but pull requests are welcome!

Most agents (including Sweep) can add support for any other methods if needed. A copy of the Python implementation is provided in src/difflib.py for reference.

Extra: String-based API

For additional convenience, use unified_diff_str directly with (unsplit) strings:

from difflib_rs import unified_diff_str

# Compare two strings directly - no need to split first!
text_a = """line1
line2
line3"""

text_b = """line1
modified
line3"""

# The function handles splitting internally (more efficient)
diff = unified_diff_str(
    text_a, text_b,
    fromfile='original.txt',
    tofile='modified.txt',
    keepends=False  # Whether to keep line endings in the diff
)

for line in diff:
    print(line, end='')

The unified_diff_str function:

  • Takes strings directly instead of lists
  • Handles line splitting internally in Rust (faster than Python's splitlines())
  • Supports \n, \r\n, and \r line endings
  • Has a keepends parameter to preserve line endings in the output
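
For comparison, the same idea can be sketched with the standard library alone. The helper name `unified_diff_from_strings` is hypothetical and is not part of difflib-rs, and the choice of `lineterm` for each `keepends` value is an assumption made for this illustration, not documented behavior of `unified_diff_str`. Note also that `str.splitlines()` splits on more boundaries than the three line endings listed above.

```python
import difflib

def unified_diff_from_strings(text_a, text_b, fromfile='', tofile='', keepends=False):
    """Hypothetical stdlib-only stand-in for a string-based diff API."""
    # Split in Python; str.splitlines handles \n, \r\n and \r (and a few
    # other separators such as \v and \f).
    a = text_a.splitlines(keepends=keepends)
    b = text_b.splitlines(keepends=keepends)
    # Assumption: when line endings are dropped, header lines get no
    # terminator either, so every output entry is a bare line.
    lineterm = '\n' if keepends else ''
    return list(difflib.unified_diff(a, b, fromfile=fromfile, tofile=tofile,
                                     lineterm=lineterm))

# Mixed line endings in one input
text_a = "line1\nline2\r\nline3\r"
text_b = "line1\nmodified\r\nline3\r"

result = unified_diff_from_strings(text_a, text_b, 'original.txt', 'modified.txt')
print('\n'.join(result))
```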

Performance

The Rust implementation consistently outperforms Python's built-in difflib module while producing identical output:

Benchmark Results (Baseline - HashMap Implementation)

Small to Medium Files (10% changes)

| File Size | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|
| 100 lines | 86.0μs | 38.3μs | 2.24x | 71 |
| 500 lines | 450.6μs | 130.3μs | 3.46x | 300 |
| 1,000 lines | 910.2μs | 220.8μs | 4.12x | 587 |
| 2,000 lines | 2203.1μs | 482.3μs | 4.57x | 1,222 |

Files with Heavy Changes (50% changes)

| File Size | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|
| 100 lines | 167.9μs | 49.3μs | 3.41x | 131 |
| 500 lines | 1028.5μs | 252.0μs | 4.08x | 655 |
| 1,000 lines | 1925.0μs | 414.3μs | 4.65x | 1,285 |

Large Files with Few Changes

| File Size | Changes | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|---|
| 5,000 lines | 5 | 2842.0μs | 859.7μs | 3.31x | 47 |
| 10,000 lines | 5 | 5003.2μs | 1471.3μs | 3.40x | 47 |
| 20,000 lines | 5 | 8470.5μs | 2821.6μs | 3.00x | 47 |

Large Files with Medium Changes (5% changed)

| File Size | Changes | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|---|
| 5,000 lines | 250 | 7985.5μs | 1579.4μs | 5.06x | 1,869 |
| 10,000 lines | 500 | 14692.5μs | 2833.8μs | 5.18x | 3,793 |
| 20,000 lines | 1,000 | 34949.0μs | 6461.2μs | 5.41x | 7,569 |

Special Cases

| Test Case | Python Time | Rust Time | Speedup |
|---|---|---|---|
| Identical sequences (5,000 lines) | 1773.1μs | 406.1μs | 4.37x |
| Completely different (1,000 lines) | 284.5μs | 219.8μs | 1.29x |

String Splitting Performance

Performance comparison of unified_diff_str vs unified_diff with Python splitlines():

| File Size | Python split + Rust diff | All Rust (unified_diff_str) | Speedup |
|---|---|---|---|
| 100 lines | 54.8μs | 21.1μs | 2.59x |
| 500 lines | 169.9μs | 118.3μs | 1.44x |
| 1,000 lines | 316.1μs | 248.3μs | 1.27x |
| 2,000 lines | 654.8μs | 550.4μs | 1.19x |
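
To get a rough feel for numbers like these on your own machine, a timing harness along the following lines may help. This is an illustrative sketch, not the project's benchmark suite (that lives in tests/test_benchmark.py). The helpers `make_inputs` and `time_diff` are hypothetical names, the synthetic data is an assumption, and absolute timings will differ by hardware and Python version.

```python
import difflib
import random
import timeit

def make_inputs(n_lines, change_fraction, seed=0):
    """Build two line lists where a fixed fraction of lines differ."""
    rng = random.Random(seed)
    a = [f'line {i}\n' for i in range(n_lines)]
    b = list(a)
    # Replace a random sample of distinct positions in b.
    for i in rng.sample(range(n_lines), int(n_lines * change_fraction)):
        b[i] = f'changed {i}\n'
    return a, b

def time_diff(diff_func, a, b, repeat=5):
    """Return the best single-call time in microseconds."""
    # list() forces a generator to run to completion inside the timed call.
    best = min(timeit.repeat(lambda: list(diff_func(a, b)), number=1, repeat=repeat))
    return best * 1e6

a, b = make_inputs(1000, 0.10)
python_us = time_diff(difflib.unified_diff, a, b)
print(f'difflib:    {python_us:.1f}μs')

try:
    from difflib_rs import unified_diff as rust_unified_diff
except ImportError:
    rust_unified_diff = None  # package not installed; skip the comparison

if rust_unified_diff is not None:
    rust_us = time_diff(rust_unified_diff, a, b)
    print(f'difflib_rs: {rust_us:.1f}μs ({python_us / rust_us:.2f}x)')
```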

API

def unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n'):
    """
    Compare two sequences of lines; generate the unified diff.
    
    Unified diffs are a compact way of showing line changes and a few
    lines of context. The number of context lines is set by n which
    defaults to three.
    
    Parameters:
        a: Sequence of lines to compare (the 'from' file)
        b: Sequence of lines to compare (the 'to' file)
        fromfile: Label to use for the 'from' file in the diff header
        tofile: Label to use for the 'to' file in the diff header
        fromfiledate: Modification date of the 'from' file
        tofiledate: Modification date of the 'to' file
        n: Number of context lines (default: 3)
        lineterm: Line terminator to use (default: '\n')
    
    Returns:
        Generator yielding unified diff format strings
    
    Note: This is a high-performance Rust implementation that provides
    3-5x speedup over Python's difflib while maintaining 100% compatibility.
    """
    pass
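
The effect of `n` and `lineterm` can be illustrated with a short example. This sketch uses the standard library's `difflib.unified_diff` as a stand-in, on the assumption (per the compatibility claim above) that `difflib_rs.unified_diff` accepts the same arguments and produces the same output.

```python
import difflib

# Ten lines, with only the fifth one changed
a = [f'line{i}' for i in range(1, 11)]
b = list(a)
b[4] = 'changed'

# lineterm='' suits inputs without trailing newlines: header and hunk
# lines are then emitted without a terminator as well.
default_ctx = list(difflib.unified_diff(a, b, lineterm=''))

# n=0 drops all context lines, leaving only the changed lines.
no_ctx = list(difflib.unified_diff(a, b, n=0, lineterm=''))

print('\n'.join(default_ctx))  # hunk header: @@ -2,7 +2,7 @@
print('\n'.join(no_ctx))       # hunk header: @@ -5 +5 @@
```

With the default `n=3`, the hunk carries three unchanged lines on each side of the change; with `n=0` it shrinks to the removed and added lines alone.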

Development

# Activate virtual environment
source venv/bin/activate

# Run tests
python -m pytest tests/ -v

# Run benchmarks
python -m pytest tests/test_benchmark.py -s

# Build the package with optimizations
maturin develop --release

Contributing

If you want a feature or have an idea, just create a pull request! Contributions are welcome.

Author

Everything in this project was written by Sweep AI, an AI agent for JetBrains IDEs.
