A high-performance Rust implementation of Python's difflib.unified_diff
function with PyO3 bindings.
This package provides a Rust-based implementation of the unified diff algorithm, offering significant performance improvements over Python's built-in difflib
module while maintaining API compatibility.
- 🚀 3-5x Faster: Consistently outperforms Python's difflib across all file sizes and change patterns (see Performance section for detailed benchmarks)
- 100% Compatible: Drop-in replacement for
difflib.unified_diff
with identical output - Thoroughly Tested: Comprehensive test suite ensuring byte-for-byte compatibility with Python's implementation
- Easy to use: Simple Python API with PyO3 bindings
pip install difflib-rs
# Clone the repository
git clone https://github.com/sweepai/difflib-rs.git
cd difflib-rs
# Set up virtual environment
python -m venv venv
source venv/bin/activate
# Install build dependencies
pip install maturin pytest
# Build and install
maturin develop --release
This is a drop-in replacement for Python's difflib.unified_diff
. Simply replace your import:
- from difflib import unified_diff
+ from difflib_rs import unified_diff
# Compare two sequences of lines
a = ['line1', 'line2', 'line3']
b = ['line1', 'modified', 'line3']
diff = unified_diff(
a, b,
fromfile='original.txt',
tofile='modified.txt',
fromfiledate='2023-01-01',
tofiledate='2023-01-02'
)
for line in diff:
print(line, end='')
Note: Currently, only unified_diff
is supported. Other difflib
functions are not implemented, but pull requests are welcome!
Most agents (including Sweep) can add support for any other methods if needed. A copy of the Python implementation is provided in src/difflib.py
for reference.
For additional convenience, use unified_diff_str
directly with (unsplit) strings:
from difflib_rs import unified_diff_str
# Compare two strings directly - no need to split first!
text_a = """line1
line2
line3"""
text_b = """line1
modified
line3"""
# The function handles splitting internally (more efficient)
diff = unified_diff_str(
text_a, text_b,
fromfile='original.txt',
tofile='modified.txt',
keepends=False # Whether to keep line endings in the diff
)
for line in diff:
print(line, end='')
The unified_diff_str
function:
- Takes strings directly instead of lists
- Handles line splitting internally in Rust (faster than Python's
splitlines()
) - Supports
\n
,\r\n
, and\r
line endings - Has a
keepends
parameter to preserve line endings in the output
The Rust implementation consistently outperforms Python's built-in difflib
module while producing identical output:
File Size | Python Time | Rust Time | Speedup | Output Lines |
---|---|---|---|---|
100 lines | 86.0μs | 38.3μs | 2.24x | 71 |
500 lines | 450.6μs | 130.3μs | 3.46x | 300 |
1,000 lines | 910.2μs | 220.8μs | 4.12x | 587 |
2,000 lines | 2203.1μs | 482.3μs | 4.57x | 1,222 |
File Size | Python Time | Rust Time | Speedup | Output Lines |
---|---|---|---|---|
100 lines | 167.9μs | 49.3μs | 3.41x | 131 |
500 lines | 1028.5μs | 252.0μs | 4.08x | 655 |
1,000 lines | 1925.0μs | 414.3μs | 4.65x | 1,285 |
File Size | Changes | Python Time | Rust Time | Speedup | Output Lines |
---|---|---|---|---|---|
5,000 lines | 5 | 2842.0μs | 859.7μs | 3.31x | 47 |
10,000 lines | 5 | 5003.2μs | 1471.3μs | 3.40x | 47 |
20,000 lines | 5 | 8470.5μs | 2821.6μs | 3.00x | 47 |
File Size | Changes | Python Time | Rust Time | Speedup | Output Lines |
---|---|---|---|---|---|
5,000 lines | 250 | 7985.5μs | 1579.4μs | 5.06x | 1,869 |
10,000 lines | 500 | 14692.5μs | 2833.8μs | 5.18x | 3,793 |
20,000 lines | 1,000 | 34949.0μs | 6461.2μs | 5.41x | 7,569 |
Test Case | Python Time | Rust Time | Speedup |
---|---|---|---|
Identical sequences (5,000 lines) | 1773.1μs | 406.1μs | 4.37x |
Completely different (1,000 lines) | 284.5μs | 219.8μs | 1.29x |
Performance comparison of unified_diff_str
vs unified_diff
with Python splitlines()
:
File Size | Python split + Rust diff | All Rust (unified_diff_str ) |
Speedup |
---|---|---|---|
100 lines | 54.8μs | 21.1μs | 2.59x |
500 lines | 169.9μs | 118.3μs | 1.44x |
1000 lines | 316.1μs | 248.3μs | 1.27x |
2000 lines | 654.8μs | 550.4μs | 1.19x |
def unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n'):
"""
Compare two sequences of lines; generate the unified diff.
Unified diffs are a compact way of showing line changes and a few
lines of context. The number of context lines is set by n which
defaults to three.
Parameters:
a: Sequence of lines to compare (the 'from' file)
b: Sequence of lines to compare (the 'to' file)
fromfile: Label to use for the 'from' file in the diff header
tofile: Label to use for the 'to' file in the diff header
fromfiledate: Modification date of the 'from' file
tofiledate: Modification date of the 'to' file
n: Number of context lines (default: 3)
lineterm: Line terminator to use (default: '\n')
Returns:
Generator yielding unified diff format strings
Note: This is a high-performance Rust implementation that provides
3-5x speedup over Python's difflib while maintaining 100% compatibility.
"""
pass
# Activate virtual environment
source venv/bin/activate
# Run tests
python -m pytest tests/ -v
# Run benchmarks
python -m pytest tests/test_benchmark.py -s
# Build the package with optimizations
maturin develop --release
If you want a feature or have an idea, just create a pull request! Contributions are welcome.
Everything in this project was written by Sweep AI, an AI agent for Jetbrains IDEs.