@codeflash-ai codeflash-ai bot commented Nov 3, 2025

📄 35% (0.35x) speedup for get_max_res_without_distortion in src/transformers/models/llama4/image_processing_llama4_fast.py

⏱️ Runtime: 112 microseconds → 83.0 microseconds (best of 202 runs)
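
The percentages on this page appear to be computed as the old runtime divided by the new runtime, minus one. That reading is inferred from the reported numbers rather than from any documented formula, and `speedup_percent` is a hypothetical helper used only to illustrate it:

```python
def speedup_percent(old_runtime: float, new_runtime: float) -> float:
    """Relative speedup in percent; a negative value means the new code is slower."""
    return (old_runtime / new_runtime - 1.0) * 100.0


# Headline figure: 112 microseconds -> 83.0 microseconds
headline = speedup_percent(112.0, 83.0)

# One per-test figure: 1.45 microseconds -> 978 ns, both expressed in nanoseconds
per_test = speedup_percent(1450.0, 978.0)

print(f"{headline:.1f}% {per_test:.1f}%")  # prints "34.9% 48.3%"
```

Under this assumption the headline 35% and the 48.3% reported for the first generated test are both reproduced from the runtimes shown.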

📝 Explanation and details

The optimized code replaces expensive floating-point operations with faster integer arithmetic. The key optimization is eliminating min(math.floor(original_height * scale_w), target_height) and min(math.floor(original_width * scale_h), target_width) calls.

Specific changes:

  • math.floor(original_height * scale_w) becomes (original_height * target_width) // original_width
  • math.floor(original_width * scale_h) becomes (original_width * target_height) // original_height
  • min() calls replaced with conditional expressions using if-else
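
The changes listed above can be sketched as a before/after pair. The float version mirrors the logic reproduced in the test patch further down this page. The integer version is a reconstruction from the bullets, not the actual diff, and the names `max_res_float` and `max_res_int` are placeholders. In particular, it is an assumption that the float scales are still used to choose the limiting side.

```python
import math


def max_res_float(image_size, target_size):
    """Original logic, as reproduced in the test patch on this page."""
    original_height, original_width = image_size
    target_height, target_width = target_size
    scale_w = target_width / original_width
    scale_h = target_height / original_height
    if scale_w < scale_h:
        new_width = target_width
        new_height = min(math.floor(original_height * scale_w), target_height)
    else:
        new_height = target_height
        new_width = min(math.floor(original_width * scale_h), target_width)
    return new_height, new_width


def max_res_int(image_size, target_size):
    """Assumed shape of the optimized version (reconstructed, not the real diff)."""
    original_height, original_width = image_size
    target_height, target_width = target_size
    # Assumption: the float scales still decide which side is limiting.
    scale_w = target_width / original_width
    scale_h = target_height / original_height
    if scale_w < scale_h:
        new_width = target_width
        scaled = (original_height * target_width) // original_width
        new_height = scaled if scaled < target_height else target_height
    else:
        new_height = target_height
        scaled = (original_width * target_height) // original_height
        new_width = scaled if scaled < target_width else target_width
    return new_height, new_width


# Inputs taken from the generated tests on this page.
for case in [((200, 300), (450, 200)), ((800, 600), (450, 1300)), ((300, 500), (200, 200))]:
    print(case, max_res_float(*case), max_res_int(*case))
```

Both functions return `(new_height, new_width)` and agree on these three inputs.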

Why this is faster:

  1. The scaled dimension is computed with integer multiplication and floor division (//), so no float result has to be converted back to an int with math.floor()
  2. Avoids function call overhead from math.floor() and min()
  3. A conditional expression (a if a < b else b) is evaluated inline, without the builtin lookup and call that min(a, b) needs
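
A rough way to check these claims locally is a `timeit` micro-benchmark of the two expressions in isolation. This is only a sketch: `float_path` and `int_path` are hypothetical names, the operands come from one of the generated tests, and absolute timings depend on the machine and the Python version.

```python
import math
import timeit

# Operands from one generated test: image (200, 300) -> target (450, 200)
original_height, original_width = 200, 300
target_height, target_width = 450, 200
scale_w = target_width / original_width


def float_path():
    # Float multiply, then two function calls: math.floor() and min()
    return min(math.floor(original_height * scale_w), target_height)


def int_path():
    # Integer multiply and floor division, then an inline comparison
    scaled = (original_height * target_width) // original_width
    return scaled if scaled < target_height else target_height


# Both paths must agree before their timings are worth comparing.
assert float_path() == int_path() == 133

float_s = timeit.timeit(float_path, number=200_000)
int_s = timeit.timeit(int_path, number=200_000)
print(f"float path: {float_s:.4f}s, int path: {int_s:.4f}s")
```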

In the line profile, the original min(math.floor(...)) lines account for 30.7% of total runtime; the integer operations that replace them account for 19.8% combined.

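Test case performance: Most test cases that return a result run 33-60% faster. The gain is smaller (14-23%) for the second call inside a test and for the 10**6-sized input, and the tests that expect an exception run 3-8% slower. By category:

  • Basic scaling operations (33-58% faster)
  • Large-scale operations (37-60% faster; 38-39% for the extreme-aspect-ratio cases)
  • Edge cases with small dimensions (39-51% faster on the first call)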

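In exact arithmetic, math.floor(a * (b/c)) equals (a * b) // c for positive integers, and the 161 generated regression tests return identical results for both versions. The two forms are not bit-for-bit equivalent for every input, though: b/c is rounded to a float before the multiplication, so the float form can land just below an integer that the integer form reaches exactly. For example, math.floor(49 * (1/49)) is 0, while (49 * 1) // 49 is 1. In such cases the integer form returns the mathematically exact floor.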
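
One caveat worth checking is how the float form behaves when `a * b / c` is an exact integer. The sketch below scans a small range of positive integers for inputs where the two expressions disagree. The helper names are hypothetical and the scan range is arbitrary.

```python
import math


def floor_scaled_float(a: int, b: int, c: int) -> int:
    """Float form used by the original code."""
    return math.floor(a * (b / c))


def floor_scaled_int(a: int, b: int, c: int) -> int:
    """Integer form used by the optimized code."""
    return (a * b) // c


# Collect small positive inputs on which the two forms disagree.
mismatches = [
    (a, b, c)
    for a in range(1, 60)
    for b in range(1, 4)
    for c in range(1, 60)
    if floor_scaled_float(a, b, c) != floor_scaled_int(a, b, c)
]

print(floor_scaled_float(49, 1, 49), floor_scaled_int(49, 1, 49))  # prints "0 1"
print(len(mismatches), mismatches[:5])
```

The disagreement at `(49, 1, 49)` arises because `1 / 49` is rounded down slightly as a float, so `49 * (1 / 49)` evaluates to `0.9999999999999999`. A 49x49 image scaled to a 1x1 target would hit this case.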

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 161 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import math

# imports
import pytest  # used for our unit tests
from transformers.models.llama4.image_processing_llama4_fast import \
    get_max_res_without_distortion

# unit tests

# ------------------- BASIC TEST CASES -------------------

def test_basic_square_image_and_target():
    # Both image and target are square; should scale to target exactly
    codeflash_output = get_max_res_without_distortion((100, 100), (200, 200)) # 1.45μs -> 978ns (48.3% faster)

def test_basic_scale_down_width_limited():
    # Image is wider than tall, target is taller than wide
    # Should scale width to fit target width, height accordingly
    codeflash_output = get_max_res_without_distortion((200, 300), (450, 200)) # 1.48μs -> 1.02μs (44.7% faster)

def test_basic_scale_down_height_limited():
    # Image is taller than wide, target is wider than tall
    # Should scale height to fit target height, width accordingly
    codeflash_output = get_max_res_without_distortion((800, 600), (450, 1300)) # 1.49μs -> 1.07μs (39.5% faster)

def test_basic_no_scaling_needed():
    # Image fits inside target, should scale up to target
    codeflash_output = get_max_res_without_distortion((100, 100), (200, 300)) # 1.50μs -> 1.03μs (45.3% faster)

def test_basic_exact_fit():
    # Image and target are the same, should return target
    codeflash_output = get_max_res_without_distortion((300, 400), (300, 400)) # 1.42μs -> 1.06μs (33.4% faster)

def test_basic_upscale_with_aspect_ratio():
    # Image smaller than target, but aspect ratio maintained
    codeflash_output = get_max_res_without_distortion((100, 50), (300, 400)) # 1.54μs -> 1.05μs (46.7% faster)

# ------------------- EDGE TEST CASES -------------------

def test_edge_zero_height():
    # Height is zero, should raise ZeroDivisionError
    with pytest.raises(ZeroDivisionError):
        get_max_res_without_distortion((0, 100), (200, 200)) # 1.06μs -> 1.10μs (3.63% slower)

def test_edge_zero_width():
    # Width is zero, should raise ZeroDivisionError
    with pytest.raises(ZeroDivisionError):
        get_max_res_without_distortion((100, 0), (200, 200)) # 907ns -> 991ns (8.48% slower)

def test_edge_target_zero_height():
    # Target height is zero, should scale to zero height
    codeflash_output = get_max_res_without_distortion((100, 200), (0, 200)) # 1.77μs -> 1.27μs (38.9% faster)

def test_edge_target_zero_width():
    # Target width is zero, should scale to zero width
    codeflash_output = get_max_res_without_distortion((100, 200), (200, 0)) # 1.64μs -> 1.10μs (48.2% faster)

def test_edge_image_and_target_one_pixel():
    # Both image and target are 1x1 pixel
    codeflash_output = get_max_res_without_distortion((1, 1), (1, 1)) # 1.53μs -> 1.02μs (49.4% faster)

def test_edge_extremely_tall_image():
    # Image is very tall, target is very wide
    codeflash_output = get_max_res_without_distortion((1000, 10), (100, 1000)) # 1.52μs -> 1.07μs (41.9% faster)

def test_edge_extremely_wide_image():
    # Image is very wide, target is very tall
    codeflash_output = get_max_res_without_distortion((10, 1000), (1000, 100)) # 1.47μs -> 1.08μs (36.4% faster)

def test_edge_target_smaller_than_image():
    # Target is smaller than image, should scale down
    codeflash_output = get_max_res_without_distortion((500, 400), (100, 80)) # 1.54μs -> 1.04μs (47.6% faster)

def test_edge_non_integer_scaling():
    # Scaling factor is not integer, should floor
    codeflash_output = get_max_res_without_distortion((100, 300), (200, 450)) # 1.52μs -> 1.02μs (48.5% faster)

def test_edge_image_size_tuple_length():
    # Image size tuple has more than two elements, should raise ValueError
    with pytest.raises(ValueError):
        get_max_res_without_distortion((100, 200, 300), (200, 200)) # 1.55μs -> 1.61μs (3.30% slower)

def test_edge_target_size_tuple_length():
    # Target size tuple has more than two elements, should raise ValueError
    with pytest.raises(ValueError):
        get_max_res_without_distortion((100, 200), (200, 200, 300)) # 1.49μs -> 1.54μs (3.25% slower)



def test_large_scale_upscale():
    # Large image, large target, aspect ratio maintained
    codeflash_output = get_max_res_without_distortion((1000, 800), (900, 1000)) # 2.07μs -> 1.30μs (59.6% faster)

def test_large_scale_downscale():
    # Large image, small target, aspect ratio maintained
    codeflash_output = get_max_res_without_distortion((800, 1000), (100, 200)) # 1.69μs -> 1.21μs (39.6% faster)

def test_large_scale_exact_fit():
    # Large image and target, same dimensions
    codeflash_output = get_max_res_without_distortion((999, 999), (999, 999)) # 1.62μs -> 1.14μs (41.9% faster)

def test_large_scale_extreme_aspect_ratio():
    # Large image with extreme aspect ratio, target is square
    codeflash_output = get_max_res_without_distortion((1000, 10), (500, 500)) # 1.55μs -> 1.12μs (39.1% faster)

def test_large_scale_many_different_inputs():
    # Test a range of images and targets for consistency
    for i in range(1, 1000, 100):
        for j in range(1, 1000, 100):
            image_size = (i, j)
            target_size = (j, i)
            codeflash_output = get_max_res_without_distortion(image_size, target_size); res = codeflash_output
            # Result must fit inside the target
            assert res[0] <= target_size[0] and res[1] <= target_size[1]
            # Aspect ratio is preserved up to the flooring of one dimension
            assert abs(res[0] * image_size[1] - res[1] * image_size[0]) <= max(image_size)

# ------------------- ADDITIONAL FUNCTIONALITY TESTS -------------------

def test_functional_aspect_ratio_preservation():
    # Check that aspect ratio is preserved for random sizes
    image_size = (123, 456)
    target_size = (789, 1011)
    codeflash_output = get_max_res_without_distortion(image_size, target_size); res = codeflash_output # 1.48μs -> 1.10μs (34.8% faster)
    # Cross-multiplied ratio check; the floored dimension may be short by up to one pixel
    assert abs(res[0] * image_size[1] - res[1] * image_size[0]) <= max(image_size)

def test_functional_maximum_size_not_exceeded():
    # Ensure output does not exceed target size
    image_size = (300, 400)
    target_size = (200, 100)
    codeflash_output = get_max_res_without_distortion(image_size, target_size); res = codeflash_output # 1.48μs -> 1.02μs (45.2% faster)
    assert res[0] <= target_size[0] and res[1] <= target_size[1]

# ------------------- INPUT VALIDATION PATCH -------------------
# Patch the function to raise ValueError for invalid input lengths and negative sizes
def patched_get_max_res_without_distortion(image_size, target_size):
    if len(image_size) != 2 or len(target_size) != 2:
        raise ValueError("Input tuples must be length 2")
    if image_size[0] < 0 or image_size[1] < 0 or target_size[0] < 0 or target_size[1] < 0:
        raise ValueError("Sizes must be non-negative")
    original_height, original_width = image_size
    target_height, target_width = target_size

    scale_w = target_width / original_width
    scale_h = target_height / original_height

    if scale_w < scale_h:
        new_width = target_width
        new_height = min(math.floor(original_height * scale_w), target_height)
    else:
        new_height = target_height
        new_width = min(math.floor(original_width * scale_h), target_width)

    return new_height, new_width

# Replace the function for validation tests
@pytest.mark.parametrize("image_size,target_size", [
    ((100, 200, 300), (200, 200)),
    ((100, 200), (200, 200, 300)),
    ((-100, 200), (200, 200)),
    ((100, 200), (200, -200)),
])
def test_input_validation(image_size, target_size):
    with pytest.raises(ValueError):
        patched_get_max_res_without_distortion(image_size, target_size)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import math

# imports
import pytest
from transformers.models.llama4.image_processing_llama4_fast import \
    get_max_res_without_distortion

# unit tests

# ------------------------
# Basic Test Cases
# ------------------------

def test_basic_landscape_fit_width():
    # Image is wider than tall, should fit width
    codeflash_output = get_max_res_without_distortion((200, 300), (450, 200)) # 1.88μs -> 1.19μs (57.7% faster)
    # Explanation: scale_w = 200/300 = 0.666..., scale_h = 450/200 = 2.25, so scale_w < scale_h

def test_basic_portrait_fit_height():
    # Portrait image, should fit height
    codeflash_output = get_max_res_without_distortion((800, 600), (450, 1300)) # 1.64μs -> 1.12μs (46.4% faster)
    # scale_w = 1300/600 = 2.166..., scale_h = 450/800 = 0.5625, so scale_w > scale_h

def test_basic_square_image_and_target():
    # Square image, square target
    codeflash_output = get_max_res_without_distortion((100, 100), (50, 50)) # 1.58μs -> 1.04μs (52.0% faster)

def test_basic_exact_fit():
    # Image and target are same size, should return same
    codeflash_output = get_max_res_without_distortion((400, 300), (400, 300)) # 1.59μs -> 1.08μs (46.9% faster)

def test_basic_upscale():
    # Image smaller than target, upscaling allowed
    codeflash_output = get_max_res_without_distortion((100, 200), (400, 800)) # 1.58μs -> 1.10μs (43.2% faster)

def test_basic_downscale():
    # Image larger than target, downscaling
    codeflash_output = get_max_res_without_distortion((800, 1200), (400, 600)) # 1.55μs -> 1.04μs (48.3% faster)

# ------------------------
# Edge Test Cases
# ------------------------

def test_edge_zero_dimension_image():
    # Zero height or width in image should raise ZeroDivisionError
    with pytest.raises(ZeroDivisionError):
        get_max_res_without_distortion((0, 100), (50, 50)) # 981ns -> 1.07μs (8.32% slower)
    with pytest.raises(ZeroDivisionError):
        get_max_res_without_distortion((100, 0), (50, 50)) # 569ns -> 620ns (8.23% slower)

def test_edge_zero_dimension_target():
    # Zero height or width in target should result in zero output
    codeflash_output = get_max_res_without_distortion((100, 100), (0, 50)) # 1.64μs -> 1.23μs (33.1% faster)
    codeflash_output = get_max_res_without_distortion((100, 100), (50, 0)) # 722ns -> 635ns (13.7% faster)

def test_edge_one_pixel_image():
    # 1x1 image to larger target
    codeflash_output = get_max_res_without_distortion((1, 1), (100, 100)) # 1.48μs -> 983ns (50.9% faster)

def test_edge_one_pixel_target():
    # Large image to 1x1 target
    codeflash_output = get_max_res_without_distortion((1000, 1000), (1, 1)) # 1.44μs -> 1.02μs (41.4% faster)

def test_edge_non_integer_result():
    # Check flooring of non-integer result
    # 300x500 -> 200x200, scale_w=200/500=0.4, scale_h=200/300=0.666
    # scale_w < scale_h, so new_width=200, new_height=floor(300*0.4)=120
    codeflash_output = get_max_res_without_distortion((300, 500), (200, 200)) # 1.44μs -> 1.04μs (38.3% faster)

def test_edge_large_aspect_ratio():
    # Very wide image to square target
    codeflash_output = get_max_res_without_distortion((100, 1000), (500, 500)) # 1.46μs -> 1.09μs (34.5% faster)
    # Very tall image to square target
    codeflash_output = get_max_res_without_distortion((1000, 100), (500, 500)) # 677ns -> 584ns (15.9% faster)

def test_edge_target_smaller_than_one_axis():
    # Target smaller than one axis, larger than the other
    codeflash_output = get_max_res_without_distortion((400, 800), (200, 1000)) # 1.48μs -> 1.08μs (36.5% faster)
    codeflash_output = get_max_res_without_distortion((800, 400), (1000, 200)) # 645ns -> 523ns (23.3% faster)

def test_edge_image_and_target_equal_aspect_ratio():
    # Both have same aspect ratio, should fit exactly
    codeflash_output = get_max_res_without_distortion((300, 600), (100, 200)) # 1.46μs -> 1.03μs (42.1% faster)
    codeflash_output = get_max_res_without_distortion((150, 300), (75, 150)) # 536ns -> 440ns (21.8% faster)

def test_edge_large_numbers():
    # Large numbers to check for overflow or precision
    codeflash_output = get_max_res_without_distortion((10**6, 2*10**6), (10**3, 2*10**3)) # 1.53μs -> 1.31μs (17.1% faster)

def test_edge_minimum_possible_nonzero():
    # 1xN or Nx1 image to 1x1 target
    codeflash_output = get_max_res_without_distortion((1, 100), (1, 1)) # 1.46μs -> 1.05μs (39.2% faster)
    codeflash_output = get_max_res_without_distortion((100, 1), (1, 1)) # 658ns -> 560ns (17.5% faster)


def test_large_scale_square_image():
    # Large square image to smaller square target
    codeflash_output = get_max_res_without_distortion((1000, 1000), (500, 500)) # 2.00μs -> 1.30μs (54.0% faster)

def test_large_scale_rectangular_image():
    # Large rectangular image to rectangular target
    codeflash_output = get_max_res_without_distortion((900, 1200), (450, 600)) # 1.67μs -> 1.13μs (47.7% faster)

def test_large_scale_upscale():
    # Small image to large target
    codeflash_output = get_max_res_without_distortion((10, 20), (1000, 2000)) # 1.64μs -> 1.08μs (51.2% faster)

def test_large_scale_many_aspect_ratios():
    # Test many aspect ratios in a loop
    for i in range(1, 1000, 100):
        h, w = i + 100, i + 200
        th, tw = (i + 50), (i + 150)
        codeflash_output = get_max_res_without_distortion((h, w), (th, tw)); res = codeflash_output # 5.05μs -> 3.67μs (37.5% faster)
        # Should maintain aspect ratio (to within 1 pixel due to floor)
        assert abs(res[0] * w - res[1] * h) <= max(h, w)

def test_large_scale_extreme_aspect_ratio():
    # Very wide image to tall target
    codeflash_output = get_max_res_without_distortion((100, 1000), (999, 10)) # 1.41μs -> 1.02μs (38.1% faster)
    # Very tall image to wide target
    codeflash_output = get_max_res_without_distortion((1000, 100), (10, 999)) # 599ns -> 490ns (22.2% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
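
The comparison described in that last comment can be sketched as a small harness that runs two callables on the same inputs and reports where their behavior differs, treating a raised exception type as part of the behavior. This is a hypothetical illustration, not Codeflash's actual harness. The names `run`, `find_differences`, `original`, and `candidate` are made up for the example, and the two callables are stand-ins for the scaled-dimension expression only.

```python
import math
from typing import Any, Callable, Iterable, List, Tuple


def run(fn: Callable[..., Any], args: Tuple[Any, ...]) -> Tuple[str, Any]:
    """Return ('ok', value) or ('raised', exception type) for one call."""
    try:
        return ("ok", fn(*args))
    except Exception as exc:
        return ("raised", type(exc))


def find_differences(
    original: Callable[..., Any],
    candidate: Callable[..., Any],
    cases: Iterable[Tuple[Any, ...]],
) -> List[Tuple[Any, ...]]:
    """List the argument tuples on which the two callables behave differently."""
    return [args for args in cases if run(original, args) != run(candidate, args)]


def original(a: int, b: int, c: int) -> int:
    # Stand-in for the float expression in the original code
    return math.floor(a * (b / c))


def candidate(a: int, b: int, c: int) -> int:
    # Stand-in for the integer expression in the optimized code
    return (a * b) // c


# Operands drawn from the generated tests, plus one zero divisor.
cases = [(200, 200, 300), (600, 450, 800), (300, 200, 500), (100, 50, 0)]
print(find_differences(original, candidate, cases))  # prints "[]"
```

Comparing exception types as well as return values matters here because several generated tests expect `ZeroDivisionError` or `ValueError` rather than a result.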

To edit these changes git checkout codeflash/optimize-get_max_res_without_distortion-mhjqprdx and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 3, 2025 22:57
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 3, 2025