⚡️ Speed up function `get_low_resolution_logit` by 7% #113

codeflash-ai · 2025-11-04T01:54:38Z

📄 7% (0.07x) speedup for `get_low_resolution_logit` in `src/transformers/models/mra/modeling_mra.py`

⏱️ Runtime : 4.47 milliseconds → 4.16 milliseconds (best of 199 runs)
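
For reference, the headline percentage appears to be the runtime saved relative to the optimized runtime. That formula is inferred from the figures in this report rather than stated by it, and `speedup` below is a hypothetical helper, not part of Codeflash:

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup, assumed here to be (original - optimized) / optimized."""
    return (original - optimized) / optimized


# Headline runtimes from this report, in milliseconds.
headline = speedup(4.47, 4.16)
print(f"{headline:.1%}")  # about 7.5%, which the report shows as 7%
```

The same formula roughly reproduces the per-test figures, for example 89.0μs -> 65.7μs gives about 35.5%; small differences are expected because the displayed runtimes are rounded.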

📝 Explanation and details

The optimized code achieves a 7% speedup through several key tensor operation optimizations:

What was optimized:

1. **Reduced repeated reshape operations**: The original code called `.reshape()` multiple times on the same tensors. The optimized version precomputes `query_reshaped`, `key_reshaped`, and `value_reshaped` once and reuses them, eliminating redundant tensor reshaping overhead.
2. **More efficient tensor creation**: Replaced `block_size * torch.ones(...)` with `torch.full(...)`, which directly creates the desired tensor values without an additional multiplication operation.
3. **In-place operations**: Used `.div_()` for the matmul scaling and `.mul_()` for mask penalty computation, reducing temporary tensor allocations.
4. **Eliminated redundant computations**: Cached the denominator `token_count[:, :, None] + 1e-6` as `denom` to avoid computing it multiple times in the masked branch.
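
The four changes can be sketched on toy tensors as follows. This is a minimal illustration, not the code from the PR: the shapes, the mask values, and the exact penalty expression (a `1e4` constant applied where a block pair has no active tokens) are assumptions made for the example.

```python
import math

import torch

batch_size, seq_len, head_dim, block_size = 2, 8, 4, 2
num_blocks = seq_len // block_size

torch.manual_seed(0)
query = torch.randn(batch_size, seq_len, head_dim)
key = torch.randn(batch_size, seq_len, head_dim)
mask = torch.tensor([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0]]).repeat(batch_size, 1)

# Reshape each input once and reuse the result.
query_reshaped = query.reshape(batch_size, num_blocks, block_size, head_dim)
key_reshaped = key.reshape(batch_size, num_blocks, block_size, head_dim)

# torch.full builds the constant tensor directly, with no extra multiply.
count_old = block_size * torch.ones(batch_size, num_blocks, dtype=torch.float)
count_new = torch.full((batch_size, num_blocks), float(block_size), dtype=torch.float)
assert torch.equal(count_old, count_new)

# Compute the masked-branch denominator once and share it.
token_count = mask.reshape(batch_size, num_blocks, block_size).sum(dim=-1)
denom = token_count[:, :, None] + 1e-6
query_hat = query_reshaped.sum(dim=-2) / denom
key_hat = key_reshaped.sum(dim=-2) / denom

# Scale the matmul result in place instead of allocating a second tensor.
logit_old = torch.matmul(query_hat, key_hat.transpose(-1, -2)) / math.sqrt(head_dim)
logit_new = torch.matmul(query_hat, key_hat.transpose(-1, -2))
logit_new.div_(math.sqrt(head_dim))
assert torch.allclose(logit_old, logit_new)

# Build the mask penalty with an in-place multiply (assumed penalty form).
empty_pair = (token_count[:, None, :] * token_count[:, :, None]) < 0.5
penalty_old = 1e4 * empty_pair.float()
penalty_new = empty_pair.float()
penalty_new.mul_(1e4)
assert torch.equal(penalty_old, penalty_new)
```

The in-place calls are applied only to freshly created tensors here. In-place operations on tensors that autograd still needs for the backward pass raise an error, so that is worth checking when reviewing the actual diff.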

Why it's faster:

These optimizations reduce per-call overhead and memory allocations. On contiguous tensors `.reshape()` normally returns a view rather than copying data, so reshaping once instead of several times mainly saves operator-call overhead, which is noticeable at the microsecond scale of these benchmarks. The in-place operations avoid allocating intermediate tensors, and caching `denom` removes repeated additions in the masked branch.
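
When judging how much a repeated reshape costs, it may help to check whether `.reshape()` copies at all. The sketch below uses arbitrary shapes chosen for illustration and compares storage pointers for a contiguous and a non-contiguous input:

```python
import torch

x = torch.randn(2, 8, 4)  # freshly created, so contiguous

# Reshaping a contiguous tensor returns a view that shares storage with x.
view = x.reshape(2, 4, 2, 4)
shares_storage = view.data_ptr() == x.data_ptr()

# A transposed tensor is not contiguous, so this reshape has to copy the data.
copied = x.transpose(1, 2).reshape(2, 32)
is_copy = copied.data_ptr() != x.data_ptr()

print(shares_storage, is_copy)
```

If the inputs to `get_low_resolution_logit` are contiguous, as they are in the generated tests, the reshape itself moves no data and the saving comes from making fewer calls.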

Performance characteristics:

In the generated tests below, the gains are largest without a mask (roughly 8-14% faster), where the reshape savings make up a larger share of the runtime. With a mask, the gains are smaller (roughly 0.5-8%) since the mask processing dominates runtime. The improvement holds across the tested sizes, which range from sequence lengths of 4 to 1,000; most calls take roughly 80-200μs, so these are small workloads and the report does not measure larger ones.
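
To reproduce a comparison of this kind locally, a best-of-N timer is enough. The sketch below is an assumption about how such a measurement could be taken, not Codeflash's harness; `best_of` is a hypothetical helper, and the two lambdas time only the tensor-creation change.

```python
import time

import torch


def best_of(fn, runs=199):
    """Return the fastest wall-clock time of fn() over `runs` calls, in microseconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn()
        best = min(best, time.perf_counter_ns() - start)
    return best / 1_000


batch_size, num_blocks, block_size = 8, 8, 16
old = best_of(lambda: block_size * torch.ones(batch_size, num_blocks, dtype=torch.float))
new = best_of(lambda: torch.full((batch_size, num_blocks), float(block_size), dtype=torch.float))
print(f"ones * k: {old:.2f}μs, full: {new:.2f}μs")
```

Taking the minimum over many runs filters out scheduling noise, which matters when the differences being measured are a few microseconds.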

✅ Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 40 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

import math

# imports
import pytest  # used for our unit tests
import torch
from transformers.models.mra.modeling_mra import get_low_resolution_logit

# unit tests

# ----------- BASIC TEST CASES -----------

def test_basic_no_mask_no_value():
    # Basic test: batch_size=1, seq_len=4, head_dim=2, block_size=2, no mask, no value
    query = torch.tensor([[[1., 2.], [3., 4.], [5., 6.], [7., 8.]]])
    key = torch.tensor([[[2., 1.], [4., 3.], [6., 5.], [8., 7.]]])
    block_size = 2

    # Should split into 2 blocks per row
    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size) # 107μs -> 97.5μs (10.4% faster)

def test_basic_with_value():
    # Basic test with value tensor
    query = torch.randn(2, 6, 3)
    key = torch.randn(2, 6, 3)
    value = torch.randn(2, 6, 3)
    block_size = 3

    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size, value=value) # 108μs -> 97.4μs (11.8% faster)

def test_basic_with_mask():
    # Basic test with mask
    query = torch.ones(1, 4, 2)
    key = torch.ones(1, 4, 2)
    mask = torch.tensor([[1, 1, 0, 1]])  # Only 3 tokens active
    block_size = 2

    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size, mask=mask) # 135μs -> 131μs (3.33% faster)

def test_basic_with_mask_and_value():
    # Basic test with mask and value
    query = torch.ones(1, 4, 2)
    key = torch.ones(1, 4, 2)
    value = torch.arange(8).reshape(1, 4, 2).float()
    mask = torch.tensor([[1, 1, 0, 1]])
    block_size = 2

    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size, mask=mask, value=value) # 128μs -> 120μs (7.28% faster)

# ----------- EDGE TEST CASES -----------

def test_edge_block_size_equals_seq_len():
    # Edge: block_size == seq_len (only one block)
    query = torch.randn(1, 5, 4)
    key = torch.randn(1, 5, 4)
    block_size = 5

    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size) # 98.3μs -> 87.5μs (12.4% faster)

def test_edge_block_size_one():
    # Edge: block_size == 1 (each token is its own block)
    query = torch.randn(2, 4, 3)
    key = torch.randn(2, 4, 3)
    block_size = 1

    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size) # 100μs -> 90.2μs (11.2% faster)

def test_edge_mask_all_zeros():
    # Edge: mask is all zeros (no tokens active)
    query = torch.randn(1, 4, 2)
    key = torch.randn(1, 4, 2)
    mask = torch.zeros(1, 4)
    block_size = 2

    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size, mask=mask) # 133μs -> 132μs (0.470% faster)

def test_edge_mask_all_ones():
    # Edge: mask is all ones (all tokens active)
    query = torch.randn(1, 4, 2)
    key = torch.randn(1, 4, 2)
    mask = torch.ones(1, 4)
    block_size = 2

    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size, mask=mask) # 130μs -> 127μs (2.34% faster)

def test_edge_value_none_and_not_none():
    # Edge: value is None and not None, check outputs
    query = torch.randn(1, 6, 3)
    key = torch.randn(1, 6, 3)
    block_size = 2

    # value None
    logit1, _, _, value_hat1 = get_low_resolution_logit(query, key, block_size) # 101μs -> 92.0μs (10.7% faster)

    # value not None
    value = torch.randn(1, 6, 3)
    logit2, _, _, value_hat2 = get_low_resolution_logit(query, key, block_size, value=value) # 41.0μs -> 36.0μs (13.8% faster)

def test_edge_batch_size_greater_than_one():
    # Edge: batch_size > 1
    query = torch.ones(3, 4, 2)
    key = torch.ones(3, 4, 2)
    block_size = 2

    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size) # 92.2μs -> 82.8μs (11.4% faster)

def test_edge_seq_len_not_divisible_by_block_size():
    # Edge: seq_len not divisible by block_size (should raise error)
    query = torch.ones(1, 5, 2)
    key = torch.ones(1, 5, 2)
    block_size = 2
    # Should raise error due to reshape
    with pytest.raises(RuntimeError):
        get_low_resolution_logit(query, key, block_size) # 89.0μs -> 65.7μs (35.4% faster)

def test_edge_block_size_zero():
    # Edge: block_size == 0 (should raise error)
    query = torch.ones(1, 4, 2)
    key = torch.ones(1, 4, 2)
    block_size = 0
    with pytest.raises(ZeroDivisionError):
        get_low_resolution_logit(query, key, block_size) # 3.00μs -> 2.73μs (9.51% faster)

def test_edge_head_dim_one():
    # Edge: head_dim == 1
    query = torch.ones(1, 4, 1)
    key = torch.ones(1, 4, 1)
    block_size = 2

    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size) # 102μs -> 91.0μs (12.2% faster)

# ----------- LARGE SCALE TEST CASES -----------

def test_large_scale_batch_and_seq_len():
    # Large scale: batch_size=8, seq_len=128, head_dim=32, block_size=16
    batch_size = 8
    seq_len = 128
    head_dim = 32
    block_size = 16

    query = torch.randn(batch_size, seq_len, head_dim)
    key = torch.randn(batch_size, seq_len, head_dim)
    value = torch.randn(batch_size, seq_len, head_dim)
    mask = torch.randint(0, 2, (batch_size, seq_len)).float()

    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size, mask=mask, value=value) # 163μs -> 157μs (4.15% faster)

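    # Output shapes
    num_block_per_row = seq_len // block_size
    assert logit.shape == (batch_size, num_block_per_row, num_block_per_row)
    assert token_count.shape == (batch_size, num_block_per_row)
    assert value_hat.shape == (batch_size, num_block_per_row, head_dim)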

def test_large_scale_all_ones_mask():
    # Large scale: all mask ones
    batch_size = 4
    seq_len = 64
    head_dim = 16
    block_size = 8

    query = torch.randn(batch_size, seq_len, head_dim)
    key = torch.randn(batch_size, seq_len, head_dim)
    value = torch.randn(batch_size, seq_len, head_dim)
    mask = torch.ones(batch_size, seq_len)

    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size, mask=mask, value=value) # 155μs -> 150μs (3.33% faster)

def test_large_scale_all_zeros_mask():
    # Large scale: all mask zeros
    batch_size = 2
    seq_len = 32
    head_dim = 8
    block_size = 4

    query = torch.randn(batch_size, seq_len, head_dim)
    key = torch.randn(batch_size, seq_len, head_dim)
    mask = torch.zeros(batch_size, seq_len)

    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size, mask=mask) # 143μs -> 138μs (3.52% faster)

def test_large_scale_random_mask():
    # Large scale: random mask
    batch_size = 3
    seq_len = 96
    head_dim = 12
    block_size = 8

    query = torch.randn(batch_size, seq_len, head_dim)
    key = torch.randn(batch_size, seq_len, head_dim)
    mask = torch.randint(0, 2, (batch_size, seq_len)).float()

    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size, mask=mask) # 137μs -> 133μs (2.98% faster)

def test_large_scale_value_none():
    # Large scale: value is None
    batch_size = 5
    seq_len = 80
    head_dim = 10
    block_size = 8

    query = torch.randn(batch_size, seq_len, head_dim)
    key = torch.randn(batch_size, seq_len, head_dim)
    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size) # 114μs -> 104μs (9.35% faster)

def test_large_scale_value_not_none():
    # Large scale: value is not None
    batch_size = 5
    seq_len = 80
    head_dim = 10
    block_size = 8

    query = torch.randn(batch_size, seq_len, head_dim)
    key = torch.randn(batch_size, seq_len, head_dim)
    value = torch.randn(batch_size, seq_len, head_dim)
    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size, value=value) # 121μs -> 110μs (10.7% faster)

def test_large_scale_different_device():
    # Large scale: tensors on CUDA if available
    if torch.cuda.is_available():
        batch_size = 2
        seq_len = 64
        head_dim = 16
        block_size = 8

        query = torch.randn(batch_size, seq_len, head_dim, device='cuda')
        key = torch.randn(batch_size, seq_len, head_dim, device='cuda')
        value = torch.randn(batch_size, seq_len, head_dim, device='cuda')
        mask = torch.randint(0, 2, (batch_size, seq_len), device='cuda').float()

        logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size, mask=mask, value=value)

def test_large_scale_maximum_tensor_size():
    # Large scale: maximum allowed tensor size (under 100MB)
    batch_size = 1
    seq_len = 1000
    head_dim = 10
    block_size = 10

    # Each tensor: 1*1000*10*4 bytes = 40,000 bytes = 0.04MB
    query = torch.randn(batch_size, seq_len, head_dim)
    key = torch.randn(batch_size, seq_len, head_dim)
    value = torch.randn(batch_size, seq_len, head_dim)
    mask = torch.randint(0, 2, (batch_size, seq_len)).float()

    logit, token_count, row_max, value_hat = get_low_resolution_logit(query, key, block_size, mask=mask, value=value) # 195μs -> 191μs (1.94% faster)

    num_block_per_row = seq_len // block_size
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import math

# imports
import pytest  # used for our unit tests
import torch
from transformers.models.mra.modeling_mra import get_low_resolution_logit

# unit tests

# ----------- Basic Test Cases ------------

def test_basic_no_mask_no_value():
    # Test with small tensors, no mask, no value
    batch_size, seq_len, head_dim, block_size = 2, 4, 3, 2
    query = torch.ones(batch_size, seq_len, head_dim)
    key = torch.ones(batch_size, seq_len, head_dim)
    # Should split into 2 blocks per row
    codeflash_output = get_low_resolution_logit(query, key, block_size); result = codeflash_output # 98.4μs -> 90.7μs (8.50% faster)
    low_res_logit, token_count, logit_row_max, value_hat = result

def test_basic_with_mask():
    # Test with mask, no value
    batch_size, seq_len, head_dim, block_size = 1, 4, 2, 2
    query = torch.arange(8, dtype=torch.float).reshape(batch_size, seq_len, head_dim)
    key = torch.arange(8, dtype=torch.float).reshape(batch_size, seq_len, head_dim)
    mask = torch.tensor([[1, 1, 0, 1]], dtype=torch.float)  # 2 tokens in first block, 1 in second
    codeflash_output = get_low_resolution_logit(query, key, block_size, mask=mask); result = codeflash_output # 127μs -> 125μs (1.58% faster)
    low_res_logit, token_count, logit_row_max, value_hat = result

def test_basic_with_value():
    # Test with value tensor
    batch_size, seq_len, head_dim, block_size = 1, 4, 2, 2
    query = torch.ones(batch_size, seq_len, head_dim)
    key = torch.ones(batch_size, seq_len, head_dim)
    value = torch.arange(8, dtype=torch.float).reshape(batch_size, seq_len, head_dim)
    codeflash_output = get_low_resolution_logit(query, key, block_size, value=value); result = codeflash_output # 94.8μs -> 85.2μs (11.2% faster)
    low_res_logit, token_count, logit_row_max, value_hat = result
    # Value_hat should be block means
    expected = value.reshape(batch_size, 2, block_size, head_dim).mean(dim=-2)
    assert torch.allclose(value_hat, expected)

def test_basic_with_mask_and_value():
    # Test with both mask and value
    batch_size, seq_len, head_dim, block_size = 1, 4, 2, 2
    value = torch.arange(8, dtype=torch.float).reshape(batch_size, seq_len, head_dim)
    query = torch.ones(batch_size, seq_len, head_dim)
    key = torch.ones(batch_size, seq_len, head_dim)
    mask = torch.tensor([[1, 1, 0, 1]], dtype=torch.float)
    codeflash_output = get_low_resolution_logit(query, key, block_size, mask=mask, value=value); result = codeflash_output # 135μs -> 127μs (5.71% faster)
    low_res_logit, token_count, logit_row_max, value_hat = result
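    # Value_hat should be the block sums divided by the per-block token counts
    expected = value.reshape(batch_size, 2, block_size, head_dim)
    expected_count = mask.reshape(batch_size, 2, block_size).sum(dim=-1)
    expected = expected.sum(dim=-2) / (expected_count[:, :, None] + 1e-6)
    assert torch.allclose(token_count, expected_count)
    assert torch.allclose(value_hat, expected)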

# ----------- Edge Test Cases ------------

def test_edge_seq_len_not_divisible_by_block_size():
    # seq_len not divisible by block_size should raise an error
    batch_size, seq_len, head_dim, block_size = 1, 5, 2, 2
    query = torch.ones(batch_size, seq_len, head_dim)
    key = torch.ones(batch_size, seq_len, head_dim)
    with pytest.raises(RuntimeError):
        # Reshape will fail
        get_low_resolution_logit(query, key, block_size) # 89.0μs -> 65.5μs (35.9% faster)

def test_edge_block_size_1():
    # block_size = 1, should work and be equivalent to input
    batch_size, seq_len, head_dim, block_size = 1, 4, 2, 1
    query = torch.arange(8, dtype=torch.float).reshape(batch_size, seq_len, head_dim)
    key = torch.arange(8, dtype=torch.float).reshape(batch_size, seq_len, head_dim)
    codeflash_output = get_low_resolution_logit(query, key, block_size); result = codeflash_output # 97.7μs -> 87.2μs (12.1% faster)
    low_res_logit, token_count, logit_row_max, value_hat = result

def test_edge_block_size_equals_seq_len():
    # block_size == seq_len, only one block
    batch_size, seq_len, head_dim, block_size = 1, 4, 2, 4
    query = torch.arange(8, dtype=torch.float).reshape(batch_size, seq_len, head_dim)
    key = torch.arange(8, dtype=torch.float).reshape(batch_size, seq_len, head_dim)
    codeflash_output = get_low_resolution_logit(query, key, block_size); result = codeflash_output # 88.2μs -> 78.3μs (12.7% faster)
    low_res_logit, token_count, logit_row_max, value_hat = result

def test_edge_zero_mask():
    # Mask is all zeros, token_count should be zero, division by 1e-6
    batch_size, seq_len, head_dim, block_size = 1, 4, 2, 2
    query = torch.ones(batch_size, seq_len, head_dim)
    key = torch.ones(batch_size, seq_len, head_dim)
    mask = torch.zeros(batch_size, seq_len)
    codeflash_output = get_low_resolution_logit(query, key, block_size, mask=mask); result = codeflash_output # 134μs -> 132μs (1.55% faster)
    low_res_logit, token_count, logit_row_max, value_hat = result

def test_edge_mask_half_zero_half_one():
    # Mask is half zeros half ones
    batch_size, seq_len, head_dim, block_size = 1, 4, 2, 2
    query = torch.arange(8, dtype=torch.float).reshape(batch_size, seq_len, head_dim)
    key = torch.arange(8, dtype=torch.float).reshape(batch_size, seq_len, head_dim)
    mask = torch.tensor([[1, 0, 1, 0]], dtype=torch.float)
    codeflash_output = get_low_resolution_logit(query, key, block_size, mask=mask); result = codeflash_output # 121μs -> 121μs (0.582% faster)
    low_res_logit, token_count, logit_row_max, value_hat = result

def test_edge_head_dim_1():
    # head_dim = 1
    batch_size, seq_len, head_dim, block_size = 1, 4, 1, 2
    query = torch.arange(4, dtype=torch.float).reshape(batch_size, seq_len, head_dim)
    key = torch.arange(4, dtype=torch.float).reshape(batch_size, seq_len, head_dim)
    codeflash_output = get_low_resolution_logit(query, key, block_size); result = codeflash_output # 88.5μs -> 78.1μs (13.4% faster)
    low_res_logit, token_count, logit_row_max, value_hat = result

def test_edge_batch_size_1():
    # batch_size = 1
    batch_size, seq_len, head_dim, block_size = 1, 4, 2, 2
    query = torch.ones(batch_size, seq_len, head_dim)
    key = torch.ones(batch_size, seq_len, head_dim)
    codeflash_output = get_low_resolution_logit(query, key, block_size); result = codeflash_output # 94.0μs -> 86.6μs (8.62% faster)
    low_res_logit, token_count, logit_row_max, value_hat = result

def test_edge_large_block_size():
    # block_size very large (but <= seq_len)
    batch_size, seq_len, head_dim, block_size = 1, 16, 2, 8
    query = torch.ones(batch_size, seq_len, head_dim)
    key = torch.ones(batch_size, seq_len, head_dim)
    codeflash_output = get_low_resolution_logit(query, key, block_size); result = codeflash_output # 93.5μs -> 85.5μs (9.40% faster)
    low_res_logit, token_count, logit_row_max, value_hat = result

def test_edge_mask_on_gpu():
    # Test that function works with CUDA tensors if available
    if torch.cuda.is_available():
        batch_size, seq_len, head_dim, block_size = 1, 4, 2, 2
        query = torch.ones(batch_size, seq_len, head_dim, device='cuda')
        key = torch.ones(batch_size, seq_len, head_dim, device='cuda')
        mask = torch.ones(batch_size, seq_len, device='cuda')
        codeflash_output = get_low_resolution_logit(query, key, block_size, mask=mask); result = codeflash_output
        low_res_logit, token_count, logit_row_max, value_hat = result

# ----------- Large Scale Test Cases ------------

def test_large_scale_batch():
    # Large batch size, but small enough for <100MB
    batch_size, seq_len, head_dim, block_size = 50, 20, 8, 4
    query = torch.randn(batch_size, seq_len, head_dim)
    key = torch.randn(batch_size, seq_len, head_dim)
    mask = torch.randint(0, 2, (batch_size, seq_len), dtype=torch.float)
    value = torch.randn(batch_size, seq_len, head_dim)
    codeflash_output = get_low_resolution_logit(query, key, block_size, mask=mask, value=value); result = codeflash_output # 188μs -> 182μs (3.38% faster)
    low_res_logit, token_count, logit_row_max, value_hat = result
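    # Check shapes
    num_blocks = seq_len // block_size
    assert low_res_logit.shape == (batch_size, num_blocks, num_blocks)
    assert token_count.shape == (batch_size, num_blocks)
    assert value_hat.shape == (batch_size, num_blocks, head_dim)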

def test_large_scale_seq_len():
    # Large seq_len, but <100MB
    batch_size, seq_len, head_dim, block_size = 2, 512, 4, 64
    query = torch.randn(batch_size, seq_len, head_dim)
    key = torch.randn(batch_size, seq_len, head_dim)
    mask = torch.randint(0, 2, (batch_size, seq_len), dtype=torch.float)
    value = torch.randn(batch_size, seq_len, head_dim)
    codeflash_output = get_low_resolution_logit(query, key, block_size, mask=mask, value=value); result = codeflash_output # 147μs -> 139μs (5.65% faster)
    low_res_logit, token_count, logit_row_max, value_hat = result
    num_blocks = seq_len // block_size

def test_large_scale_head_dim():
    # Large head_dim, but <100MB
    batch_size, seq_len, head_dim, block_size = 2, 32, 128, 8
    query = torch.randn(batch_size, seq_len, head_dim)
    key = torch.randn(batch_size, seq_len, head_dim)
    mask = torch.randint(0, 2, (batch_size, seq_len), dtype=torch.float)
    value = torch.randn(batch_size, seq_len, head_dim)
    codeflash_output = get_low_resolution_logit(query, key, block_size, mask=mask, value=value); result = codeflash_output # 152μs -> 141μs (7.89% faster)
    num_blocks = seq_len // block_size
    low_res_logit, token_count, logit_row_max, value_hat = result

def test_large_scale_all_ones():
    # Large tensor, all ones, check that output is correct and finite
    batch_size, seq_len, head_dim, block_size = 4, 128, 16, 16
    query = torch.ones(batch_size, seq_len, head_dim)
    key = torch.ones(batch_size, seq_len, head_dim)
    mask = torch.ones(batch_size, seq_len)
    value = torch.ones(batch_size, seq_len, head_dim)
    codeflash_output = get_low_resolution_logit(query, key, block_size, mask=mask, value=value); result = codeflash_output # 153μs -> 147μs (4.26% faster)
    num_blocks = seq_len // block_size
    low_res_logit, token_count, logit_row_max, value_hat = result

def test_large_scale_random_mask():
    # Random mask, check that output is correct and finite
    batch_size, seq_len, head_dim, block_size = 8, 64, 8, 8
    query = torch.randn(batch_size, seq_len, head_dim)
    key = torch.randn(batch_size, seq_len, head_dim)
    mask = torch.randint(0, 2, (batch_size, seq_len), dtype=torch.float)
    value = torch.randn(batch_size, seq_len, head_dim)
    codeflash_output = get_low_resolution_logit(query, key, block_size, mask=mask, value=value); result = codeflash_output # 158μs -> 148μs (6.71% faster)
    num_blocks = seq_len // block_size
    low_res_logit, token_count, logit_row_max, value_hat = result
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, `git checkout codeflash/optimize-get_low_resolution_logit-mhjx1omy` and push.

codeflash-ai bot requested a review from mashraf-222 November 4, 2025 01:54

codeflash-ai bot added the `⚡️ codeflash` (Optimization PR opened by Codeflash AI) and `🎯 Quality: High` (Optimization Quality according to Codeflash) labels Nov 4, 2025
