Skip to content

LLVM failed to optimize out the empty loop for _ in 0 .. 100 {} #41097

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kennytm opened this issue Apr 5, 2017 · 10 comments
Closed

LLVM failed to optimize out the empty loop for _ in 0 .. 100 {} #41097

kennytm opened this issue Apr 5, 2017 · 10 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug.

Comments

@kennytm
Copy link
Member

kennytm commented Apr 5, 2017

Test case:

fn main() {
    for _ in 0 .. 100 {}
}

Run with:

$ rustc -O -C remark=all --emit asm --emit llvm-ir 1.rs

Expected: The loop is elided
Actual result: The loop still remains, in both LLVM-IR and ASM output.

The loop is elided when the upper limit is 99. -C opt-level=2 and 3 make no difference.


Compiler output, showing LLVM failed to vectorize the loop because it could not determine number of loop iterations:

warning: -C remark will not show source locations without --debuginfo

note: optimization analysis for inline at [unknown]: _ZN38_$LT$i32$u20$as$u20$core..ops..Add$GT$3add17ha3e8e1701e046ad4E can be inlined into _ZN47_$LT$i32$u20$as$u20$core..iter..range..Step$GT$7add_one17hea1d4412e959396dE with cost=-15000 (threshold=487)

note: optimization remark for inline at [unknown]: _ZN38_$LT$i32$u20$as$u20$core..ops..Add$GT$3add17ha3e8e1701e046ad4E inlined into _ZN47_$LT$i32$u20$as$u20$core..iter..range..Step$GT$7add_one17hea1d4412e959396dE

note: optimization analysis for inline at [unknown]: _ZN4core3cmp5impls55_$LT$impl$u20$core..cmp..PartialOrd$u20$for$u20$i32$GT$2lt17he1f17f7a171928f2E can be inlined into _ZN4core4iter5range86_$LT$impl$u20$core..iter..iterator..Iterator$u20$for$u20$core..ops..Range$LT$A$GT$$GT$4next17h59cb9f32d4a0d028E with cost=-14995 (threshold=487)

note: optimization remark for inline at [unknown]: _ZN4core3cmp5impls55_$LT$impl$u20$core..cmp..PartialOrd$u20$for$u20$i32$GT$2lt17he1f17f7a171928f2E inlined into _ZN4core4iter5range86_$LT$impl$u20$core..iter..iterator..Iterator$u20$for$u20$core..ops..Range$LT$A$GT$$GT$4next17h59cb9f32d4a0d028E

note: optimization analysis for inline at [unknown]: _ZN4core3mem4swap17h6aaf5756849aed58E can be inlined into _ZN4core4iter5range86_$LT$impl$u20$core..iter..iterator..Iterator$u20$for$u20$core..ops..Range$LT$A$GT$$GT$4next17h59cb9f32d4a0d028E with cost=-15000 (threshold=487)

note: optimization remark for inline at [unknown]: _ZN4core3mem4swap17h6aaf5756849aed58E inlined into _ZN4core4iter5range86_$LT$impl$u20$core..iter..iterator..Iterator$u20$for$u20$core..ops..Range$LT$A$GT$$GT$4next17h59cb9f32d4a0d028E

note: optimization analysis for inline at [unknown]: _ZN47_$LT$i32$u20$as$u20$core..iter..range..Step$GT$7add_one17hea1d4412e959396dE can be inlined into _ZN4core4iter5range86_$LT$impl$u20$core..iter..iterator..Iterator$u20$for$u20$core..ops..Range$LT$A$GT$$GT$4next17h59cb9f32d4a0d028E with cost=-14995 (threshold=487)

note: optimization remark for inline at [unknown]: _ZN47_$LT$i32$u20$as$u20$core..iter..range..Step$GT$7add_one17hea1d4412e959396dE inlined into _ZN4core4iter5range86_$LT$impl$u20$core..iter..iterator..Iterator$u20$for$u20$core..ops..Range$LT$A$GT$$GT$4next17h59cb9f32d4a0d028E

note: optimization analysis for inline at [unknown]: _ZN4core4iter5range86_$LT$impl$u20$core..iter..iterator..Iterator$u20$for$u20$core..ops..Range$LT$A$GT$$GT$4next17h59cb9f32d4a0d028E can be inlined into _ZN2_14main17h6f9533fc3f344a87E with cost=-14980 (threshold=325)

note: optimization remark for inline at [unknown]: _ZN4core4iter5range86_$LT$impl$u20$core..iter..iterator..Iterator$u20$for$u20$core..ops..Range$LT$A$GT$$GT$4next17h59cb9f32d4a0d028E inlined into _ZN2_14main17h6f9533fc3f344a87E

note: optimization remark for tailcallelim at [unknown]: marked this call a tail call candidate

note: optimization analysis for loop-vectorize at [unknown]: loop not vectorized: could not determine number of loop iterations

note: optimization missed for loop-vectorize at [unknown]: loop not vectorized: use -Rpass-analysis=loop-vectorize for more info

Versions:

$ rustc -Vv
rustc 1.18.0-nightly (2564711e8 2017-04-04)
binary: rustc
commit-hash: 2564711e803f62e04bebf10408cc1c11297c0caf
commit-date: 2017-04-04
host: x86_64-apple-darwin
release: 1.18.0-nightly
LLVM version: 3.9

(See discussion on http://stackoverflow.com/a/43234445/224671 for details)

@kennytm
Copy link
Member Author

kennytm commented Apr 5, 2017

Interestingly if the type of the range is at least 64-bit on x86_64, the loop is also completely elided.

Loop-vectorization failed if the type is i8, u8, i16, u16, i32, u32. That is, the failure happens only when the integer size is smaller than the native word size.

// Completely fine:
fn main() {
    for _ in 0usize .. 100_000_000usize {}
}

@est31
Copy link
Member

est31 commented Apr 6, 2017

Probably the number 100 comes from the default value of the brute force parameter. Seems it can only recognize the loop iteration count if the type is >= the native word size, and falls back to brute force if its not.

This is fine as well:

#![feature(i128_type)]

fn main() {
    for i in 0..100_000_000_000_000_000u128 {}
}

@djzin
Copy link
Contributor

djzin commented Apr 8, 2017

Been playing around with this - this seems pretty bad and severely degrades my confidence in the ability of the compiler to optimize complex code! I tried desugaring the for loop to get an idea as to what's going on -- the following replicates the problem:

struct Range {
    start: u32,
    end: u32,
}

impl Iterator for Range {
    type Item = u32;
    
    fn next(&mut self) -> Option<u32> {
        if self.start < self.end {
            let n = self.start;
            self.start = self.start + 1;
            Some(n)
        } else {
            // self.start = self.start + 1 // uncomment me for speedup
            None
        }
    }
}

fn main() {
    let mut range = Range { start: 0, end: 100 };
    loop {
        match range.next() {
            Some(_) => {},
            None => break,
        }
    }
}

Changing the unused return type of next to () or usize causes the loop to be removed; as does uncommenting the indicated line. Returning a constant value of 0 from the next function does not remove the loop.

@arielb1 arielb1 added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Apr 8, 2017
@djzin
Copy link
Contributor

djzin commented Apr 8, 2017

The following c code is optimized in a very similar way by LLVM via clang - so does look like an LLVM bug. Not that we can't work around it. For reference - this code optimizes down to nothing in gcc 6.3.0.

struct range {
    int start;
    int end;
};

struct option {
    int ret;
    int tag;
};

struct option next(struct range *range) {
    if (range->start < range->end) {
        int n = range->start;
        range->start++;
        struct option option = {n, 1};
        return option;
    } else {
        struct option option = {0, 0};
        return option;
    }
}

int main() {
    struct range range = {0, 100};
    for(;;) {
        struct option option = next(&range);
        if (option.tag) {
        } else {
            break;
        }
    }
    return 0;
}

@djzin
Copy link
Contributor

djzin commented Apr 9, 2017

Minimal example -

fn main() {
    let mut start = 0;
    let end = 10000;
    loop {
        let tag;
        if start < end {
            start += 1;
            tag = true;
        } else {
            tag = false;
        }
        if !tag { break; }
    }
}
int main() {
    int start = 0;
    int end = 10000;
    int tag = 0;
    do {
        tag = 0;
        if (start < end) start++, tag = 1;
    } while (tag);
    return 0;
}

@nateozem
Copy link

How about changing the range to start at 1

fn main() {
    for _ in 1 .. 100 {}
}

Below is the result where the loop is removed.

$ rustc main_start_1.rs -C opt-level=2 -o builds/main_start_1
 
$ objdump -x builds/main_start_1 | grep main
 builds/main_start_1:     file format mach-o-x86-64
 builds/main_start_1
 00000001000008b0 l       0e SECT   01 0000 [.text] __ZN12main_start_14main17hb1ec1b2167cb471eE
 00000001000008c0 g       0f SECT   01 0000 [.text] _main
 
$ objdump -D --start-address=0x1000008b0 builds/main_start_1  | awk '{print $0} $3~/retq?/{exit}'
 
 builds/main_start_1:     file format mach-o-x86-64
 
 
 Disassembly of section .text:
 
 00000001000008b0 <__ZN12main_start_14main17hb1ec1b2167cb471eE>:
    1000008b0:   55                      push   %rbp
    1000008b1:   48 89 e5                mov    %rsp,%rbp
    1000008b4:   5d                      pop    %rbp
    1000008b5:   c3                      retq
 
$ rustc -vV
 rustc 1.18.0-nightly (bbdaad0dc 2017-04-14)
 binary: rustc
 commit-hash: bbdaad0dc8dc64e036ccee817f90a91876b32a9d
 commit-date: 2017-04-14
 host: x86_64-apple-darwin
 release: 1.18.0-nightly
 LLVM version: 3.9

I wonder why the results differ if started at 1 than 0.

@est31
Copy link
Member

est31 commented Apr 18, 2017

@nateozem that's probably because then the loop count is below 100 (below scalar-evolution-max-iterations), and then the optimisation works.

@Mark-Simulacrum Mark-Simulacrum added the C-bug Category: This is a bug. label Jul 27, 2017
@oyvindln
Copy link
Contributor

@djzin

Interestingly, in your example either changing the loop condition to != instead of <, or setting self.start to self.end or u32::max_value() optimizes out the loop when end is < than 147 rather than 100.

One possible workaround to this issue could be if for loops started using the new for_each trait, which could be specialized for ranges of primitive numbers to avoid the option value entirely.

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 9, 2017

@djzin i've filled https://bugs.llvm.org/show_bug.cgi?id=34538 and giving you credit for the code and the mwe, you might want to subscribe to the issue

@alexcrichton
Copy link
Member

Fixed in #47828

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug.
Projects
None yet
Development

No branches or pull requests

9 participants