
fix: defunctionalize pass on the caller runtime to apply #7100

Merged: 9 commits merged into master from fix-defunctionalize-runtime on Jan 17, 2025

Conversation

@aakoshh (Contributor) commented Jan 17, 2025

Description

Problem*

Extracted from #7072

Summary*

Updates defunctionalize to create the `apply` function with the runtime inherited from its caller.

If it's called from both ACIR and Brillig, two `apply` functions are created. At this point the functions have already been monomorphised, so having two versions lets us apply different rules per runtime. At the very least it keeps the logic simpler: if we use `--force-brillig` then we don't want ACIR functions to pop up; if we don't, then I suppose there wouldn't be any harm in calling an ACIR `apply` from Brillig, as it would be inlined.

Another potential issue with not separating the runtimes: if an ACIR-only instruction were ever added to `apply`, it could be inlined into a Brillig function and cause a panic later; although I'm not sure how this is avoided now (can't Brillig functions still call ACIR ones and inline their instructions?).
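As a rough illustration (not the actual compiler code), here is a minimal Rust sketch of keying the generated `apply` functions by caller runtime; `RuntimeType`, `Signature`, `FunctionId` and the builder shape are hypothetical stand-ins:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the SSA types involved; the real pass has its
// own RuntimeType, Signature and FunctionId definitions.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum RuntimeType {
    Acir,
    Brillig,
}
type Signature = String;
type FunctionId = u32;

struct Defunctionalizer {
    // One apply per (signature, caller runtime): an ACIR caller and a Brillig
    // caller with the same signature get two distinct apply functions.
    apply_functions: HashMap<(Signature, RuntimeType), FunctionId>,
    next_id: FunctionId,
}

impl Defunctionalizer {
    /// Return the apply function for this signature in the caller's runtime,
    /// creating it on first use (the real pass would also build its body).
    fn apply_for(&mut self, signature: Signature, caller: RuntimeType) -> FunctionId {
        let next_id = &mut self.next_id;
        *self.apply_functions.entry((signature, caller)).or_insert_with(|| {
            let id = *next_id;
            *next_id += 1;
            id
        })
    }
}
```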

TODO:

  • Add test with expected runtime
  • Fix the SSA parser to handle function IDs as values
  • Cherry pick 64841c4
  • Add test to show that two applies are created if there are two different callers

See this and this comment about the increase in size of certain tests.

Additional Context

The problem with not inheriting the runtime was that it interfered with inlining: any Brillig function called from ACIR was considered an entry point and wasn't inlined into anything. On the other hand, the ACIR `apply` was inlined into its Brillig caller, which resulted in it being removed in the next inline pass, after which the original Brillig functions could be inlined in a third inline pass. Altogether confusing, and fixed in #7072.
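For context, the entry-point rule described above boils down to something like the following sketch; `Function` and its field are illustrative stand-ins, not the actual inliner code:

```rust
// A Brillig function with any ACIR caller counts as an inline entry point,
// so the inliner never inlines it away. Everything here is a stand-in.
struct Function {
    is_brillig: bool,
}

fn is_entry_point(callee: &Function, callers: &[&Function]) -> bool {
    callee.is_brillig && callers.iter().any(|caller| !caller.is_brillig)
}
```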

Documentation*

Check one:

  • No documentation needed.
  • Documentation included in this PR.
  • [For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.


github-actions bot commented Jan 17, 2025

Execution Report

| Program | Execution Time | % |
| --- | --- | --- |
| sha256_regression | 0.051s | 0% |
| regression_4709 | 0.001s | 0% |
| ram_blowup_regression | 0.608s | 1% |
| rollup-root | 0.105s | 0% |
| rollup-merge | 0.006s | 0% |
| rollup-block-root | 36.500s | 0% |
| rollup-block-merge | 0.105s | 0% |
| rollup-base-public | 1.226s | 0% |
| rollup-base-private | 0.452s | 0% |
| private-kernel-tail | 0.019s | 0% |
| private-kernel-reset | 0.313s | 0% |
| private-kernel-inner | 0.075s | 10% |


github-actions bot commented Jan 17, 2025

Compilation Report

| Program | Compilation Time | % |
| --- | --- | --- |
| sha256_regression | 1.160s | 16% |
| regression_4709 | 0.885s | 11% |
| ram_blowup_regression | 15.800s | -1% |
| rollup-root | 3.564s | -6% |
| rollup-merge | 2.028s | -7% |
| rollup-block-root-single-tx | 136.000s | -3% |
| rollup-block-root-empty | 1.998s | -8% |
| rollup-block-root | 136.000s | -3% |
| rollup-block-merge | 3.632s | -2% |
| rollup-base-public | 28.280s | -1% |
| rollup-base-private | 10.540s | 6% |
| private-kernel-tail | 1.018s | 3% |
| private-kernel-reset | 6.782s | 15% |
| private-kernel-inner | 1.962s | -7% |


github-actions bot commented Jan 17, 2025

Compilation Memory Report

| Program | Peak Memory |
| --- | --- |
| keccak256 | 77.65M |
| workspace | 123.98M |
| regression_4709 | 424.15M |
| ram_blowup_regression | 1.46G |
| rollup-root | 601.26M |
| rollup-merge | 494.24M |
| rollup-block-root-single-tx | 16.06G |
| rollup-block-root-empty | 488.41M |
| rollup-block-root | 16.07G |
| rollup-block-merge | 601.26M |
| rollup-base-public | 2.38G |
| rollup-base-private | 1.14G |
| private-kernel-tail | 207.34M |
| private-kernel-reset | 584.47M |
| private-kernel-inner | 294.71M |


github-actions bot commented Jan 17, 2025

Execution Memory Report

| Program | Peak Memory |
| --- | --- |
| keccak256 | 74.71M |
| workspace | 123.84M |
| regression_4709 | 316.02M |
| ram_blowup_regression | 512.62M |
| rollup-root | 498.31M |
| rollup-merge | 473.05M |
| rollup-block-root | 1.22G |
| rollup-block-merge | 498.33M |
| rollup-base-public | 734.17M |
| rollup-base-private | 590.51M |
| private-kernel-tail | 180.91M |
| private-kernel-reset | 245.52M |
| private-kernel-inner | 208.92M |


github-actions bot commented Jan 17, 2025

Changes to Brillig bytecode sizes

Generated at commit: a49c8624afc5bc815f925e875d80f4163f79f612, compared to commit: a6685befe3b103b236997843f69afc07b6ea0a8c

🧾 Summary (10% most significant diffs)

| Program | Brillig opcodes (+/-) | % |
| --- | --- | --- |
| higher_order_functions_inliner_max | +571 ❌ | +815.71% |
| brillig_fns_as_values_inliner_zero | +148 ❌ | +740.00% |
| higher_order_functions_inliner_zero | +59 ❌ | +8.86% |

Full diff report 👇
| Program | Brillig opcodes (+/-) | % |
| --- | --- | --- |
| higher_order_functions_inliner_max | 641 (+571) | +815.71% |
| brillig_fns_as_values_inliner_zero | 168 (+148) | +740.00% |
| higher_order_functions_inliner_zero | 725 (+59) | +8.86% |
| sha256_regression_inliner_max | 6,689 (+13) | +0.19% |
| sha256_inliner_max | 2,288 (+4) | +0.18% |
| sha256_var_padding_regression_inliner_max | 4,910 (+6) | +0.12% |
| hashmap_inliner_max | 19,770 (-8) | -0.04% |
| uhashmap_inliner_max | 12,703 (-8) | -0.06% |
| sha256_regression_inliner_min | 5,340 (-8) | -0.15% |
| sha256_regression_inliner_zero | 4,861 (-8) | -0.16% |
| sha256_var_padding_regression_inliner_min | 3,285 (-6) | -0.18% |
| sha256_var_padding_regression_inliner_zero | 2,957 (-6) | -0.20% |
| uhashmap_inliner_min | 9,804 (-24) | -0.24% |
| hashmap_inliner_zero | 7,947 (-22) | -0.28% |
| hashmap_inliner_min | 12,595 (-40) | -0.32% |
| uhashmap_inliner_zero | 7,518 (-37) | -0.49% |
| higher_order_functions_inliner_min | 1,432 (-12) | -0.83% |
| brillig_fns_as_values_inliner_min | 219 (-4) | -1.79% |


github-actions bot commented Jan 17, 2025

Changes to number of Brillig opcodes executed

Generated at commit: a49c8624afc5bc815f925e875d80f4163f79f612, compared to commit: a6685befe3b103b236997843f69afc07b6ea0a8c

🧾 Summary (10% most significant diffs)

| Program | Brillig opcodes (+/-) | % |
| --- | --- | --- |
| higher_order_functions_inliner_max | +1,054 ❌ | +1064.65% |
| brillig_fns_as_values_inliner_zero | +160 ❌ | +888.89% |
| higher_order_functions_inliner_zero | +131 ❌ | +9.88% |

Full diff report 👇
| Program | Brillig opcodes (+/-) | % |
| --- | --- | --- |
| higher_order_functions_inliner_max | 1,153 (+1,054) | +1064.65% |
| brillig_fns_as_values_inliner_zero | 178 (+160) | +888.89% |
| higher_order_functions_inliner_zero | 1,457 (+131) | +9.88% |
| sha256_regression_inliner_max | 111,646 (+141) | +0.13% |
| sha256_inliner_max | 13,197 (+4) | +0.03% |
| sha256_var_padding_regression_inliner_max | 196,670 (+6) | +0.00% |
| sha256_var_padding_regression_inliner_min | 270,969 (-24) | -0.01% |
| sha256_var_padding_regression_inliner_zero | 265,671 (-24) | -0.01% |
| sha256_regression_inliner_min | 161,181 (-28) | -0.02% |
| sha256_regression_inliner_zero | 155,415 (-28) | -0.02% |
| uhashmap_inliner_max | 137,636 (-60) | -0.04% |
| uhashmap_inliner_min | 229,698 (-108) | -0.05% |
| uhashmap_inliner_zero | 175,966 (-98) | -0.06% |
| hashmap_inliner_zero | 75,833 (-52) | -0.07% |
| hashmap_inliner_min | 113,401 (-108) | -0.10% |
| hashmap_inliner_max | 48,525 (-60) | -0.12% |
| higher_order_functions_inliner_min | 2,790 (-26) | -0.92% |
| brillig_fns_as_values_inliner_min | 268 (-6) | -2.19% |

@aakoshh aakoshh marked this pull request as ready for review January 17, 2025 13:35
@aakoshh aakoshh requested a review from a team January 17, 2025 13:38
@TomAFrench (Member) commented Jan 17, 2025

Pushed a small change to signal that we want to inline these functions more aggressively. The inliner isn't picking this up currently.

@aakoshh (Contributor, Author) commented Jan 17, 2025

That's a good idea; I was wondering about it too: the cost will always be above 0 by the looks of it, yet this seems like something that works better when inlined.

This is where InlineAlways should play a role in signalling. Did it not work?

@TomAFrench (Member) commented Jan 17, 2025
Nah, I looked at adding a line to the closure which determines whether to inline, forcing it to inline InlineAlways functions, and it still results in the same bytecode.
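For reference, the change being described amounts to something like this hypothetical short-circuit in the inline decision; the closure shape, `weight`, and `aggressiveness` are assumptions based on this thread, not the actual inliner code:

```rust
// Hypothetical shape of the inline decision: honour an explicit
// InlineAlways hint before consulting the cost model.
fn should_inline(always_inline: bool, weight: i64, aggressiveness: i64) -> bool {
    if always_inline {
        return true;
    }
    // Otherwise inline only while the estimated cost stays within budget.
    weight < aggressiveness
}
```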

@aakoshh (Contributor, Author) commented Jan 17, 2025

I think the InlineAlways worked; the problem is somewhere else. I'm testing with this:

```rust
fn main(w: Field) -> pub Field {
    let ret = twice(add1, 3);
    test_array_functions();
    w + ret
}
/// Test the array functions in std::array
fn test_array_functions() {
    let two = 2; // giving this a name, to ensure that the Option functions work with closures
    let myarray: [i32; 3] = [1, 2, 3];
    assert(myarray.any(|n| n > 2));
    assert(myarray.any(|n| n > two));

    let evens: [i32; 3] = myarray.map(|n| n * two); // [2, 4, 6]
    assert(evens.all(|n| n > 1));
    assert(evens.all(|n| n >= two));

    assert(evens.fold(0, |a, b| a + b) == 12);
    assert(evens.fold(0, |a, b| a + b + two) == 18);
    assert(evens.reduce(|a, b| a + b) == 12);
    assert(evens.reduce(|a, b| a + b + two) == 16);
    assert(evens.map(|n| n / 2) == myarray);
    assert(evens.map(|n| n / two) == myarray);
}

fn add1(x: Field) -> Field {
    x + 1
}

fn twice(f: fn(Field) -> Field, x: Field) -> Field {
    f(f(x))
}
```

Up to and including the Loop Invariant Code Motion pass, the difference between these calls is only in the inclusion of `inc_rc` on the Brillig side:

```sh
cargo run -q -p nargo_cli -- --program-dir . info --force --silence-warnings --inliner-aggressiveness 9223372036854775807 --show-ssa
cargo run -q -p nargo_cli -- --program-dir . info --force --silence-warnings --inliner-aggressiveness 9223372036854775807 --force-brillig --show-ssa
```

Then, after unrolling, in ACIR we are left with 13 blocks, vs 33 with `--force-brillig`. Maybe they don't count as small loops.

@aakoshh (Contributor, Author) commented Jan 17, 2025

Indeed, if I don't skip unrolling here, then the number of opcodes goes down from 641 to 23.

@aakoshh (Contributor, Author) commented Jan 17, 2025

Here are the loop statistics, used to decide if they are considered small or not:

```
STATS of b25: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 10 }; is_small=false
STATS of b23: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 2, all_instructions: 11 }; is_small=false
STATS of b21: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 10 }; is_small=false
STATS of b19: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 2, all_instructions: 10 }; is_small=true
STATS of b17: BoilerplateStats { iterations: 2, loads: 1, stores: 1, increments: 1, all_instructions: 29 }; is_small=false
STATS of b15: BoilerplateStats { iterations: 2, loads: 1, stores: 1, increments: 1, all_instructions: 19 }; is_small=false
STATS of b13: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 29 }; is_small=false
STATS of b11: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 19 }; is_small=false
STATS of b9: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 11 }; is_small=false
STATS of b7: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 9 }; is_small=true
STATS of b5: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 2, all_instructions: 32 }; is_small=false
STATS of b3: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 10 }; is_small=false
STATS of b1: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 9 }; is_small=true
```

We have a check to limit bytecode growth percentage due to unrolling, but it's not set to anything by default. This `is_small` check is already preventing the loops from unrolling, regardless of what that limit would do. Only very small loops can be unrolled, because the constant overhead is not that big: the cost of keeping the loop is `all_instructions + 1`, versus `iterations * (all_instructions - increments - min(loads, stores) * 2 - 3)` when unrolled.
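That heuristic, as described, can be sketched like this; the struct mirrors the `BoilerplateStats` printed above, but the exact comparison in the compiler may differ:

```rust
// Mirrors the BoilerplateStats fields printed above; the comparison is a
// sketch of the described heuristic, not the exact compiler code.
struct BoilerplateStats {
    iterations: usize,
    loads: usize,
    stores: usize,
    increments: usize,
    all_instructions: usize,
}

impl BoilerplateStats {
    /// Cost of keeping the loop: all its instructions plus the back-edge.
    fn loop_cost(&self) -> usize {
        self.all_instructions + 1
    }

    /// Cost after unrolling: per iteration, the body minus the boilerplate
    /// (the increments, a load/store pair, and the loop bookkeeping).
    fn unrolled_cost(&self) -> usize {
        let boilerplate = self.increments + self.loads.min(self.stores) * 2 + 3;
        self.iterations * self.all_instructions.saturating_sub(boilerplate)
    }

    fn is_small(&self) -> bool {
        self.unrolled_cost() < self.loop_cost()
    }
}

fn main() {
    // b1 above: 3 * (9 - 1 - 2 - 3) = 9 < 9 + 1 = 10, so it is small.
    let b1 = BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 9 };
    assert!(b1.is_small());
    // b3 above: 3 * (10 - 1 - 2 - 3) = 12 >= 10 + 1 = 11, so it stays a loop.
    let b3 = BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 10 };
    assert!(!b3.is_small());
}
```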

I tried cheating by considering any loop with up to 15 instructions as small; it still resulted in 475 instructions. Only by unrolling all of them did it collapse down. Unfortunately, just unrolling b5 (the biggest) didn't do the trick; it's not special in that sense.

@aakoshh (Contributor, Author) commented Jan 17, 2025

On master it seems to be similar to #7072 (comment):

  • apply was ACIR, while main is Brillig
  • apply gets inlined into main, because apply is ACIR and currently that means inline-unless-entry
  • because apply is ACIR, it marked every Brillig function that called it as an entry point that should not be inlined, so after the 1st inlining a lot more functions are kept on master, whereas here we end up with just main, fully inlined (this is an oversight which is fixed in feat(ssa): Pass to preprocess functions #7072)
  • then comes the kicker: because on master many of the Brillig functions were actually not inlined, the loops into which they would have been inlined stayed smaller, and therefore they get unrolled:
```
BOILERPLATE STATS of b25: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 10 }; is_small=false
BOILERPLATE STATS of b23: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 2, all_instructions: 10 }; is_small=true
BOILERPLATE STATS of b21: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 10 }; is_small=false
BOILERPLATE STATS of b19: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 2, all_instructions: 10 }; is_small=true
BOILERPLATE STATS of b17: BoilerplateStats { iterations: 2, loads: 1, stores: 1, increments: 1, all_instructions: 8 }; is_small=true
BOILERPLATE STATS of b15: BoilerplateStats { iterations: 2, loads: 1, stores: 1, increments: 1, all_instructions: 8 }; is_small=true
BOILERPLATE STATS of b13: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 8 }; is_small=true
BOILERPLATE STATS of b11: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 8 }; is_small=true
BOILERPLATE STATS of b9: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 9 }; is_small=true
BOILERPLATE STATS of b7: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 9 }; is_small=true
BOILERPLATE STATS of b5: BoilerplateStats { iterations: 0, loads: 1, stores: 1, increments: 2, all_instructions: 10 }; is_small=true
BOILERPLATE STATS of b3: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 9 }; is_small=true
BOILERPLATE STATS of b1: BoilerplateStats { iterations: 3, loads: 1, stores: 1, increments: 1, all_instructions: 9 }; is_small=true
```

@aakoshh aakoshh added this pull request to the merge queue Jan 17, 2025
Merged via the queue into master with commit dd70845 Jan 17, 2025
99 checks passed
@aakoshh aakoshh deleted the fix-defunctionalize-runtime branch January 17, 2025 21:13