feat(ssa): Pass to preprocess functions #7072
Some observations based on the post-preprocessed SSA of
The last would benefit from a generalization of this PR but we can break out other PRs to tackle the other two cases.
Posting for SSA tests (obviously once cut down) SSA
// Bottom-up order, starting with the "leaf" functions, so we inline already optimized code into the ones that call them.
let bottom_up = inlining::compute_bottom_up_order(&self);
let not_to_inline = inlining::get_functions_to_inline_into(&self, false, aggressiveness);

for id in bottom_up.into_iter().filter(|id| !not_to_inline.contains(id)) {
@TomAFrench do you reckon that this kind of bottom-up ordering and folding of functions makes sense? Maybe not exactly like this (you mentioned doing stuff in parallel, and limiting which functions it gets applied to), but as a general direction?
Yeah, it looks sensible to me from a quick skim. The execution failures we're getting on the protocol circuits seem real though (rather than just having stale inputs) so something is up.
It was because of calling the DIE pass; included an example in the PR description:
After Removing Paired rc_inc & rc_decs:
...
acir(inline) fn Add10 f1 {
b0(v0: &mut Field):
v1 = load v0 -> Field
v2 = load v0 -> Field
v4 = add v2, Field 10
store v4 at v0
return
}
After Pre-processing Functions:
...
acir(inline) fn Add10 f1 {
b0(v0: &mut Field):
return # Whole function body got eliminated!
}
It sees store v4 at v0 as a dead instruction.
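To make the failure mode concrete, here is a minimal sketch of a backward dead-instruction elimination over the `Add10` body. It is not the Noir compiler's implementation: the `Instruction` enum, `eliminate_dead`, and the `keep_param_stores` flag are hypothetical names invented for illustration, and "keep stores through a block parameter" is only one assumed way to express the missing context.

```rust
use std::collections::HashSet;

// Toy SSA value id; hypothetical, not the compiler's own type.
type ValueId = u32;

#[derive(Debug, Clone, PartialEq)]
enum Instruction {
    Load { result: ValueId, address: ValueId },
    Add { result: ValueId, lhs: ValueId, constant: i64 },
    Store { value: ValueId, address: ValueId },
}

// Backward pass over one block: an instruction is kept only if it is "live".
// Assumption: a store counts as live only when `keep_param_stores` is set and
// its address is a block parameter, i.e. a reference the caller can observe.
fn eliminate_dead(
    instructions: &[Instruction],
    params: &[ValueId],
    keep_param_stores: bool,
) -> Vec<Instruction> {
    let mut used: HashSet<ValueId> = HashSet::new();
    let mut kept = Vec::new();
    for inst in instructions.iter().rev() {
        let live = match inst {
            Instruction::Load { result, .. } | Instruction::Add { result, .. } => {
                used.contains(result)
            }
            Instruction::Store { address, .. } => {
                keep_param_stores && params.contains(address)
            }
        };
        if live {
            // Mark the operands of a live instruction as used.
            match inst {
                Instruction::Load { address, .. } => {
                    used.insert(*address);
                }
                Instruction::Add { lhs, .. } => {
                    used.insert(*lhs);
                }
                Instruction::Store { value, address } => {
                    used.insert(*value);
                    used.insert(*address);
                }
            }
            kept.push(inst.clone());
        }
    }
    kept.reverse();
    kept
}

// The body of Add10 from the SSA above, with v0 as the only block parameter.
fn add10_body() -> Vec<Instruction> {
    vec![
        Instruction::Load { result: 1, address: 0 },
        Instruction::Load { result: 2, address: 0 },
        Instruction::Add { result: 4, lhs: 2, constant: 10 },
        Instruction::Store { value: 4, address: 0 },
    ]
}
```

In this toy model, `eliminate_dead(&add10_body(), &[0], false)` drops every instruction, which mirrors the empty `Add10` body shown above, while passing `true` keeps the second load, the add, and the store, and removes only the unused `v1` load.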
Looks like we need to pass some extra context to the can_eliminate_if_unused method on Instruction
// Store instructions must be removed by DIE in acir code, any load
I investigated the increase of bytecode size of
With the preprocessing pass this changed:
I fixed a few things:
The result of these fixes is that the cost in bytecode size is higher than it was, but now it's high even if we use
I'll investigate why
I think the same thing happened with
$ cd test_programs/execution_success/higher_order_functions
$ cargo run -q -p nargo_cli -- --program-dir . info --force --silence-warnings --inliner-aggressiveness 9223372036854775807
+------------------------+----------+----------------------+--------------+-----------------+
| Package | Function | Expression Width | ACIR Opcodes | Brillig Opcodes |
+------------------------+----------+----------------------+--------------+-----------------+
| higher_order_functions | main | Bounded { width: 4 } | 1 | 0 |
+------------------------+----------+----------------------+--------------+-----------------+
$ cargo run -q -p nargo_cli -- --program-dir . info --force --silence-warnings --inliner-aggressiveness 9223372036854775807 --force-brillig
+------------------------+----------+----------------------+--------------+-----------------+
| Package | Function | Expression Width | ACIR Opcodes | Brillig Opcodes |
+------------------------+----------+----------------------+--------------+-----------------+
| higher_order_functions | main | Bounded { width: 4 } | 1 | 641 |
+------------------------+----------+----------------------+--------------+-----------------+
| higher_order_functions | main | N/A | N/A | 641 |
+------------------------+----------+----------------------+--------------+-----------------+
$ cargo run -q -p nargo_cli -- --program-dir . info --force --silence-warnings --inliner-aggressiveness 9223372036854775807 --force-brillig --skip-preprocess-fns
+------------------------+----------+----------------------+--------------+-----------------+
| Package | Function | Expression Width | ACIR Opcodes | Brillig Opcodes |
+------------------------+----------+----------------------+--------------+-----------------+
| higher_order_functions | main | Bounded { width: 4 } | 1 | 641 |
+------------------------+----------+----------------------+--------------+-----------------+
| higher_order_functions | main | N/A | N/A | 641 |
+------------------------+----------+----------------------+--------------+-----------------+
$ git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
$ cargo run -q -p nargo_cli -- --program-dir . info --force --silence-warnings --inliner-aggressiveness 9223372036854775807 --force-brillig
+------------------------+----------+----------------------+--------------+-----------------+
| Package | Function | Expression Width | ACIR Opcodes | Brillig Opcodes |
+------------------------+----------+----------------------+--------------+-----------------+
| higher_order_functions | main | Bounded { width: 4 } | 1 | 70 |
+------------------------+----------+----------------------+--------------+-----------------+
| higher_order_functions | main | N/A | N/A | 70 |
+------------------------+----------+----------------------+--------------+-----------------+
We can see that:
Description
Problem*
Followup for #7001 (comment)
Summary*
Adds a Ssa::preprocess_fns pass that runs a subset of SSA passes on functions in a bottom-up order, so that by the time they are inlined into other functions, they have had a chance to be simplified. The changes made are cumulative, i.e. if A calls B, then B gets optimised first, and then A inlines the optimised B before it gets optimised itself.
To avoid going all the way to the top and optimising the functions near the entry point, the weight of the functions (derived from their instruction count) is accumulated to the callers, and the average of the accumulated weights serves as a cutoff for running the preprocessing.
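The ordering and cutoff described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `FunctionInfo`, `bottom_up_order`, `accumulated_weights`, and `functions_to_preprocess` are hypothetical names, each callee's accumulated weight is added once per call edge, cycles are simply cut by a visited set, and "at or below the average" is an assumed reading of the cutoff.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical function id and call-graph node; not the compiler's types.
type FunctionId = u32;

struct FunctionInfo {
    instruction_count: usize,
    callees: Vec<FunctionId>,
}

// Post-order depth-first traversal: callees are emitted before their callers.
fn bottom_up_order(functions: &BTreeMap<FunctionId, FunctionInfo>) -> Vec<FunctionId> {
    fn visit(
        id: FunctionId,
        functions: &BTreeMap<FunctionId, FunctionInfo>,
        visited: &mut BTreeSet<FunctionId>,
        order: &mut Vec<FunctionId>,
    ) {
        // Skip ids already seen (this also cuts cycles) and unknown ids.
        if !visited.insert(id) {
            return;
        }
        let Some(info) = functions.get(&id) else {
            return;
        };
        for callee in &info.callees {
            visit(*callee, functions, visited, order);
        }
        order.push(id);
    }

    let mut visited = BTreeSet::new();
    let mut order = Vec::new();
    for id in functions.keys() {
        visit(*id, functions, &mut visited, &mut order);
    }
    order
}

// Accumulated weight = own instruction count + accumulated weights of callees.
// Walking in bottom-up order means callee weights are already known.
fn accumulated_weights(
    functions: &BTreeMap<FunctionId, FunctionInfo>,
    order: &[FunctionId],
) -> BTreeMap<FunctionId, usize> {
    let mut weights = BTreeMap::new();
    for id in order {
        let info = &functions[id];
        let callee_sum: usize = info
            .callees
            .iter()
            .map(|callee| weights.get(callee).copied().unwrap_or(0))
            .sum();
        weights.insert(*id, info.instruction_count + callee_sum);
    }
    weights
}

// Keep only functions whose accumulated weight is at or below the average,
// so the heavy functions near the entry point are left alone.
fn functions_to_preprocess(functions: &BTreeMap<FunctionId, FunctionInfo>) -> Vec<FunctionId> {
    let order = bottom_up_order(functions);
    let weights = accumulated_weights(functions, &order);
    if weights.is_empty() {
        return Vec::new();
    }
    let average = weights.values().sum::<usize>() / weights.len();
    order.into_iter().filter(|id| weights[id] <= average).collect()
}

// Example graph: main (0) calls A (1), and A calls B (2).
fn example_call_graph() -> BTreeMap<FunctionId, FunctionInfo> {
    BTreeMap::from([
        (0, FunctionInfo { instruction_count: 10, callees: vec![1] }),
        (1, FunctionInfo { instruction_count: 5, callees: vec![2] }),
        (2, FunctionInfo { instruction_count: 2, callees: vec![] }),
    ])
}
```

For the example graph the accumulated weights are 2 for B, 7 for A, and 17 for main, with an integer average of 8, so B and then A are selected for preprocessing and main is skipped.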
Added a --skip-preprocess-fns CLI option to turn this pass off.
TODO:
Additional Context
Idea
We noticed in #7001 (comment) that main gets bloated to 65K blocks after the unrolling pass, because everything it calls has already been inlined into it, and then we have to simplify down this large function, which came with a big memory footprint.
@TomAFrench suggested switching to a bottom-up scenario, where we optimise functions that get called by others before inlining them, and to start as follows:
Testing
cargo run -p nargo_cli -- --program-dir ../aztec-packages/noir-projects/noir-protocol-circuits/crates/rollup-base-public compile --silence-warnings --skip-underconstrained-check --skip-brillig-constraints-check --force --show-ssa-pass "Preprocessing"
Result
Unfortunately this hasn't significantly changed the number of blocks after unrolling in main, at least not yet:
The SSA of one of the first serialize functions in the linked comment changed as follows:
Before Pre-processing
After Pre-processing
So this has been achieved 🎉 :
Integration Test Failures
External repos and some integration tests are failing. The simplest example is perhaps this:
Looking at the SSA it seems completely wrong:
This goes away if we remove the function.dead_instruction_elimination(true); call. It looks like the pass doesn't know that a store to the mutable v0 can be observed outside the function.
Documentation*
Check one:
PR Checklist*
cargo fmt
on default settings.