-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disallow blockparams on branches with multiple successors #170
base: main
Are you sure you want to change the base?
Conversation
Since the input CFG is required to not have critical edges, such blockparams are useless. The incoming blockparams can simply be replaced with the vreg from the (unique) predecessor. It's better to let the client's lowering code handle this so regalloc2 doesn't need to.
I think that this is a reasonable change to make, but cranelift is still generating code that would violate this new invariant. I believe that it would be pretty straightforward to fix that, but we probably won't be able to get to that until next year. Here's the issue that I added describing where the fix would need to be made: bytecodealliance/wasmtime#7639 |
(Popping in to give my thoughts on this in a timely manner, though I'm on parental leave at the moment) IMHO, there is some philosophical question here of where to place complexity: in the embedder (user of regalloc2) or in regalloc2 itself. Right now, regalloc2 accepts a well-defined IR: SSA, with blockparams, used in any way that SSA allows. Introducing further restrictions on that could make sense if we were developing the allocator from scratch, or if it otherwise were a substantial improvement in complexity or speed, but re: complexity we see a "+90 -92" diff, and re: speed there is no improvement cited here. (Perhaps some improvement does occur and I'd be curious if you've measured any!) Finally, existing consumers (notably Cranelift) generate code without keeping this extra restriction in mind, and it creates additional work (and potential for bugs) to adapt to the new requirement. Such blockparams are indeed "useless" if one simplifies completely, but generating the code now requires additional care, and there is value in having components or stages of a compiler that accept a more general input without required canonicalization, especially when offering a library. So: since the allocator already exists and supports this more general input, why remove that capability? (More succinctly: is there a motivation for this PR, beyond the "useless" value judgment above?) |
My argument in favor would be that any simplifications we can make to regalloc2 lessen the maintenance burden, and adapting cranelift as it stands is not a huge effort. Certainly we can leave things as they are, but given how few users of regalloc2 there are, I think that simplification is a good goal. |
Right, agreed with that; I guess my question is really whether this is a simplification, in a global sense; it doesn't seem to have removed any logic in RA2 here, and I don't think it simplifies the allocation problem in any appreciable way. (Please do correct me if I'm missing some aspect here though!) Additional constraints/requirements to keep in mind on the user side are also complexity, so from my perspective at the moment, this change pushes us into higher complexity overall. (Disappearing back into my cave now, but happy to discuss more in February when I'm back; just wanted to inject this perspective in case this were to move forward otherwise) |
To amend the above a bit: please don't let me block this if it is indeed a simplification to the allocation problem, for reasons I'm missing; I just wasn't sure what the motivation was, and wanted to ask. (The only changes to |
Chatted a bit offline with @elliottt about this and we surmised that there may be indirect benefits from this: the changes to CL necessary to fit these constraints plausibly could improve allocation time and/or runtime. In any case, @Amanieu I'm still curious to hear more of your motivation or backstory for this -- were you working toward other simplifications or seeing some direct benefits or ...? |
This isn't intended to improve performance, it is mainly about simplifying the logic a bit. The motivating code for this is actually on a branch that I'm working on, specifically this commit. |
It isn't needed when there is only one successor.
My branch for adding unwinding support to cranelift makes use of blockparams for terminators with multiple successors. The invoke terminator has multiple successors, each of which has block params. For the regular return case it uses the blockparams to pass the call return value and for the unwinding case to pass the exception data. |
Does your branch require blockparams on branches after lowering? |
Currently it doesn't do that, but I think it will be necessary for regalloc to not insert moves between the call and the end of the block. The unwind path would skip those moves. I haven't seen any miscompilation from this yet though. |
At the regalloc2 level you should be able to model the |
This extends the fuzzer and checker to verify that branches that produce outputs are properly supported and no moves are inserted after a branch. This also checks the one edge case where a branch instruction may have both operands and blockparams: when there is only one successor and that successor only has one predecessor.
I added a check in the fuzzer (and fixed the checker to handle it) that branch instructions that produce outputs work properly. This would allow you to model |
This lets us stop allocating temporary VRegs for critical edges that have block parameters. That makes the register allocation problem a little smaller, and also allows reusing lower_branch_blockparam_args for all block parameters. Fixes bytecodealliance#7639, and unblocks bytecodealliance/regalloc2#170
Since the input CFG is required to not have critical edges, such blockparams are useless. The incoming blockparams can simply be replaced with the vreg from the (unique) predecessor.
It's better to let the client's lowering code handle this so regalloc2 doesn't need to.