[CIR] Add 'core-flat' as an option value for '-emit-mlir=' #1466
Conversation
darkbuck commented on Mar 11, 2025
- Both 'func' and 'scf' have structured control flow constraints, e.g., 'func.return' must have a 'func.func' parent op. The alternative approach to lower CIR into MLIR standard dialects is to lower CIR into the 'cf' dialect and perform structurization on all or selected regions (see the sketch below).
- Add a 'core-flat' option to enable CFG flattening when lowering CIR into MLIR standard dialects.
- Add a 'cir-unify-func-return' pass to unify returns into branches to a trailing block dedicated to the function return.
- Fix 'cir.br' and 'cir.return' lowering to MLIR and allow function declarations.
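For illustration, here is a small example of what return unification buys us; the C-style function and the MLIR sketched in the comment are hypothetical, not taken from the patch:
```
// A function with two early returns.
int sign(int x) {
  if (x < 0)
    return -1;
  return 1;
}

// After CFG flattening with return unification (sketched as MLIR in this
// comment), both returns become branches into a single trailing block, so the
// only 'func.return' sits directly under 'func.func', as the dialect requires:
//
//   func.func @sign(%x: i32) -> i32 {
//     ...
//     cf.cond_br %is_neg, ^bb_neg, ^bb_pos
//   ^bb_neg:
//     cf.br ^ret(%minus_one : i32)
//   ^bb_pos:
//     cf.br ^ret(%one : i32)
//   ^ret(%r: i32):
//     func.return %r : i32
//   }
```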
29502b8 to 49a2b45 (compare)
Thanks for improving things here.
Add 'core-flat' option to enable CFG flattening when lowering CIR into MLIR standard dialects.
Instead of adding a new core-flat mode, you should use the information on whether we are in the “throughMLIR” pipeline and change how CFGFlattening works based on that (e.g., we could change which rewrites run for direct LLVM versus through MLIR).
Add 'cir-unify-func-return' pass to unify returns into branches to a trailing block dedicated to the function return.
This should be an additional rewrite inside the existing flattening pass.
(cc @keryell)
Let me check what we should follow.
Here's my understanding: flattening can run for either the ThroughMLIR or the DirectToLLVM option. For the latter, returns don't need unifying, as 'llvm.return' doesn't have the restriction that its parent must be the function operation; only 'func.return' has that restriction. If we want to represent arbitrary C/C++ code in the 'func' dialect, we need to unify returns. It seems to me that unifying should be a dedicated pass, or at least an option, depending on which dialect we want to lower CIR to. I thought it'd be better to separate them. Any suggestions?
I'm basically suggesting:
I don't see why you need an optional extra compiler flag for doing this if it always needs to be enabled when lowering to core dialects.
49a2b45 to 6fd8f88 (compare)
Fixed that in the new revision. Added a new 'through-mlir' option in the 'cir-flatten-cfg' pass to unify returns if we lower CIR through MLIR. However, after 49f7c80, we don't have a dedicated |
Great!
Just a few remarks and questions.
unsigned numReturnOps = 0;
func->walk([&](cir::ReturnOp op) {
  ops.push_back(op.getOperation());
  ++numReturnOps;
Just use ops.size()?
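A minimal sketch of what that suggestion amounts to; the helper name, the template parameters, and the early-exit use of the count are assumptions for illustration, not code from the patch (the CIR dialect headers are assumed to be pulled in by the real pass):
```
#include "llvm/ADT/SmallVector.h"   // llvm::SmallVector
#include "mlir/IR/Operation.h"      // mlir::Operation

// Collect the return ops and rely on ops.size() instead of maintaining a
// separate numReturnOps counter. ReturnOpT stands for cir::ReturnOp and
// FuncOpT for the enclosing function op.
template <typename ReturnOpT, typename FuncOpT>
llvm::SmallVector<mlir::Operation *, 4> collectReturns(FuncOpT func) {
  llvm::SmallVector<mlir::Operation *, 4> ops;
  func->walk([&](ReturnOpT op) { ops.push_back(op.getOperation()); });
  return ops;
}

// Hypothetical use inside the pass:
//   auto ops = collectReturns<cir::ReturnOp>(func);
//   if (ops.size() <= 1)
//     return; // nothing to unify
```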
@@ -937,4 +1015,10 @@ std::unique_ptr<Pass> createFlattenCFGPass() {
  return std::make_unique<FlattenCFGPass>();
}

std::unique_ptr<Pass> createFlattenCFGPass(bool throughMLIR) {
Could this function just be the previous one extended with a defaulted parameter, such as
std::unique_ptr<Pass> createFlattenCFGPass(bool throughMLIR) {
std::unique_ptr<Pass> createFlattenCFGPass(bool throughMLIR = false) {
or true?
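For reference, a minimal sketch of the defaulted-parameter variant; the header/source split and the idea of forwarding the flag are assumptions, not the actual ClangIR code:
```
#include <memory>
#include "mlir/Pass/Pass.h"

// Header side (hypothetical file layout): the default argument belongs on the
// declaration, so both createFlattenCFGPass() and createFlattenCFGPass(true)
// resolve to the same function.
std::unique_ptr<mlir::Pass> createFlattenCFGPass(bool throughMLIR = false);

// Source side: a single definition, written without repeating the default
// (C++ does not allow restating it there), e.g.:
//
//   std::unique_ptr<Pass> createFlattenCFGPass(bool throughMLIR) {
//     return std::make_unique<FlattenCFGPass>(/* pass the flag along */);
//   }
```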
6fd8f88 to 64971e2 (compare)
Thanks for your valuable comments. I fixed all of them except the last one. It's not clear to me whether we should use the default constructor for each pass. I understand that passes like |
I don't yet see why this flag is needed; if you want to test MLIR output, you'd use |
…#1081) Now implement the same as [OG](https://github.com/llvm/clangir/blob/7619b20d7461b2d46c17a3154ec4b2f12ca35ea5/clang/lib/CodeGen/CGBuiltin.cpp#L7886), which is to call the llvm aarch64 intrinsic which would eventually become [an ARM64 instruction](https://developer.arm.com/documentation/ddi0596/2021-03/SIMD-FP-Instructions/ABS--Absolute-value--vector--?lang=en). However, clearly there is an alternative, which is to extend CIR::AbsOp and CIR::FAbsOp to support vector types and only lower them at the LLVM lowering stage to either [LLVM::FAbsOP](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrfabs-llvmfabsop) or [LLVM::AbsOP](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrabs-llvmabsop), provided the LLVM dialect could do the right thing in TargetLowering by translating to the llvm aarch64 intrinsic eventually. The question is whether it is worth doing. Anyway, putting up this diff for suggestions and ideas.
…f -fstrict-vtable-pointers (llvm#1138) Without the flag `-fstrict-vtable-pointers`, `__builtin_launder` is a noop. This PR implements that, and leaves the implementation for the case of `-fstrict-vtable-pointers` to the future, when there is a need. This PR also adapted most of the test cases from the [OG test case](https://github.com/llvm/clangir/blob/3aed38cf52e72cb51a907fad9dd53802f6505b81/clang/test/CodeGenCXX/builtin-launder.cpp#L1). I didn't use the test cases in the namespace [pessimizing_cases](https://github.com/llvm/clangir/blob/3aed38cf52e72cb51a907fad9dd53802f6505b81/clang/test/CodeGenCXX/builtin-launder.cpp#L269), as they show no difference even when `-fstrict-vtable-pointers` is on.
This PR also changed implementation of BI__builtin_neon_vshlq_v into using CIR ShiftOp
After I rebased, I found these problems with Spec2017. I was surprised why it doesn't have problems. Maybe some updates in LLVM part.
The requirement on the size of the then/else parts of the cir.ternary operation seems to be too conservative. As the example shows, it is possible that the regions get expanded during the transformation.
…m#1169) For example, the following reaches ["NYI"](https://github.com/llvm/clangir/blob/c8b626d49e7f306052b2e6d3ce60b1f689d37cb5/clang/lib/CIR/Dialect/Transforms/TargetLowering/LowerFunction.cpp#L348) when lowering to AArch64: ``` typedef struct { union { struct { char a, b; }; char c; }; } A; void foo(A a) {} void bar() { A a; foo(a); } ``` Currently, the value of the struct becomes a bitcast operation, so we can simply extend `findAlloca` to be able to trace the source alloca properly, then use that for the [coercion](https://github.com/llvm/clangir/blob/c8b626d49e7f306052b2e6d3ce60b1f689d37cb5/clang/lib/CIR/Dialect/Transforms/TargetLowering/LowerFunction.cpp#L341) through memory. I have also added a test for this case.
Added a few FIXMEs. There are two types of FIXMEs: 1. Most of them are missing func call and parameter attributes. I didn't add them for all missing sites of this type, as it would have been just copy-paste. 2. The FIXME in lambda __invoke(): OG simply returns, but CIR generates a call to llvm.trap. This is just temporary and we will fix it in the near future. But I feel I should still list those IRs, so that once we fix the problem with codegen of invoke, we'd get a test failure on this one and fix it. Actually, this way, this test file would be a natural test case for the implementation of invoke.
There are scenarios where we are not emitting cleanups; this commit starts to pave the way to being more complete in that area. Small addition of a skeleton here plus some fixes. Both `clang/test/CIR/CodeGen/vla.c` and `clang/test/CIR/CodeGen/nrvo.cpp` now pass in the face of this code path.
…lvm#1166) Close llvm#1131 This is another solution to llvm#1160 This patch reverts llvm#1007 and retains its test. The problem described in llvm#1007 is worked around by skipping the check for equivalence of element types in arrays. We can't mock such checks simply by adding another attribute to `ConstStructAttr` since the types are aggregated; e.g., we have to handle cases like `struct { union { ... } }` and `struct { struct { union { ... } } }` and so on. To make that work, we would have to introduce what I called "two type systems" in llvm#1160. This is not ideal, given that it removes a reasonable check. But it might not be so problematic since the Sema part has already checked it. (Of course, we still face the risk of introducing new bugs anyway.)
…of floating type (llvm#1174) [PR1132](llvm#1132) implements the missing feature `fpUnaryOPsSupportVectorType`, so revisit this code. Another thing that changed is that I stopped using `cir::isAnyFloatingPointType`, as it includes types like long double and FP80 which are not supported by the [builtin's signature](https://clang.llvm.org/docs/LanguageExtensions.html#vector-builtins).
[OG's implementation](https://github.com/llvm/clangir/blob/aaf38b30d31251f3411790820c5e1bf914393ddc/clang/lib/CodeGen/CGBuiltin.cpp#L7527) provides one common code path to handle all neon SISD intrinsics. But IMHO, it entangles different things together, which hurts readability. Here, we start with a simple, easy-to-understand approach for a specific case. And in the future, as we handle more intrinsics, we may come up with a few simple common patterns.
Lower neon vabsd_s64
…lvm#1431) Implements `::verify` for operations cir.atomic.xchg and cir.atomic.cmp_xchg I believe the existing regression tests don't get to the CIR level type check failure and I was not able to implement a case that does. Most attempts of reproducing cir.atomic.xchg type check failure were along the lines of: ``` int a; long long b,c; __atomic_exchange(&a, &b, &c, memory_order_seq_cst); ``` And they seem to never trigger the failure on `::verify` because they fail earlier in function parameter checking: ``` exmp.cpp:7:27: error: cannot initialize a parameter of type 'int *' with an rvalue of type 'long long *' 7 | __atomic_exchange(&a, &b, &c, memory_order_seq_cst); | ^~ ``` Closes llvm#1378 .
Lower neon vcaled_f64
This PR adds a new boolean flag to the `cir.load` and the `cir.store` operation that distinguishes nontemporal loads and stores. Besides, this PR also adds support for the `__builtin_nontemporal_load` and the `__builtin_nontemporal_store` intrinsic function.
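For context, a minimal usage sketch of the two builtins; the copy loop is illustrative and not taken from the PR (it requires Clang, since these are Clang builtins):
```
// Stream data with nontemporal loads and stores; with this change they lower
// to cir.load/cir.store carrying the new nontemporal flag, and eventually to
// LLVM loads/stores with !nontemporal metadata.
void copy_stream(float *dst, float *src, int n) {
  for (int i = 0; i < n; ++i) {
    float v = __builtin_nontemporal_load(&src[i]);
    __builtin_nontemporal_store(v, &dst[i]);
  }
}
```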
- Both 'func' and 'scf' have structured control flow constraints, e.g., 'func.return' must have a 'func.func' parent op. The alternative approach to lower CIR into MLIR standard dialects is to lower CIR into the 'cf' dialect and then perform structurization on all or selected regions.
- Add a 'core-flat' option to enable CFG flattening when lowering CIR into MLIR standard dialects.
- Enhance the 'cir-flatten-cfg' pass to unify returns into branches to a dedicated return block.
- Fix 'cir.br' and 'cir.return' lowering to MLIR and allow function declarations.
64971e2 to 8ec33f0 (compare)
By specifying |
Some confusion might be going on: there should not be a new flag coordinating this, you have to use both |
That combination could generate CIR output as
Once |
I don't think we need the extra flag; we have the necessary combinations already. If you want to test CIR-to-MLIR conversion, add the extra RUN lines as I previously explained.
For testing, that extra RUN line is good enough. But we want clang option(s) to generate MLIR directly from C/C++ source. Any suggestions?
This option is |
Could we remove 'cir-flat' and add back the original |
I'm surprised the behavior changed, I was expecting |
Hello, I am not 100% sure I understand this conversation correctly, so correct me if I'm wrong. The goal is to perform the flattening pass before applying the passes that lower CIR to Core MLIR? I believe there is some misunderstanding on how the flags worked/work now:
No, this combination would stop after emitting (flattened) CIR; it would not perform further passes. That should be in line with the behaviour before the changes. I checked out a commit before my change and the behaviour seems to be the same for this combination of flags (replacing
I don't believe that's true. The
If that's the case, I'm happy to work on fixing it. @darkbuck In what way did you use the |
This is my takeaway from this discussion so far
Thanks for clarifying your change didn't change it! |
Extracted
That's the code I refer to. |
I see, but the option generated by the non-alias code was still being handled as an action. So my understanding is that the alias should behave the same way as the old option - it is still being handled as an action to emit flattened CIR, only with different scaffolding around it. Could you please explain/show for which combination of flags the behaviour is different compared to the old version?
Looking more into the PR, I would like to check if I understand the goals of the newly added code correctly. You are adding a new path to the lowering process that works on flattened CIR and goes through MLIR Core dialects, is that right? Another option (that quickly comes to my mind) would be to remove both the
Please correct me if I'm wrong |
Previously, to generate LLVM, we had two options: one was to lower directly into the LLVM dialect (direct-CIR-to-LLVM), and the other one was to go through MLIR first, followed by MLIR to LLVM (CIR-to-MLIR-to-LLVM). Flattening is just an option to prepare the CIR before lowering it. That CIR would be fed into either approach (direct-CIR-to-LLVM or CIR-to-MLIR-to-LLVM). In that sense, |
My previous point is that the through-core path is still experimental and we don't need two paths for it; it should always do the flattening this PR implements, therefore no new flags are needed.
Do you mean we turn on flattening by default? |
I see. Yes, in that case it should be enough to just modify the expression assigned to
That was not the old behaviour of the flag. The behaviour did not change. The flag was for emitting (~printing) of CIR after flattening. If you look into the descriptions/help texts of |
Yes, for the lowering path from CIR to Core, of course it should only run the passes that make sense for Core (and not the other ones that run for Direct, but I already gave this feedback as part of the review). Note that I'm assuming that what you are doing here is useful, and none of the other folks interested in core lowering seem to have an issue with this; that's why I'm saying we don't need two paths for core.