Skip to content

Conversation

@yhabteab
Copy link
Member

@yhabteab yhabteab commented Sep 26, 2025

This PR fixes a DSL config parsing ambiguity discovered by @julianbrost by using this expression () => { {{ {{{foo}}} }} }(). First, I thought that only the last two rules were conflicting each other due to the same assigned dynamic precedence. However, after adding this expression as a test case, I found that the entire lterm_items_inner rule was conflicting due to the inconsistent precedence assignments in its alternate rules. I've now changed the precedence assignments so that it always prefers the rule that reduces on/shifts rterm_no_side_effect first, so that when you have pure expressions like the trigger of this bug, it reports an error about unused value instead of a parsing error. The precedence only needs to be consistent with the rules on the same level, i.e. a rule that can directly be reduced to lterm_items_inner without a recursion doesn't need to have a smaller or higher precedence than a rule that has left recursion, because they will never conflict.

Expand Me
Ambiguity detected.
Option 1,
  lterm_items_inner -> <Rule 6, tokens 1 .. 10>
    lterm -> <Rule 70, tokens 1 .. 10>
      rterm_side_effect -> <Rule 88, tokens 1 .. 10>
        rterm -> <Rule 146, tokens 1 .. 8>
          rterm_no_side_effect -> <Rule 143, tokens 1 .. 8>
            rterm_no_side_effect_no_dict -> <Rule 114, tokens 1 .. 8>
              '(' <tokens 1 .. 1>
              identifier_items <empty>
              ')' <tokens 2 .. 2>
              use_specifier <empty>
              "=> (T_FOLLOWS)" <tokens 3 .. 3>
              @15 -> <Rule 113, empty>
              rterm_scope -> <Rule 84, tokens 4 .. 8>
                '{' <tokens 4 .. 4>
                @13 -> <Rule 83, empty>
                statements -> <Rule 2, tokens 5 .. 7>
                  optional_newlines -> <Rule 169, empty>
                  lterm_items -> <Rule 4, tokens 5 .. 7>
                    lterm_items_inner -> <Rule 7, tokens 5 .. 7>
                      rterm_no_side_effect -> <Rule 143, tokens 5 .. 7>
                        rterm_no_side_effect_no_dict -> <Rule 142, tokens 5 .. 7>
                          "{{ (T_NULLARY_LAMBDA_BEGIN)" <tokens 5 .. 5>
                          @18 -> <Rule 141, empty>
                          statements -> <Rule 2, tokens 6 .. 6>
                            optional_newlines -> <Rule 169, empty>
                            lterm_items -> <Rule 4, tokens 6 .. 6>
                              lterm_items_inner -> <Rule 7, tokens 6 .. 6>
                                rterm_no_side_effect -> <Rule 143, tokens 6 .. 6>
                                  rterm_no_side_effect_no_dict -> <Rule 92, tokens 6 .. 6>
                                    T_STRING <tokens 6 .. 6>
                          "}} (T_NULLARY_LAMBDA_END)" <tokens 7 .. 7>
                '}' <tokens 8 .. 8>
        '(' <tokens 9 .. 9>
        rterm_items -> <Rule 71, empty>
        ')' <tokens 10 .. 10>

Option 2,
  lterm_items_inner -> <Rule 7, tokens 1 .. 10>
    rterm_no_side_effect -> <Rule 143, tokens 1 .. 10>
      rterm_no_side_effect_no_dict -> <Rule 115, tokens 1 .. 10>
        '(' <tokens 1 .. 1>
        identifier_items <empty>
        ')' <tokens 2 .. 2>
        use_specifier <empty>
        "=> (T_FOLLOWS)" <tokens 3 .. 3>
        rterm -> <Rule 145, tokens 4 .. 10>
          rterm_side_effect -> <Rule 88, tokens 4 .. 10>
            rterm -> <Rule 146, tokens 4 .. 8>
              rterm_no_side_effect -> <Rule 144, tokens 4 .. 8>
                rterm_dict -> <Rule 80, tokens 4 .. 8>
                  '{' <tokens 4 .. 4>
                  @11 -> <Rule 79, empty>
                  statements -> <Rule 2, tokens 5 .. 7>
                    optional_newlines -> <Rule 169, empty>
                    lterm_items -> <Rule 4, tokens 5 .. 7>
                      lterm_items_inner -> <Rule 7, tokens 5 .. 7>
                        rterm_no_side_effect -> <Rule 143, tokens 5 .. 7>
                          rterm_no_side_effect_no_dict -> <Rule 142, tokens 5 .. 7>
                            "{{ (T_NULLARY_LAMBDA_BEGIN)" <tokens 5 .. 5>
                            @18 -> <Rule 141, empty>
                            statements -> <Rule 2, tokens 6 .. 6>
                              optional_newlines -> <Rule 169, empty>
                              lterm_items -> <Rule 4, tokens 6 .. 6>
                                lterm_items_inner -> <Rule 7, tokens 6 .. 6>
                                  rterm_no_side_effect -> <Rule 143, tokens 6 .. 6>
                                    rterm_no_side_effect_no_dict -> <Rule 92, tokens 6 .. 6>
                                      T_STRING <tokens 6 .. 6>
                            "}} (T_NULLARY_LAMBDA_END)" <tokens 7 .. 7>
                  '}' <tokens 8 .. 8>
            '(' <tokens 9 .. 9>
            rterm_items -> <Rule 71, empty>
            ')' <tokens 10 .. 10>

[2025-09-26 12:05:53 +0200] critical/config: Error: syntax is ambiguous
Location: in icinga2.conf: 1:0-1:26
icinga2.conf(1): () => { {{ {{{foo}}} }} }()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
icinga2.conf(2):
[2025-09-26 12:05:53 +0200] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.

@yhabteab yhabteab added bug Something isn't working area/configuration DSL, parser, compiler, error handling labels Sep 26, 2025
@cla-bot cla-bot bot added the cla/signed label Sep 26, 2025
@yhabteab yhabteab force-pushed the fix-ambiguous-grammar-rules branch from dfb9173 to 03cf3b1 Compare September 26, 2025 10:24
@jschmidt-icinga
Copy link
Contributor

jschmidt-icinga commented Sep 29, 2025

This is another topic I don't fully understand (yet), so bear with me, I may be entirely wrong:

I've experimented a bit and found that () => { {{ {{{foo}}} }} }()() (note the second set of parenthesis at the end) still triggers the same kind of ambiguity. Same for () => { "foo" }().

Which leads to my primary confusion: What is that expression expected to do in the first place? (() => { {{ {{{foo}}} }} })() would be obvious and works, with or without this patch, but would we expect () => { "foo" }() to be a function call same as (() => { "foo" })(), which also works correctly? I don't think so.

If not, maybe we should rather detect an error like "Attempted to call non-function object." or something like that. If we clearly parsed out that kind of erroneous semantics, it would apply to all permutations where it causes ambiguity otherwise. No idea how that would look like exactly, probably extending the rterm_side_effect: rterm '(' rterm_items ')' rule or something like that.

@yhabteab
Copy link
Member Author

I've experimented a bit and found that () => { {{ {{{foo}}} }} }()() (note the second set of parenthesis at the end) still triggers the same kind of ambiguity. Same for () => { "foo" }().

Are you sure? I can't reproduce this:

$ cat icinga2.conf
() => { {{ {{{foo}}} }} }()()
$ prefix/sbin/icinga2 daemon -c icinga2.conf
[2025-09-29 17:08:49 +0200] information/cli: Icinga application loader (version: v2.15.0-112-g03cf3b16f; debug)
[2025-09-29 17:08:49 +0200] critical/config: Error: Value computed is not used.
Location: in icinga2.conf: 1:8-1:22
icinga2.conf(1): () => { {{ {{{foo}}} }} }()()
                        ^^^^^^^^^^^^^^^
[2025-09-29 17:08:49 +0200] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.
$ cat icinga2.conf
() => { "foo" }()
$ prefix/sbin/icinga2 daemon -c icinga2.conf
[2025-09-29 17:06:57 +0200] information/cli: Icinga application loader (version: v2.15.0-112-g03cf3b16f; debug)
[2025-09-29 17:06:57 +0200] critical/config: Error: Value computed is not used.
Location: in icinga2.conf: 1:8-1:12
icinga2.conf(1): () => { "foo" }()
                        ^^^^^
[2025-09-29 17:06:57 +0200] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.

Which leads to my primary confusion: What is that expression expected to do in the first place? (() => { {{ {{{foo}}} }} })() would be obvious and works, with or without this patch, but would we expect () => { "foo" }() to be a function call same as (() => { "foo" })(), which also works correctly? I don't think so.

Why don't you think so? That's exactly what the expression should do. First the parser defines an anonymous lambda which by definition is a callable object, then constructs a function call expression with the lambda as callee. The ambiguity without this PR arises because the parser has too many options that all lead to the rterm_no_side_effect rule due to the inconsistent dynamic precedence assignments.

If not, maybe we should rather detect an error like "Attempted to call non-function object." or something like that. If we clearly parsed out that kind of erroneous semantics, it would apply to all permutations where it causes ambiguity otherwise.

The parser isn't responsible to detect such semantic errors. Its job is just to parse the input and produce some output
that other parts of the system can work with. If the input is syntactically valid which in this case it is, the parser should
not encounter parsing ambiguities. If later stages of the system find out that the callee from the call expression is not
a callable object, they can/will complain about it as such. Also, this PR does not change any semantics of the language, it just fixes a parsing ambiguity that existed before, and what you're suggesting is an actual language change. The expression used in my PR description is valid, if you for example assign it to a variable like this:

var x = () => { {{ {{{foo}}} }} }()
log(x())

This won't trigger any parsing ambiguities and will actually work as expected and print foo even without this PR.

@julianbrost
Copy link
Contributor

If not, maybe we should rather detect an error like "Attempted to call non-function object." or something like that. If we clearly parsed out that kind of erroneous semantics, it would apply to all permutations where it causes ambiguity otherwise.

The parser isn't responsible to detect such semantic errors.

From the wording, that error sounds more like a runtime error than a parse error to me.

and what you're suggesting is an actual language change.

Is it? Isn't that just the question which of the two ambiguous options is chosen? The minimal example is actually this: () => {} (). There are currently two ways of adding extra parenthesis to make the parser accept it (without changes from this PR). (() => {}) () and () => ({} ()). The latter in fact already returns a lambda that fails with "Argument is not a callable object." if you call it.

@jschmidt-icinga
Copy link
Contributor

Are you sure? I can't reproduce this:

Sorry, I can't reproduce it anymore either today. I double checked yesterday, back and forth to make sure I'm not imagining it, but seems I did... 🤦

From the wording, that error sounds more like a runtime error than a parse error to me.

The "Argument is not a callable object." might be a runtime error, but first you'd need to parse it correctly so we can detect it.

Why don't you think so? That's exactly what the expression should do.

But then why doesn't it? I think either the parser should correctly hand the Function object to the FunctionCallExpression object, or it should fail with an error at parse time because we don't allow this syntax.

@yhabteab
Copy link
Member Author

yhabteab commented Sep 30, 2025

From the wording, that error sounds more like a runtime error than a parse error to me.

Okay, such error handling is already implemented at evaluation time, so I don't know what else you want me to do then.

and what you're suggesting is an actual language change.

Is it? Isn't that just the question which of the two ambiguous options is chosen?

You are questioning the entire behavior of such an otherwise perfectly fine language construct. So yes, it is a language change if you would want to forbid that. As I already explained, we're using a GLR parser, normally any reduce/reduce conflict is resolved by starting sub-parsers and letting them run multiple parse trees in parallel until one of them produces a valid parse tree and the others silently fail with a syntax error at some point. In this case, however, both parse trees are valid until the very end, thus both parsers succeed and produce an identical function call expression. The first parse tree interpreted the expression as a function call but it would mark it as with side effects through the lterm_items_inner sep lterm %dprec 1 rule after reduction, while the second parse tree resulted into an actual function call as well but this on the other hand, would mark it as without side effects through the lterm_items_inner sep rterm_no_side_effect %dprec 1 rule after reduction.

So, the parser can't decide which of the two parse trees to pick, because both are valid interpretations of the expression and both of them have the same %dprec assignments. The only way1 to resolve such ambiguity is to correctly assign a %dprec to both rules, which is what I did. Since, expressions produced by the lterm_items_inner rule are actually statements (they solely exist for their side effects), I've changed the precedence so that the parser always picks the second parse tree, that produces a function call expression without side effects, which will then result in a Value computed is not used error. If you don't want such an error, then I can reverse the precedence (so that the parser always picks the first parse tree), but then you will have a lambda expression that is never going to be used for anything, but you wouldn't get any ambiguous parse trees anymore.

(() => {}) () and () => ({} ()). The latter in fact already returns a lambda that fails with "Argument is not a callable object." if you call it.

The former is a lambda expression that is immediately called, the latter is a lambda expression with a malformed body, two very different things. However, with the first expression you seem to somehow forced the parser to pick the first parse tree described above, thus eliminating the ambiguity. In that case, it produced the function call exp with alleged side effects through the lterm_items_inner: lterm %dprec 2 rule, while in fact it's just a function call expression without any side effects.

Footnotes

  1. of course, there are other ways as well, like by providing a merge function, but merging such expression would be pointless because both are identical.

@Al2Klimov
Copy link
Member

The only way1 to resolve such ambiguity is to correctly assign a %dprec to both rules

Wouldn't it be clever AND smart to give each variant a distinct %dprec then, as in dfb9173?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/configuration DSL, parser, compiler, error handling bug Something isn't working cla/signed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants