
[Feature]: More constant simplification? #381

Closed · Moelf opened this issue Dec 8, 2024 · 16 comments

Moelf commented Dec 8, 2024

Feature Request

atlas_template = TemplateStructure{(:b, :e)}(
     ((; b, e), (x,)) -> b(x)^e(log(x))
)

sr_model2 = SRRegressor(
    niterations=150,
    binary_operators=[+, -, *, ^],
    unary_operators=[log],
    should_optimize_constants=true,
    seed=nothing,
    nested_constraints = [log => [log => 0, (^) => 0]],
    expression_type=TemplateExpression,
    expression_options=(; structure=atlas_template),
    loss_function = chi2_sr
)

sr_mach2 = machine(sr_model2, X, ys)

fit!(sr_mach2)
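
(chi2_sr is a custom chi-squared loss whose definition isn't included here; purely as a hypothetical sketch of its shape, assuming errs holds the per-point uncertainties and following the (tree, dataset, options) custom-loss signature:)

using SymbolicRegression: Dataset, eval_tree_array

# Hypothetical sketch only; the real chi2_sr is not shown in this issue.
# Assumes `errs` is a vector of per-point uncertainties, same length as dataset.y.
function chi2_sr(tree, dataset::Dataset{T,L}, options)::L where {T,L}
    prediction, completed = eval_tree_array(tree, dataset.X, options)
    !completed && return L(Inf)
    return sum(((prediction .- dataset.y) ./ errs) .^ 2) / dataset.n
end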

[Image: hall-of-fame output from the search, showing an expression containing log(162.43)]

log(162.43) shouldn't use log at all (this increases the complexity of the expression unnecessarily).

@MilesCranmer (Owner)

You likely just need to set should_simplify=true. By default this is set to false if you are using nested_constraints, since simplification might result in an expression that violates the constraints. I think it's fine here, though.
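
For example, adding it explicitly to the constructor above (just a sketch of the change; everything else stays the same):

sr_model2 = SRRegressor(
    # ... same keyword arguments as above ...
    should_simplify=true,  # override the default of false triggered by nested_constraints
)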

@MilesCranmer (Owner)

But in general there will be some points during the search when an expression has redundancies like this. The genetic algorithm should eventually simplify it, since the simplified form has a lower complexity.

@MilesCranmer (Owner)

Also - we don't want to necessarily force simplified expressions. This is because mutations are single-step, so to get to log(x + const), the genetic algorithm might prefer to go from const to log(const) to log(x + const), rather than x to log(x) to log(x + const) which might hit a NaN. If we force simplification at every step, the log(const) expression wouldn't be possible.

@Moelf (Author) commented Dec 8, 2024

It is already set to true

@MilesCranmer (Owner)

Oh in the code you sent, it isn't set at all:

julia> print(sr_model2.should_simplify)
nothing

It will be set to false during the search, based on the other parameters you have set.

Maybe you meant should_simplify instead of should_optimize_constants?

@MilesCranmer (Owner)

(Is this ok to close? Or any issues with it?)

Moelf closed this as completed Dec 9, 2024
@Moelf (Author) commented Feb 14, 2025

    atlas_template = TemplateStructure{(:B, :E)}(
        ((; B, E), (x,)) -> B(x)^E(log(x))
    )

    sr_model = SRRegressor(;
        niterations=300,
        binary_operators=[+, *],
        should_simplify=true,
        seed=20241206,
        expression_type=TemplateExpression,
        expression_options=(; structure=atlas_template),
        loss_function_expression=chi2_sr_functor(errs),
        deterministic=true,
        parallelism=:serial,
        kw...
    )
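
(chi2_sr_functor isn't shown here either; roughly, it closes over the uncertainties errs and returns a loss in the (expression, dataset, options) form that loss_function_expression expects. A hypothetical sketch, assuming eval_tree_array also accepts the expression object:)

# Hypothetical sketch only; the real chi2_sr_functor is not part of this issue.
function chi2_sr_functor(errs)
    return function (ex, dataset, options)
        prediction, completed = eval_tree_array(ex, dataset.X, options)
        !completed && return eltype(dataset.y)(Inf)
        return sum(((prediction .- dataset.y) ./ errs) .^ 2) / dataset.n
    end
end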

[Image: hall-of-fame table from the search output]

I'm seeing this resurface (see Complexity 14, which should be further simplified into ax + b).

Moelf reopened this Feb 14, 2025
@MilesCranmer (Owner)

The simplified version of that expression is already stored in complexity 10, right?

@Moelf (Author) commented Feb 14, 2025

yeah, and for that matter, Complexity 8 is a simplification of Complexity 10.

But I do expect a second-order polynomial to show up at some point. In this case, it shows up further down the list:

[Image: hall-of-fame output further down the list, where the second-order polynomial appears]

So is this the intended behavior? That fundamentally the same equation will appear multiple times in the hall of fame, each one simplification step apart and thus at a different complexity?

@MilesCranmer (Owner) commented Feb 14, 2025

SymbolicRegression.jl fundamentally doesn't "understand" what operators you pass it. To the library, those are just integers that it swaps between and then calls evaluation on. In some ways this is annoying, because it won't automatically simplify, as you point out (though it does do some very basic simplifications, though these are turned off for TemplateExpression), but in other ways this is nice, because it means you can input any function you want and it will be used during the evolution. Basically, SR.jl imposes no prior on operators, and no operators get special treatment. The underlying algorithm is operator-agnostic.

That being said, it will still try to minimize complexity and loss, so it will naturally trend towards simpler expressions that evaluate to the same thing. Note that this simplification happens via evolution, not via handwritten simplification rules.

In your hall of fame, I can see that the complexity 18 expression performs slightly worse than the complexity 20 one, so this satisfies the rules of the dominating Pareto front: an expression is only shown if it performs better than all simpler expressions. And that's what you see.

Now, sometimes it can't seem to get the same performance for an expression that should simplify to the same thing. I think this is sometimes (?) due to numerical precision issues, like how floating-point arithmetic is not associative:

julia> (0.1 + 0.2) - 0.3
5.551115123125783e-17

julia> 0.1 + (0.2 - 0.3)
2.7755575615628914e-17

However, it could also be because of Optim.jl exiting early, and maybe that's something we could actually try to fix? It would be good to know if there's a way to prevent this sort of thing; I would also love for this type of issue to not occur.
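
In the meantime, one thing that might help is giving the constant optimizer a larger budget. This is only a sketch, and assumes the optimizer_iterations and optimizer_nrestarts keywords are passed through SRRegressor in the usual way:

sr_model = SRRegressor(;
    # ... other settings as before ...
    optimizer_iterations=25,  # more iterations per constant-optimization call
    optimizer_nrestarts=5,    # more random restarts of the constant optimizer
)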

@Moelf (Author) commented Feb 14, 2025

Thanks for the very detailed answer. I think that covers both why it happens and how we should factor it into consideration for our application. We're basically doing something similar to your paper https://arxiv.org/abs/2411.09851 (specifically, the di-jet function), but I didn't know about that paper until today!

though it does do some very basic simplifications, though these are turned off for TemplateExpression

Does this mean should_simplify=true has no effect? That would certainly explain what we're seeing here. As an aside, TemplateExpression appears to have only arrived after that paper; maybe this is a follow-up study I can do within my group.

Moelf closed this as completed Feb 14, 2025
@MilesCranmer (Owner) commented Feb 14, 2025

Cool! Looking forward to reading it :) And yeah, the different expression types were released after Ho Fung's project; my hope is that expression types like TemplateExpression will make similar workflows easier, since this puts the parametric optimization and the functional form directly into the search itself.

@Moelf (Author) commented Feb 14, 2025

I think conceptually it should be possible to apply should_simplify inside each component of the TemplateExpression; maybe I should open an issue to track that?

@MilesCranmer (Owner)

Sure, sounds good.

Also, just for posterity, you would write the above code with the new syntax as:

atlas_template = @template_spec(expressions=(B, E)) do x
    B(x)^E(log(x))
end

sr_model = SRRegressor(;
    expression_spec=atlas_template,
)

Or, with parameters, like

atlas_template = @template_spec(expressions=(B, E), parameters=(p=1,)) do x
    B(x)^E(log(x)) + p[1]
end

@Moelf (Author) commented Feb 14, 2025

Ah right, I guess a breaking change may be coming.

I did just now adopt the non-macro approach from the documentation:

atlas_template = TemplateStructure{(:B, :E)}(
        ((; B, E), (x,)) -> B(x)^E(log(x))
    )

...
expression_spec = TemplateExpressionSpec(; structure = atlas_template)
...

@MilesCranmer (Owner) commented Feb 14, 2025

Oh, the old syntax will still work (and will continue to work for the foreseeable future). If you @macroexpand the macro, it shows the same thing; the macro is just a bit less typing.

Same for passing TemplateStructure and TemplateExpression to the options or SRRegressor. (I don't like to make breaking changes unless completely unavoidable)
