[Feature]: More constant simplification? #381
Comments
You likely just need to set
But in general there will be points where the expression has redundancies like this. The genetic algorithm should eventually simplify it, since lower complexity is favored.
Also, we don't necessarily want to force simplified expressions. This is because mutations are single-step, so to get to
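As an illustration of why equivalent expressions can coexist at different complexities, here is a minimal sketch (not SR.jl's actual internals) where complexity is just a node count over a nested-tuple expression tree, so two algebraically equal expressions score differently:

```julia
# Hypothetical sketch: complexity as a plain node count on (op, args...) tuples.
complexity(::Symbol) = 1                              # a variable is one node
complexity(::Number) = 1                              # a constant is one node
complexity(e::Tuple) = 1 + sum(complexity, e[2:end])  # operator plus its arguments

e1 = (:+, (:*, :x, :x), (:*, :x, :x))  # x*x + x*x
e2 = (:*, 2, (:*, :x, :x))             # 2*(x*x), algebraically the same

complexity(e1)  # 7
complexity(e2)  # 5
```

Single-step mutations cannot jump from `e1` to `e2` directly, so the redundant form survives until evolution finds the cheaper one.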
It is already set to true.
Oh, in the code you sent, it isn't set at all:

julia> print(sr_model2.should_simplify)
nothing

It will be set to false during the search based on the other parameters set. Maybe you meant
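A minimal sketch of how a `nothing` default can be resolved at search time. This is a hypothetical helper, not the library's actual code path; it only illustrates the "set to false during the search based on the other parameters" behavior (the thread notes that basic simplifications are turned off for TemplateExpression):

```julia
# Hypothetical sketch: `nothing` means "decide later from other parameters".
# An explicit true/false always wins; otherwise default based on whether
# a template expression is in use (templates disable auto-simplification).
resolve_should_simplify(should_simplify, uses_template::Bool) =
    something(should_simplify, !uses_template)

resolve_should_simplify(nothing, true)  # false: template search defaults off
resolve_should_simplify(true, true)     # true: the explicit setting wins
```

`Base.something` returns its first non-`nothing` argument, which is exactly the "explicit setting beats computed default" pattern.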
(Is this ok to close? Or any issues with it?)
atlas_template = TemplateStructure{(:B, :E)}(
((; B, E), (x,)) -> B(x)^E(log(x))
)
sr_model = SRRegressor(;
niterations=300,
binary_operators=[+, *],
should_simplify=true,
seed=20241206,
expression_type=TemplateExpression,
expression_options=(; structure=atlas_template),
loss_function_expression=chi2_sr_functor(errs),
deterministic=true,
parallelism=:serial,
kw...
)
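The template above can be read as a plain closure over the two sub-expressions. As a sanity check of the functional form B(x)^E(log(x)), with ordinary Julia functions standing in for the evolved sub-expressions (the stand-ins here are arbitrary, for illustration only):

```julia
# Sanity check of the template's functional form using plain functions
# as stand-ins for the evolved sub-expressions B and E.
atlas_form = (B, E, x) -> B(x)^E(log(x))

B_stub = x -> 2.0 * x  # arbitrary stand-in for the evolved B(x)
E_stub = y -> 1.0      # arbitrary stand-in for the evolved E(log(x))

atlas_form(B_stub, E_stub, 3.0)  # (2*3)^1 == 6.0
```

This is the same shape the `TemplateStructure` callable enforces during the search: the library evolves `B` and `E`, while the outer form stays fixed.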
I'm seeing this re-surface (see Complexity 14, which should be further simplified into a
The simplified version of that expression is already stored at complexity 10, right?
Yeah, and for that matter, Complexity 8 is a simplification of Complexity 10. But I do expect a 2nd-order polynomial to show up at some point. In this case, it shows up further down the list. So is this the intended behavior? That fundamentally the same equation will appear multiple times in the HoF, each one simplification step apart and thus at a different complexity?
SymbolicRegression.jl fundamentally doesn't "understand" what operators you pass it. To the library, those are just integers that it is swapping between and then calling evaluation on. In some ways this is annoying, because it won't automatically simplify, as you point out (it does do some very basic simplifications, but these are turned off for TemplateExpression). In other ways this is nice, because it means you can input any function you want and it will be used during the evolution. Basically, SR.jl imposes no prior on operators, and no operators get special treatment. The underlying algorithm is operator agnostic.

That being said, it will still try to minimize complexity and loss, so it will naturally trend towards simpler expressions that evaluate to the same thing. Note that simplification happens via evolution, not via handwritten simplification rules.

In your hall of fame, I can see that the complexity 18 expression performs slightly worse than the complexity 20 one, so this satisfies the rules of the dominating Pareto front: an expression is only shown if it performs better than all simpler expressions. And that's what you see.

Now, sometimes it can't seem to reach the same performance for an expression that should simplify to the same thing. I think this is sometimes (?) due to numerical precision issues. Like how the following is true:

julia> (0.1 + 0.2) - 0.3
5.551115123125783e-17

julia> 0.1 + (0.2 - 0.3)
2.7755575615628914e-17

However, it could also be because of Optim.jl exiting early, and maybe it's something we could actually try to fix? Would be good to know if there's a way to prevent this sorta thing. I would also love for this type of issue to not occur.
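The hall-of-fame rule described above ("an expression is only shown if it performs better than all simpler expressions") can be sketched as a simple filter over (complexity, loss) pairs. This is a simplification of the actual bookkeeping, for illustration:

```julia
# Sketch of the dominating Pareto front: scanning in order of increasing
# complexity, keep an entry only if its loss beats every simpler entry.
function pareto_front(entries::Vector{Tuple{Int,Float64}})
    sorted = sort(entries; by = first)
    front = Tuple{Int,Float64}[]
    best = Inf
    for (c, loss) in sorted
        if loss < best       # strictly better than all simpler expressions
            push!(front, (c, loss))
            best = loss
        end
    end
    return front
end

# The complexity-14 entry ties the simpler complexity-10 one, so it is dropped.
pareto_front([(8, 0.5), (10, 0.3), (14, 0.3), (18, 0.12), (20, 0.1)])
# [(8, 0.5), (10, 0.3), (18, 0.12), (20, 0.1)]
```

This matches the behavior discussed earlier in the thread: an un-simplified duplicate only survives in the HoF if it still improves the loss over every simpler entry.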
Thanks for the very detailed answer, I think that covers both why it happens and how we should factor it into consideration for our application. We're basically doing something similar to your paper https://arxiv.org/abs/2411.09851 (specifically, the di-jet function), but I didn't know about that paper until today!
Does this mean
Cool! Looking forward to reading it :) And yeah, the different expression types were released after Ho Fung's project; my hope is that expression types like TemplateExpression will make similar workflows easier, since this puts the parametric optimization and functional form directly into the search itself.
I think conceptually it should be possible to do
Sure, sounds good. Also, just for posterity, you would write the above code with the new syntax as:

atlas_template = @template_spec(expressions=(B, E)) do x
    B(x)^E(log(x))
end

sr_model = SRRegressor(;
    expression_spec=atlas_template,
)

Or, with parameters, like:

atlas_template = @template_spec(expressions=(B, E), parameters=(p=1,)) do x
    B(x)^E(log(x)) + p[1]
end
Ah right, I guess a breaking change may be coming. I did just now adopt the non-macro approach from the documentation:

atlas_template = TemplateStructure{(:B, :E)}(
    ((; B, E), (x,)) -> B(x)^E(log(x))
)
...
expression_spec = TemplateExpressionSpec(; structure = atlas_template)
...
Oh, the old syntax will still work (and will continue to work for the foreseeable future). Same for passing TemplateStructure and TemplateExpression to the options or SRRegressor. (I don't like to make breaking changes unless completely unavoidable.)
Feature Request

log(162.43) shouldn't use log at all (this increases the complexity of the expression unnecessarily).
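The request amounts to constant folding: any subtree containing no variables evaluates to a single constant and could be collapsed before counting complexity. A hedged sketch on nested tuples (this is not SR.jl's internal representation, just an illustration of the idea):

```julia
# Hypothetical constant-folding sketch on (op, args...) tuples:
# a subtree with no variables collapses to the constant it evaluates to.
fold(x::Number) = x
fold(x::Symbol) = x                 # a variable cannot be folded
function fold(e::Tuple)             # e = (op, args...), op a Julia function
    args = map(fold, e[2:end])
    if all(a -> a isa Number, args)
        return e[1](args...)        # e.g. (log, 162.43) -> 5.0902...
    end
    return (e[1], args...)
end

fold((+, :x, (log, 162.43)))  # the log subtree folds into one constant
```

Here `(+, :x, (log, 162.43))` becomes `(+, :x, 5.0902...)`, so the folded expression no longer pays the complexity cost of the `log` node.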