```diff
-The Equation Tree package is an equation toolbox with symbolic regression in mind. It represents
-expressions as incomplete binary trees and has various features tailored towards testing symbolic
-regression algorithms or training models. The most notable features are:
+The Equation Tree package is an equation toolbox with symbolic regression in mind. It represents each expression as an incomplete binary [equation tree](user-guide/equation-formats.md) and has various features tailored towards testing symbolic regression algorithms or training models. The most notable features are:
 - Calculating [Distance Metrics](user-guide/distance-metrics.md) between equations
 
-
-## Equation Sampling
-
-
-
-In our sampling method, the equation structure and the equation content are sampled in two steps:
-- (1) First, we sample the *structure* of the equation
-- (2) Second, we sample the *content* of the equation
-
-The sampling can be customized to obtain a desired equation distribution, for example, to mimic the distribution of equations in a specific scientific field. This customization is implemented in the form of priors for operators, functions, features, and structures. We can also use priors conditioned on the parent node.
-
-## Feature Extraction
-
-Given an equation, our package can extract features such as the number of constants and variables, as well as various measures of equation complexity (for example, the number of nodes and the tree depth).
-
-For a list of equations, our package can easily compute the frequencies of operators, functions, features, and structures. These frequencies can, in turn, be used to sample new equations that mimic the original list in these aspects.
-
-## Distance Metrics
-
-For benchmarking or training, the Equation Tree package provides several distance metrics between equations:
-
-- **Prediction distance.** Prediction distance between function values, as proposed by La Cava et al. (2021):
-- **Symbolic solution.** Another metric proposed by La Cava et al. (2021), called symbolic solution, is designed to capture SR models that differ from the true model by a constant or scalar. In our application, we define the symbolic constant difference as:
-- **Normalized edit distance.** In addition to the metrics above, Matsubara et al. (2022) propose a normalized edit distance between trees. For a pair of trees, the edit distance is the minimum cost of transforming one into the other with a sequence of operations, each of which either 1) inserts, 2) deletes, or 3) renames a node.
+It also encompasses a variety of [additional features](user-guide/additional-features.md). For example, it can extract information from an existing list of equations, which can, in turn, be used in our sampling method.
 
 ## Relevant Publication
 
-For reference and informations about the evaluation of our package, read our Neuroips 2023 paper:
-
-Marinescu*, I., Strittmatter*, Y, Williams, C, Musslick, S. "Expression Sampler as a Dynamic Benchmark for Symbolic Regression." In *NeurIPS 2023 AI for Science Workshop*. (2023), [Read the publication](https://openreview.net/forum?id=i3PecpoiPG). [*equal contribution]
-
+For reference and information about the evaluation of our package, read our NeurIPS 2023 [paper](https://openreview.net/forum?id=i3PecpoiPG):
 
+Marinescu\*, I., Strittmatter\*, Y., Williams, C., Musslick, S. "Expression Sampler as a Dynamic Benchmark for Symbolic Regression." In *NeurIPS 2023 AI for Science Workshop*. (2023). [*equal contribution]
 
 ## About
 
```
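The Equation Sampling and Feature Extraction sections removed in this hunk describe extracting features and frequencies from equations and reusing those frequencies as priors for two-step sampling (structure first, then content). The following Python sketch strings these steps together under simplifying assumptions and does not use the Equation Tree API. The nested-tuple encoding, the `x_`/`c_` naming convention, the operator and function sets, the fixed leaf probability `p_leaf`, and all function names are hypothetical, and priors conditioned on the parent node are not modeled.

```python
import random
from collections import Counter

# Hypothetical encoding (not the Equation Tree format): an inner node is a tuple
# (label, child, ...) and a leaf is a string; variables start with "x_" and
# constants with "c_".
OPERATORS = {"+", "-", "*", "/"}          # assumed binary operators
FUNCTIONS = {"sin", "cos", "exp", "log"}  # assumed unary functions


def labels(tree):
    """Yield all node labels in pre-order."""
    if isinstance(tree, str):
        yield tree
        return
    yield tree[0]
    for child in tree[1:]:
        yield from labels(child)


def depth(tree):
    """Tree depth, counting a single leaf as depth 1."""
    if isinstance(tree, str):
        return 1
    return 1 + max(depth(child) for child in tree[1:])


def features(tree):
    """Per-equation features and complexity measures."""
    names = list(labels(tree))
    return {
        "n_nodes": len(names),
        "depth": depth(tree),
        "n_constants": sum(n.startswith("c_") for n in names),  # constant leaves
        "n_variables": len({n for n in names if n.startswith("x_")}),  # distinct variables
    }


def normalize(counts):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def priors_from(trees):
    """Relative frequencies of operators, functions, and leaves across a list of equations."""
    counts = {"operator": Counter(), "function": Counter(), "leaf": Counter()}
    for tree in trees:
        for n in labels(tree):
            kind = "operator" if n in OPERATORS else "function" if n in FUNCTIONS else "leaf"
            counts[kind][n] += 1
    return {kind: normalize(c) for kind, c in counts.items()}


def sample_structure(rng, max_depth, p_leaf=0.3):
    """Step 1: a random structure of depth <= max_depth; () marks a leaf."""
    if max_depth <= 1 or rng.random() < p_leaf:
        return ()
    arity = rng.choice([1, 2])  # unary function or binary operator
    return tuple(sample_structure(rng, max_depth - 1, p_leaf) for _ in range(arity))


def draw(rng, prior):
    """Draw one label from a {label: probability} prior."""
    names = list(prior)
    return rng.choices(names, weights=[prior[n] for n in names])[0]


def sample_content(rng, structure, priors):
    """Step 2: fill each node of the structure with a label from the matching prior."""
    if not structure:
        return draw(rng, priors["leaf"])
    kind = "function" if len(structure) == 1 else "operator"
    return (draw(rng, priors[kind]),) + tuple(sample_content(rng, s, priors) for s in structure)


observed = [("+", ("*", "c_1", "x_1"), ("sin", "x_2")), ("sin", ("*", "x_1", "x_1"))]
print(features(observed[0]))  # {'n_nodes': 6, 'depth': 3, 'n_constants': 1, 'n_variables': 2}
priors = priors_from(observed)
rng = random.Random(0)  # fixed seed for reproducible samples
print(sample_content(rng, sample_structure(rng, max_depth=3), priors))
```

The uniform arity choice and the fixed `p_leaf` are placeholders for the structure priors described above; with a parent-conditioned prior, `draw` would also receive the parent's label.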
```diff
@@ -51,10 +23,6 @@ PI: <a href="https://smusslick.com/">Sebastian Musslick</a>. This research progr
 Schmidt Science Fellows, in partnership with the Rhodes Trust, as well as the Carney BRAINSTORM
 program at Brown University.
 
-## References
-
-La Cava, W. G., Orzechowski, P., Burlacu, B., de França, F. O., Virgolin, M., Jin, Y., Kommenda, M., & Moore, J. H. "Contemporary Symbolic Regression Methods and their Relative Performance." In *CoRR* (2021), Available at: [https://arxiv.org/abs/2107.14351](https://arxiv.org/abs/2107.14351)
 
-Matsubara, Y., Chiba, N., Igarashi, R., & Ushiku, Y. "Rethinking symbolic regression datasets and benchmarks for scientific discovery." In *arXiv preprint arXiv:2206.10540*. (2022), Available at: [https://arxiv.org/abs/2206.10540](https://arxiv.org/abs/2206.10540)
```
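The normalized edit distance attributed to Matsubara et al. (2022) in the first hunk can be sketched with the textbook recursion for ordered tree edit distance, using unit costs for inserting, deleting, and renaming a node and memoizing over forests. This is not the package's implementation: the tuple encoding and all function names are hypothetical, and the normalization (edit distance divided by the number of nodes in the true tree, clipped at 1) is our reading of that paper, so check the original definition before relying on it.

```python
from functools import lru_cache

# Hypothetical encoding (not the Equation Tree format): an inner node is a tuple
# (label, child, ...) and a leaf is a string.


def root(tree):
    return tree if isinstance(tree, str) else tree[0]


def children(tree):
    return () if isinstance(tree, str) else tuple(tree[1:])


def size(tree):
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in children(tree))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        return sum(size(t) for t in g)  # insert every node of g
    if not g:
        return sum(size(t) for t in f)  # delete every node of f
    v, w = f[-1], g[-1]  # rightmost tree of each forest
    return min(
        # delete the root of v; its children take its place in the forest
        forest_distance(f[:-1] + children(v), g) + 1,
        # insert the root of w
        forest_distance(f, g[:-1] + children(w)) + 1,
        # map v's root onto w's root (renaming costs 1 if the labels differ)
        forest_distance(f[:-1], g[:-1])
        + forest_distance(children(v), children(w))
        + int(root(v) != root(w)),
    )


def tree_edit_distance(a, b):
    return forest_distance((a,), (b,))


def normalized_edit_distance(predicted, true):
    """Assumed normalization: divide by the size of the true tree and clip at 1."""
    return min(1.0, tree_edit_distance(predicted, true) / size(true))


true = ("+", ("*", "c_1", "x_1"), ("sin", "x_2"))
predicted = ("+", ("*", "c_1", "x_1"), ("cos", "x_2"))
print(tree_edit_distance(predicted, true))        # 1 (rename sin to cos)
print(normalized_edit_distance(predicted, true))  # 1 edit over 6 nodes
```

The plain recursion keeps the definition easy to follow but scales poorly; for larger expressions, a dedicated algorithm such as Zhang and Shasha's would be the usual choice.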
docs/tutorials/Basic Usage.ipynb: 42 additions & 7 deletions
```diff
@@ -8,11 +8,13 @@
 },
 "source": [
 "# Basic Usage\n",
-"Here, we demonstrate core functionalities of the Equation Tree:\n",
+"\n",
+"Content:\n",
 "- Basic Functionality for sampling and processing equations\n",
 "- Advanced settings for sampling equations\n",
 "\n",
-"## Installation"
+"## Installation\n",
+"The Equation Tree package is available on [PyPI](https://pypi.org/project/equation-tree/):"
 ]
 },
 {
```
```diff
@@ -33,7 +35,7 @@
 "\n",
 "### Sampling With Default Settings\n",
 "First, we need to import the functionality.\n",
-"Here we also set a seed to ensure reproducible results."
+"Here, we also set a seed to ensure reproducible results."
 ],
 "metadata": {
 "collapsed": false
```
```diff
@@ -232,7 +234,7 @@
 "\n",
 "### Evaluating Equations\n",
 "\n",
-"After instantiating equations, we can evaluate them arbitrary input:"
+"After instantiating equations, we can evaluate them on arbitrary input:"
 ],
 "metadata": {
 "collapsed": false
```
```diff
@@ -279,7 +281,7 @@
 "\n",
 "### Input Dimensions\n",
 "\n",
-"We can manipulate the space on witch the equation is defined. For example, if we want equations that are defined on 2-dimensions, we can write:"
+"We can manipulate the space on which the equations are defined. For example, if we want equations that are defined on two dimensions, we can write:"
 ],
 "metadata": {
 "collapsed": false
```
```diff
@@ -310,11 +312,11 @@
 {
 "cell_type": "markdown",
 "source": [
-"*Note, not all the equations have exactly 2 input variable. Some of them have only one. This is since equations with only one input variable are still defined on 2 (or more dimensions)*\n",
+"*Note: not all the equations have exactly two input variables; some have only one. This is because an equation with one input variable is still defined on two (or more) dimensions.*\n",
 "\n",
 "### Equation Complexity\n",
 "\n",
-"We can also manipulate the equation complexity (as number of nodes)"
+"We can also manipulate the equation complexity (for example, as tree depth):"
 ],
 "metadata": {
 "collapsed": false
```
```diff
@@ -344,6 +346,39 @@
 "collapsed": false
 }
 },
+{
+"cell_type": "markdown",
+"source": [
+"Instead of an exact depth, we can also sample all equations up to a specified depth:"
```
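The notebook cells in this diff evaluate instantiated equations on arbitrary input and note that an equation using only one variable is still defined on a two-dimensional input space. The following library-independent sketch illustrates both points. It is not the Equation Tree API; the nested-tuple encoding, the operator and function tables, and the names `evaluate` and `evaluate_many` are assumptions.

```python
import math
import operator

# Hypothetical evaluator for a nested-tuple encoding (not the Equation Tree API):
# an inner node is (label, child, ...) and a leaf is a variable or constant name.
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
FUNCTIONS = {"sin": math.sin, "cos": math.cos, "exp": math.exp, "log": math.log}


def evaluate(tree, point, constants):
    """Evaluate `tree` at one input point such as {"x_1": 0.5, "x_2": 2.0}."""
    if isinstance(tree, str):
        # leaves are looked up among the inputs first, then among the constants
        return point[tree] if tree in point else constants[tree]
    label, *args = tree
    values = [evaluate(arg, point, constants) for arg in args]
    table = OPERATORS if label in OPERATORS else FUNCTIONS
    return table[label](*values)


def evaluate_many(tree, points, constants):
    """Evaluate `tree` on every point of a list of input points."""
    return [evaluate(tree, point, constants) for point in points]


# The equation only uses x_1, but it is evaluated on a two-dimensional input
# space: x_2 is present in every point and simply has no effect.
equation = ("*", "c_1", ("sin", "x_1"))
points = [{"x_1": 0.0, "x_2": 1.0}, {"x_1": math.pi / 2, "x_2": -3.0}]
print(evaluate_many(equation, points, {"c_1": 2.0}))
```

A practical evaluator would typically vectorize over inputs (for example, with NumPy) and decide how to handle domain errors such as the logarithm of a negative number; this sketch does neither.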