Commit 8bb4b9a

Author: Younes Strittmatter
Merge pull request #27 from AutoResearch/add-standard-operators-and_functions
chore/docs: add documentation on metrics/refactor package
2 parents 0062daf + f48703b commit 8bb4b9a

19 files changed

Lines changed: 1132 additions & 304 deletions

docs/index.md

Lines changed: 6 additions & 38 deletions
@@ -1,45 +1,17 @@
11
# Equation Tree
22

3-
The Equation Tree package is an equation toolbox with symbolic regression in mind. It represents
4-
expressions as incomplete binary trees and has various features tailored towards testing symbolic
5-
regression algorithms or training models. The most notable features are:
3+
The Equation Tree package is an equation toolbox with symbolic regression in mind. It represents expressions as an incomplete binary [equation tree](user-guide/equation-formats.md) and has various features tailored towards testing symbolic regression algorithms or training models. The most notable features are:
64

7-
- Equation sampling (including priors)
8-
- Feature Extraction from equation distributions
9-
- Distance metrics between equations
5+
- [**Equation Sampling**](user-guide/equation-sampling.md)
6+
- Calculating [Distance Metrics](user-guide/distance-metrics.md) between equations
107

11-
12-
## Equation Sampling
13-
14-
![Equation Tree](img/equation-sampler.gif)
15-
16-
In our sampling method, the equation structure and the equation content are sampled in two steps:
17-
- (1) First, we sample the *structure* of the equation
18-
- (2) Second, we sample the *content* of the equation
19-
20-
The sampling can be customized to obtain a desired equation distribution. For example, to mimic the distribution in specific scientific fields. This is customization is implemented in form of priors for operators, functions, features, and structures. We can also use conditional priors conditioned on the parent node.
21-
22-
## Feature Extraction
23-
24-
Given an equation, our package can extract features like number of constants, and variables, and various equation complexity measurements (For example, number of nodes and tree depth.)
25-
26-
For a list of equations, our package is capable to easily access frequencies for operators, functions, features, and structures. These frequencies can in turn be used to sample new equations that mimic the original list in these aspects.
27-
28-
## Distance Metrics
29-
30-
For benchmarking or training, the Equation Tree package features a list of distance metrics between equations:
31-
32-
- **Prediction distance.** Prediction distance between function values as proposed byLa Cava et al. (2021):
33-
- **Symbolic solution.** Another metric proposed by La Cava et al. (2021) is called symbolic solution, designed to capture SR models that differ from the true model by a constant or scalar. In our application, we define the symbolic constant difference as:
34-
- **Normalized edit distance.** In addition to the metrics above, Matsubara et al. (2022) propose a normalized edit distance for the trees. For a pair of two trees, edit distance computes the minimum cost to transform one to another with a sequence of operations, each of which either 1) inserts, 2) deletes, or 3) renames a node.
8+
It also includes a variety of [additional features](user-guide/additional-features.md), for example, to extract information from an existing list of equations that can, in turn, be used in our sampling method.
359

3610
## Relevant Publication
3711

38-
For reference and informations about the evaluation of our package, read our Neuroips 2023 paper:
39-
40-
Marinescu*, I., Strittmatter*, Y, Williams, C, Musslick, S. "Expression Sampler as a Dynamic Benchmark for Symbolic Regression." In *NeurIPS 2023 AI for Science Workshop*. (2023), [Read the publication](https://openreview.net/forum?id=i3PecpoiPG). [*equal contribution]
41-
12+
For reference and information about the evaluation of our package, read our NeurIPS 2023 [paper](https://openreview.net/forum?id=i3PecpoiPG):
4213

14+
Marinescu\*, I., Strittmatter\*, Y, Williams, C, Musslick, S. "Expression Sampler as a Dynamic Benchmark for Symbolic Regression." In *NeurIPS 2023 AI for Science Workshop*. (2023). [*equal contribution]
4315

4416
## About
4517

@@ -51,10 +23,6 @@ PI: <a href="https://smusslick.com/">Sebastian Musslick</a>. This research progr
5123
Schmidt Science Fellows, in partnership with the Rhodes Trust, as well as the Carney BRAINSTORM
5224
program at Brown University.
5325

54-
## References
55-
56-
La Cava, W. G., Orzechowski, P., Burlacu, B., de França, F. O., Virgolin, M., Jin, Y., Kommenda, M., & Moore, J. H. "Contemporary Symbolic Regression Methods and their Relative Performance." In *CoRR* (2021), Available at: [https://arxiv.org/abs/2107.14351](https://arxiv.org/abs/2107.14351)
5726

58-
Matsubara, Y., Chiba, N., Igarashi, R., & Ushiku, Y. "Rethinking symbolic regression datasets and benchmarks for scientific discovery." In *arXiv preprint arXiv:2206.10540*. (2022), Available at: [https://arxiv.org/abs/2206.10540](https://arxiv.org/abs/2206.10540)
5927

6028

docs/quickstart.md

Lines changed: 2 additions & 3 deletions
@@ -4,12 +4,11 @@ You will need:
44

55
- `python` 3.8 or greater: [https://www.python.org/downloads/](https://www.python.org/downloads/)
66

7-
7+
The package is available as a PyPI package:
88
```shell
9-
pip install -U equation-tree
9+
pip install equation-tree
1010
```
1111

12-
1312
Check your installation by running:
1413
```shell
1514
python -c "from equation_tree import EquationTree"
Lines changed: 42 additions & 7 deletions
@@ -8,11 +8,13 @@
88
},
99
"source": [
1010
"# Basic Usage\n",
11-
"Here, we demonstrate core functionalities of the Equation Tree:\n",
11+
"\n",
12+
"Content:\n",
1213
"- Basic Functionality for sampling and processing equations\n",
1314
"- Advanced settings for sampling equations\n",
1415
"\n",
15-
"## Installation"
16+
"## Installation\n",
17+
"The Equation Tree package is available on [PyPI](https://pypi.org/project/equation-tree/):"
1618
]
1719
},
1820
{
@@ -33,7 +35,7 @@
3335
"\n",
3436
"### Sampling With Default Settings\n",
3537
"First, we need to import the functionality.\n",
36-
"Here we also set a seed to ensure reproducible results."
38+
"Here, we also set a seed to ensure reproducible results."
3739
],
3840
"metadata": {
3941
"collapsed": false
@@ -232,7 +234,7 @@
232234
"\n",
233235
"### Evaluating Equations\n",
234236
"\n",
235-
"After instantiating equations, we can evaluate them arbitrary input:"
237+
"After instantiating equations, we can evaluate them on arbitrary input:"
236238
],
237239
"metadata": {
238240
"collapsed": false
@@ -279,7 +281,7 @@
279281
"\n",
280282
"### Input Dimensions\n",
281283
"\n",
282-
"We can manipulate the space on witch the equation is defined. For example, if we want equations that are defined on 2-dimensions, we can write:"
284+
"We can manipulate the space on which the equations are defined. For example, if we want equations that are defined on two dimensions, we can write:"
283285
],
284286
"metadata": {
285287
"collapsed": false
@@ -310,11 +312,11 @@
310312
{
311313
"cell_type": "markdown",
312314
"source": [
313-
"*Note, not all the equations have exactly 2 input variable. Some of them have only one. This is since equations with only one input variable are still defined on 2 (or more dimensions)*\n",
315+
"*Note: not all the equations have exactly two input variables. Some of them have only one. This is because equations with one input variable are still defined on two (or more) dimensions.*\n",
314316
"\n",
315317
"### Equation Complexity\n",
316318
"\n",
317-
"We can also manipulate the equation complexity (as number of nodes)"
319+
"We can also manipulate the equation complexity (for example, as tree depth):"
318320
],
319321
"metadata": {
320322
"collapsed": false
@@ -344,6 +346,39 @@
344346
"collapsed": false
345347
}
346348
},
349+
{
350+
"cell_type": "markdown",
351+
"source": [
352+
"Instead of an exact depth, we can also sample all equations up to a specified depth:"
353+
],
354+
"metadata": {
355+
"collapsed": false
356+
}
357+
},
358+
{
359+
"cell_type": "code",
360+
"execution_count": null,
361+
"outputs": [],
362+
"source": [
363+
"equations_simple = sample(n=5, max_depth=3)\n",
364+
"equations_complex = sample(n=5, max_depth=8)"
365+
],
366+
"metadata": {
367+
"collapsed": false
368+
}
369+
},
370+
{
371+
"cell_type": "code",
372+
"execution_count": null,
373+
"outputs": [],
374+
"source": [
375+
"print('*** simple equations ***\\n', equations_simple, '\\n')\n",
376+
"print('*** complex equations ***\\n', equations_complex)"
377+
],
378+
"metadata": {
379+
"collapsed": false
380+
}
381+
},
347382
{
348383
"cell_type": "markdown",
349384
"source": [
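
Note on the new notebook cells above: they contrast sampling at an exact depth with sampling "up to a specified depth" via `sample(n=5, max_depth=3)`. The following is a minimal, standard-library-only sketch of that distinction. It is not the package's implementation: the vocabulary lists, the helper names (`sample_tree`, `tree_depth`, `sample_up_to`), the uniform choices, and the convention that a single leaf has depth 1 are all assumptions made for illustration.

```python
import random

# Hypothetical toy vocabulary; these are NOT the package's defaults or priors.
OPERATORS = ["+", "-", "*", "/"]
FUNCTIONS = ["sin", "cos", "exp"]
LEAVES = ["x_1", "c"]


def sample_tree(depth, rng):
    """Return a nested tuple describing a tree of exactly `depth` levels.

    Assumed convention: a single leaf has depth 1.
    """
    if depth == 1:
        return rng.choice(LEAVES)
    if rng.random() < 0.5:
        # Unary function node: its only child carries the remaining depth.
        return (rng.choice(FUNCTIONS), sample_tree(depth - 1, rng))
    # Binary operator node: one child reaches the full remaining depth,
    # the other may be shallower (which makes the binary tree incomplete).
    deep = sample_tree(depth - 1, rng)
    shallow = sample_tree(rng.randint(1, depth - 1), rng)
    children = [deep, shallow]
    rng.shuffle(children)
    return (rng.choice(OPERATORS), children[0], children[1])


def tree_depth(tree):
    """Depth of a nested-tuple tree under the leaf-has-depth-1 convention."""
    if isinstance(tree, str):
        return 1
    return 1 + max(tree_depth(child) for child in tree[1:])


def sample_up_to(n, max_depth, seed=42):
    """Draw n trees, each with a depth chosen uniformly from 1..max_depth."""
    rng = random.Random(seed)
    return [sample_tree(rng.randint(1, max_depth), rng) for _ in range(n)]


if __name__ == "__main__":
    for tree in sample_up_to(n=5, max_depth=3):
        print(tree_depth(tree), tree)
```

In this sketch, "up to a depth" simply draws an exact depth first and then builds a tree of that depth. The package may weight depths and structures differently, for example through its priors.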
