Add dev documentation on everest vs ert data models #9820
base: main
Conversation
CodSpeed Performance Report: merging #9820 will not alter performance.
In principle correct, but with some amendments.
docs/everest/development.rst
Outdated
The mapping from data models in EVEREST to ERT is the same as flattening a 2D array (i.e., from a `<GEO_ID>` and `perturbation` based index in EVEREST to
`realization` in ERT).

Explicitly this means:

.. math::

   r(g, p) = g,

if the `batch` only has `unperturbed controls`,

.. math::

   r(g, p) = p + g \cdot P,

if the `batch` only has `perturbed controls`, and

.. math::

   r(g, p) = g \cdot [p < 0] + (p + g \cdot P + G) \cdot [p \geq 0],

if the `batch` has both `unperturbed` and `perturbed controls`. Here `r` is the ERT `realization_id` (0, ..., `R` - 1), `g` is the `<GEO_ID>` (0, ..., `G` - 1), `p` is the `perturbation_id` (-1, 0, ..., `P` - 1), `R`
is the total number of ERT `realizations`, `G` is the total number of static `model_realizations`, and `P` is the total number of perturbations.
NOTE: `p = -1` for `unperturbed controls`, and `p = 0, ..., P - 1` for `perturbed controls`.
**THIS IS MY SUGGESTION AND CURRENTLY NOT HOW IT WORKS, AND ONLY VALID FOR GRADIENT-BASED OPTIMIZATION ALGORITHMS, I GUESS?
If we don't want `p` to be negative we need to use a flag (e.g., `is_perturbation`)**
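The piecewise mapping above can be sketched as a small Python helper. This is only an illustrative sketch of the suggested mapping; `ert_realization_id` and its arguments are hypothetical names, not part of the actual ert or everest code.

```python
def ert_realization_id(
    g: int, p: int, G: int, P: int, has_unperturbed: bool, has_perturbed: bool
) -> int:
    """Map an EVEREST (<GEO_ID>, perturbation) index to an ERT realization id.

    Hypothetical helper illustrating the suggested mapping, not the real
    ert/everest API.

    g: <GEO_ID>, in 0, ..., G - 1
    p: perturbation_id; -1 for unperturbed controls, else 0, ..., P - 1
    G: total number of static model realizations
    P: total number of perturbations
    has_unperturbed / has_perturbed: composition of the batch
    """
    if has_unperturbed and not has_perturbed:
        # batch only has unperturbed controls: r(g, p) = g
        return g
    if has_perturbed and not has_unperturbed:
        # batch only has perturbed controls: r(g, p) = p + g * P
        return p + g * P
    # mixed batch: unperturbed members come first, one per <GEO_ID>,
    # followed by all perturbed members offset by G
    return g if p < 0 else p + g * P + G
```

For example, in a mixed batch with `G = 5` and `P = 3`, the unperturbed run for `<GEO_ID> = 1` lands at realization 1, while its first perturbation lands at realization `0 + 1 * 3 + 5 = 8`.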
This is correct for gradient-based algorithms, where we assume there may be either a single control vector for function evaluation, a set of perturbed controls for gradient evaluation, or both.
Discrete optimizers will not use perturbed controls, but may pass multiple unperturbed controls if they support parallel evaluation (which they usually do, and which is very important).
So we do need to support identifying multiple function evaluations. I don't see a use case for supporting multiple gradient evaluations concurrently, but who knows...
Another thing to note is that continuity for `realizations` between `ensembles` exists; however, this is not the case for `simulations` in `batches`.
A `batch` can contain several different configurations (Fig 5), and `simulation 0` for `<GEO_ID> = 0` can be either `unperturbed`
or `perturbed controls`. `<GEO_ID>` is continuous from one `batch` to the next, since the model realizations do not change at all over the course of the optimization.
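The lack of continuity between batches can be made concrete with a toy enumeration. This is illustrative only, assuming a mixed batch (unperturbed members first) followed by a perturbed-only batch; none of the names below come from the actual code.

```python
# Two model realizations, two perturbations (illustrative numbers only).
G, P = 2, 2

# Batch 0: mixed, with unperturbed members first, then perturbed ones.
batch0 = [("unperturbed", g, -1) for g in range(G)] + [
    ("perturbed", g, p) for g in range(G) for p in range(P)
]

# Batch 1: perturbed controls only.
batch1 = [("perturbed", g, p) for g in range(G) for p in range(P)]

# "simulation 0" is an unperturbed run in batch 0, but a perturbed run
# in batch 1; only <GEO_ID> keeps its meaning across batches.
assert batch0[0] == ("unperturbed", 0, -1)
assert batch1[0] == ("perturbed", 0, 0)
```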
Note, however, that in some cases the optimizer may decide that some realizations are not needed. These will then be skipped, and the simulation numbering will be different (since there are fewer realizations). In such a case, the mapping given by the math above will break.
EVEREST vs. ERT data models
===========================

EVEREST uses ERT for running an experiment, but instead of submitting an `ensemble` (ERT) to the queue we submit
Would compress the intro a bit more and focus a bit more on the technical ERT side, since Everest mostly uses ERT to submit a forward model and store its results. I'm not sure if the history matching vs. optimization distinction is relevant in this developer-facing context.
The main point, I think, is that Everest uses ERT by (1) mapping controls to parameters, which are forward model inputs, (2) running the forward model and having the results stored by ERT, and (3) reading the results of the forward model run and mapping them to objectives/constraints. In this context, a single realization run corresponds to one run of the forward model: one set of inputs and outputs.
We can say this part without introducing history matching vs. optimization, ensemble vs. batch, etc. beforehand.
Then afterwards we can introduce batches vs. ensembles as you do.
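The three steps described in this comment could be sketched roughly as follows. All function names here are hypothetical stand-ins for illustration; the real ert/everest interfaces look different, and the forward model is stubbed with a toy response.

```python
def map_controls_to_parameters(control_vector):
    """(1) Optimizer controls become forward model input parameters.
    Hypothetical mapping for illustration only."""
    return {f"param_{i}": v for i, v in enumerate(control_vector)}

def run_forward_model(parameters):
    """(2) One realization run = one forward model run, with results
    stored by ERT. Stubbed here as a toy response summing the inputs."""
    return {"response": sum(parameters.values())}

def map_results_to_objectives(results):
    """(3) Forward model outputs become objectives/constraints."""
    return results["response"]

def evaluate(control_vector):
    """One function evaluation: controls in, objective value out."""
    params = map_controls_to_parameters(control_vector)
    results = run_forward_model(params)
    return map_results_to_objectives(results)
```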
but they have some hierarchical differences in terms of the meaning behind the data.
ERT history-matches `realizations` (i.e., `model parameters`) to data, hence an `ensemble` contains a number of `realizations`.
EVEREST optimizes a set of `controls` and assumes static (i.e., unchanging) `realizations`.
In terms of collecting the results of forward model runs, there is a distinction between `unperturbed controls`
I think a brief summary of the control flow of Everest wrt running batches would be good to lead into this, i.e., what Everest decides to put in a batch and why. AFAIK it is either to evaluate an objective and/or to generate a gradient. I think you are explaining it quite nicely below.
Different meaning of `realization` and `simulation`.

As is evident from the image above, in terms of execution in the queue, `realization` (ERT) and `simulation` (EVEREST) are synonymous.
This means that the ERT queue system is agnostic about the meaning of each run; only when the data is collected back in EVEREST (`GEN_DATA`) is meaning
I think wrt my first comment, this info may be stated at the start and thus be redundant here.
should reflect this (i.e., fewer `<GEO_ID>` entries in the batch results than expected).
I didn't find Fig 5? I'm not exactly sure what this sentence is trying to say: that there are different mappings between (`<GEO_ID>`, perturbation) and (ERT realization)? I think wrt #9767 we will receive this mapping from ROPT, so it might be sufficient to say that ROPT will provide that mapping and we will attach that info to the stored forward model results in ERT.
Issue
Resolves #9619
Approach
Attempts to clarify the differences between ERT and EVEREST data models. There are things in here that need revision (like the mapping: these equations work, but I am not sure they make sense in all cases of optimization, i.e., gradient-based vs. derivative-free methods, etc.). Also, with Yngve we suggest a couple of name changes:

- `simulation` -> `ert_realization` (at least in the code, since they are synonymous)
- `geo_realization` -> `model_realization` (or just `realization` towards the user; we are not just concerned with GEO apps?)
- `<GEO_ID>` -> `<STATIC_MODEL_ID>` or `<MODEL_REALIZATION_ID>`