Add dev documentation on everest vs ert data models #9820

Draft: wants to merge 5 commits into main.
53 changes: 53 additions & 0 deletions docs/everest/development.rst
   :width: 700px
   :alt: Everest architecture

   Everest architecture

Every time an optimization instance is run by a user, the client component of the
application spawns an instance of the server component, which is started either on a
cluster node using LSF (when the `queue_system` is defined to be *lsf*) or on the
* - POST
- '/stop'
- Signal everest optimization run termination. It is called by the client when the optimization needs to be terminated in the middle of the run.


EVEREST vs. ERT data models
===========================
EVEREST uses ERT to run an experiment, but where ERT submits an `ensemble` to the queue, EVEREST submits a `batch`.
`Batches` are in principle very similar to `ensembles`, and the ERT queue system does not treat them differently,
but the data they contain is organized in a different hierarchy with a different meaning.
ERT history matches `realizations` (i.e., `model parameters`) to data, hence an `ensemble` contains a number of `realizations`.
EVEREST optimizes a set of `controls` and assumes static (i.e., unchanging) `realizations`.
When collecting the results of forward model runs, EVEREST distinguishes between `unperturbed controls`
(used to compute the current `objective function` value) and `perturbed controls` (required to calculate the `gradient`).
Furthermore, when performing robust optimization (i.e., with multiple static `realizations`), a `batch` contains a
certain number of `realizations` (denoted by `<GEO_ID>`), and each `realization` contains a number of `simulations`
(i.e., forward model runs). These `simulations` are forward model runs for `unperturbed controls`, `perturbed controls`,
or both. This is the key difference between the hierarchical data models of EVEREST and ERT (Fig 3).
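
The hierarchy described above can be sketched with a few Python dataclasses. This is a minimal, hypothetical model for illustration only: the names `Simulation`, `GeoRealization`, `Batch` and `queue_runs` are not part of the ERT, EVEREST or `ropt` APIs. It shows how the nested `batch` structure flattens into the plain list of runs that the queue sees, assuming the position in that list plays the role of the ERT `realization` index.

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import List, Optional, Tuple


   @dataclass
   class Simulation:
       """One forward model run (hypothetical model, for illustration only)."""

       perturbation: Optional[int]  # None marks a run with the unperturbed controls


   @dataclass
   class GeoRealization:
       """A static realization, identified by its <GEO_ID>."""

       geo_id: int
       simulations: List[Simulation] = field(default_factory=list)


   @dataclass
   class Batch:
       """An EVEREST batch: submitted to the queue where ERT would submit an ensemble."""

       batch_id: int
       realizations: List[GeoRealization] = field(default_factory=list)

       def queue_runs(self) -> List[Tuple[int, Optional[int]]]:
           """Flatten the hierarchy into the flat list of runs the queue sees.

           The position in the returned list plays the role of the ERT realization index.
           """
           return [
               (real.geo_id, sim.perturbation)
               for real in self.realizations
               for sim in real.simulations
           ]


   # Two <GEO_ID>s, each with one unperturbed and two perturbed simulations.
   batch = Batch(
       batch_id=0,
       realizations=[
           GeoRealization(geo_id=g, simulations=[Simulation(None), Simulation(0), Simulation(1)])
           for g in (0, 1)
       ],
   )
   print(batch.queue_runs())
   # [(0, None), (0, 0), (0, 1), (1, None), (1, 0), (1, 1)]

The queue only ever sees the flat list; the `<GEO_ID>` and perturbation attached to each entry exist only on the EVEREST side.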

.. figure:: images/Everest_vs_Ert_01.png
   :align: center
   :width: 700px
   :alt: EVEREST vs. ERT data models

   Difference between `ensemble` in ERT and `batch` in EVEREST.

.. figure:: images/Everest_vs_Ert_02.png
   :align: center
   :width: 700px
   :alt: Additional explanation of Fig 3

   Different meaning of `realization` and `simulation`.

As is evident from the image above, in terms of execution in the queue, a `realization` (ERT) and a `simulation` (EVEREST)
are synonymous. The ERT queue system is therefore agnostic about the meaning of each run: only when the data is collected
back in EVEREST (`GEN_DATA`) is a meaning attributed to each run.
The mapping between the EVEREST and ERT data models is done in the `ropt` library, which maps from `realization` (ERT)
to `<GEO_ID>` and `perturbation` (EVEREST) and vice versa.

`Batches` in EVEREST can have several different configurations, depending on the algorithm used. Gradient-based algorithms
can have a single function evaluation (`unperturbed controls`) per `<GEO_ID>`, a set of `perturbed controls` per `<GEO_ID>`
to evaluate the gradient, or both. Derivative-free methods can have several function evaluations per `<GEO_ID>` and no
`perturbed controls`.

**NOTE:** the optimizer may decide that some `<GEO_ID>` are not needed. These are then skipped, and the mapping from `ropt`
should reflect this (i.e., fewer `<GEO_ID>` in the batch results than expected).
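
The sketch below illustrates how such a mapping can work and why skipped `<GEO_ID>` matter. The helpers `build_batch_mapping` and `invert` are hypothetical and do not reflect the actual `ropt` implementation. The sketch assumes that, within each `<GEO_ID>`, the function evaluations come before the perturbations, and that a skipped `<GEO_ID>` consumes no ERT `realization` index.

.. code-block:: python

   from typing import Dict, List, Sequence, Tuple

   # (geo_id, kind, index): kind is "function" for a run with the unperturbed
   # controls and "perturbation" for a run with perturbed controls.
   RunKey = Tuple[int, str, int]


   def build_batch_mapping(
       active_geo_ids: Sequence[int],
       n_function_evaluations: int,
       n_perturbations: int,
   ) -> List[RunKey]:
       """Return the meaning of each ERT realization index in one batch.

       Hypothetical helper, not part of ropt or ERT. The position in the returned
       list is the ERT realization index. <GEO_ID>s skipped by the optimizer are
       absent from active_geo_ids, so they consume no index.
       """
       runs: List[RunKey] = []
       for geo_id in active_geo_ids:
           runs.extend((geo_id, "function", i) for i in range(n_function_evaluations))
           runs.extend((geo_id, "perturbation", p) for p in range(n_perturbations))
       return runs


   def invert(runs: List[RunKey]) -> Dict[RunKey, int]:
       """Map (geo_id, kind, index) back to the ERT realization index."""
       return {key: realization for realization, key in enumerate(runs)}


   # Gradient-based batch with one objective evaluation and two perturbations per
   # <GEO_ID>, where the optimizer skipped <GEO_ID> 1 out of 0, 1 and 2.
   runs = build_batch_mapping(active_geo_ids=[0, 2], n_function_evaluations=1, n_perturbations=2)
   print(runs[3])  # (2, 'function', 0): without the skip this would be <GEO_ID> 1
   print(invert(runs)[(2, "perturbation", 1)])  # 5

In this sketch, a derivative-free batch would use `n_perturbations=0` and `n_function_evaluations` greater than one. Because a skipped `<GEO_ID>` shifts every later index, the mapping has to come from the batch that was actually run rather than be recomputed from the configuration alone.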

Another thing to note is that in ERT a `realization` keeps its identity from one `ensemble` to the next; this is not the case
for `simulations` in `batches`. A `batch` can have several different configurations (Fig 5), so `simulation 0` for
`<GEO_ID> = 0` can correspond to either `unperturbed` or `perturbed controls`, depending on the `batch`.
`<GEO_ID>`, on the other hand, is continuous from one `batch` to the next, since the static `realizations` do not change
over the course of the optimization.
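
In practice, this means results should be selected by their meaning (`<GEO_ID>` and whether the controls were perturbed) rather than by `simulation` number. The minimal sketch below uses a hypothetical result layout keyed by batch, with made-up values purely for illustration. Reading `simulation 0` from each `batch` mixes objective values with a perturbed run, while filtering on `<GEO_ID>` and the kind of run gives a consistent history.

.. code-block:: python

   from typing import Dict, List, Tuple

   # Hypothetical results per batch, listed in simulation order for <GEO_ID> 0.
   # Each entry is (geo_id, kind, value); the values are made up for illustration.
   results: Dict[int, List[Tuple[int, str, float]]] = {
       0: [(0, "unperturbed", 1.0), (0, "perturbed", 1.1), (0, "perturbed", 0.9)],  # objective + gradient
       1: [(0, "perturbed", 1.3), (0, "perturbed", 1.2)],  # gradient only
       2: [(0, "unperturbed", 1.5)],  # objective only
   }


   def objective_history(geo_id: int) -> Dict[int, float]:
       """Objective value per batch for one <GEO_ID>, selected by meaning, not simulation number."""
       return {
           batch_id: value
           for batch_id, runs in results.items()
           for gid, kind, value in runs
           if gid == geo_id and kind == "unperturbed"
       }


   print(objective_history(0))  # {0: 1.0, 2: 1.5}

   # Naively reading "simulation 0" of every batch mixes objective values with a perturbed run.
   print([runs[0][2] for runs in results.values()])  # [1.0, 1.3, 1.5]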

.. figure:: images/Everest_vs_Ert_03.png
   :align: center
   :width: 700px
   :alt: Other batch configurations in EVEREST

   Three other possible configurations of EVEREST `batches` in the context of gradient-based (e.g., `optpp_q_newton`)
   and gradient-free (e.g., **WHICH ONE DO WE SUPPORT?**) optimization algorithms.