Add dev documentation on everest vs ert data models #9820
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,6 +18,8 @@ client component. | |
:width: 700px | ||
:alt: Everest architecture | ||
|
||
Everest architecture | ||
|
||
Every time an optimization instance is run by a user, the client component of the | ||
application spawns an instance of the server component, which is started either on a | ||
cluster node using LSF (when the `queue_system` is defined to be *lsf*) or on the | ||
|
@@ -52,3 +54,54 @@ long as the optimization process is running. | |
* - POST | ||
- '/stop' | ||
- Signal everest optimization run termination. It will be called by the client when the optimization needs to be terminated in the middle of the run | ||
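As a minimal sketch of how a client could address the `/stop` endpoint listed above, the snippet below only builds the POST request. The base URL is a hypothetical placeholder, and any authentication or TLS setup the real server requires is not covered here; `build_stop_request` is an illustrative helper, not part of EVEREST.

```python
import urllib.request


def build_stop_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the '/stop' endpoint.

    `base_url` is a hypothetical server address; the real host, port and
    credentials come from the running server and are not shown here.
    """
    url = base_url.rstrip("/") + "/stop"
    # An empty body is assumed; the endpoint table does not specify a payload.
    return urllib.request.Request(url, data=b"", method="POST")


request = build_stop_request("https://localhost:8080/")
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`, together with whatever credentials and certificates the server expects.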
|
||
|
||
EVEREST vs. ERT data models | ||
=========================== | ||
EVEREST uses ERT to run an experiment, but instead of submitting an `ensemble` (ERT) to the queue, EVEREST submits | ||
a `batch`. `Batches` are in principle very similar to `ensembles` and the ERT queue system does not treat them differently, | ||
but they differ hierarchically in what the data means. | ||
ERT history matches `realizations` (i.e., `model parameters`) to data, hence an `ensemble` contains a number of `realizations`. | ||
EVEREST optimizes a set of `controls` and assumes static (i.e., unchanging) `realizations`. | ||
In terms of collecting the results of forward model runs, there is a distinction between `unperturbed controls` | ||
Review comment: I think a brief summary of the control flow of Everest wrt running batches would be good to lead into this. i.e., what Everest decides to put in a batch and why. AFAIK it is either to evaluate an objective and/or generate a gradient. I think you are explaining it quite nicely below here. |
||
(i.e., the current `objective function` value) and `perturbed controls` (i.e., those required to calculate the `gradient`). | ||
Furthermore, when performing robust optimization (i.e., multiple static `realizations`), a `batch` contains a | ||
certain number of `realizations` (denoted by `<GEO_ID>`) and each `realization` contains a number of `simulations` | ||
(i.e., forward model runs). These `simulations` are forward model runs for `unperturbed controls`, | ||
`perturbed controls`, or both. This is the key difference between the hierarchical data models of EVEREST and ERT (Fig 3). | ||
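The hierarchy described above (a `batch` holds `realizations` identified by `<GEO_ID>`, and each of those holds `simulations` for unperturbed and/or perturbed controls) can be sketched with two small data classes. The names `Simulation`, `Batch`, `is_perturbed` and `by_geo_id` are hypothetical and exist only for illustration; they are not the types used by EVEREST, ERT or `ropt`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Simulation:
    """One forward model run in a batch (hypothetical illustration type)."""

    geo_id: int                  # static realization the run belongs to
    perturbation: Optional[int]  # None means unperturbed controls

    @property
    def is_perturbed(self) -> bool:
        return self.perturbation is not None


@dataclass
class Batch:
    """A batch of simulations, grouped on demand by geo_id (hypothetical)."""

    batch_id: int
    simulations: List[Simulation] = field(default_factory=list)

    def by_geo_id(self) -> Dict[int, List[Simulation]]:
        # Rebuild the batch -> realization -> simulation hierarchy.
        grouped: Dict[int, List[Simulation]] = {}
        for sim in self.simulations:
            grouped.setdefault(sim.geo_id, []).append(sim)
        return grouped


# Two static realizations, each with one function evaluation and two perturbations.
batch = Batch(
    batch_id=0,
    simulations=[
        Simulation(geo_id, perturbation)
        for geo_id in (0, 1)
        for perturbation in (None, 0, 1)
    ],
)
grouped = batch.by_geo_id()
```

From the point of view of the queue, every `Simulation` in this sketch is just one more run; the grouping by `geo_id` only matters once results are interpreted.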
|
||
.. figure:: images/Everest_vs_Ert_01.png | ||
:align: center | ||
:width: 700px | ||
:alt: EVEREST vs. ERT data models | ||
|
||
Difference between `ensemble` in ERT and `batch` in EVEREST. | ||
|
||
.. figure:: images/Everest_vs_Ert_02.png | ||
:align: center | ||
:width: 700px | ||
:alt: Additional explanation of Fig 3 | ||
|
||
Different meaning of `realization` and `simulation`. | ||
|
||
As is evident from the image above, in terms of execution in the queue, `realization` (ERT) and `simulation` (EVEREST) are synonymous. | ||
This means that the ERT queue system is agnostic about the meaning of each run; only when the data is collected back in EVEREST (`GEN_DATA`) is a meaning | ||
attributed to each run. | ||
Review comment: I think wrt my first comment, this info may be stated at the start and thus be redundant here. |
||
The mapping between the data models of EVEREST and ERT is done in the `ropt` library: it maps from `realization` (ERT) to `<GEO_ID>` and `perturbation` (EVEREST) and vice versa. | ||
`Batches` in EVEREST can contain several different configurations depending on the algorithm used. Gradient-based algorithms can have a single function | ||
evaluation (`unperturbed controls`) per `<GEO_ID>`, a set of `perturbed controls` per `<GEO_ID>` to evaluate the gradient, or both. | ||
Derivative-free methods can have several function evaluations per `<GEO_ID>` and no `perturbed controls`. | ||
**NOTE:** the optimizer may decide that some `<GEO_ID>` are not needed; these are then skipped and the mapping from `ropt` | ||
should reflect this (i.e., fewer `<GEO_ID>` in the batch results than expected). | ||
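To illustrate why an explicit mapping is needed, here is a minimal sketch of a lookup between a flat ERT `realization` index and an EVEREST `(<GEO_ID>, perturbation)` pair. The ordering used (per `<GEO_ID>`, the unperturbed run first and then the perturbations) is an assumption made for this example, and `build_mapping` and `invert` are hypothetical helpers, not the `ropt` API.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (geo_id, perturbation); perturbation is None for unperturbed controls.
Key = Tuple[int, Optional[int]]


def build_mapping(
    active_geo_ids: Sequence[int],
    n_perturbations: int,
    with_function: bool = True,
) -> List[Key]:
    """Return a list whose index is the ERT realization number.

    The ordering is assumed for illustration only.
    """
    mapping: List[Key] = []
    for geo_id in active_geo_ids:
        if with_function:
            mapping.append((geo_id, None))
        mapping.extend((geo_id, p) for p in range(n_perturbations))
    return mapping


def invert(mapping: Sequence[Key]) -> Dict[Key, int]:
    """Map (geo_id, perturbation) back to the ERT realization index."""
    return {key: ert_realization for ert_realization, key in enumerate(mapping)}


full = build_mapping([0, 1, 2], n_perturbations=2)
skipped = build_mapping([0, 2], n_perturbations=2)  # optimizer skipped geo_id 1
```

In this sketch, ERT realization 3 belongs to `<GEO_ID>` 1 in `full` but to `<GEO_ID>` 2 in `skipped`, so computing the index arithmetically from a fixed number of realizations would give the wrong answer once a `<GEO_ID>` is skipped.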
|
||
Another thing to note is that `realizations` are continuous from one `ensemble` to the next; however, this is not the case for `simulations` in `batches`. | ||
A `batch` can contain several different configurations (Fig 5) and `simulation 0` for `<GEO_ID> = 0` can be either `unperturbed` | ||
Review comment: I didn't find Fig 5? I'm not exactly sure what this sentence is trying to say? That there are different mappings between (geoid, perturbation) and (ert realization)? I think wrt #9767 , we will receive this mapping from ROPT, so it might be sufficient to say that ROPT will provide that mapping and we will attach that info to the stored forward model results in ERT. |
||
or `perturbed controls`. `<GEO_ID>` is continuous from one `batch` to the next since the static `realizations` do not change over the course of the optimization. |
Review comment: Note however, that in some cases the optimizer may decide that some realizations are not needed. These will then be skipped, and simulation numbering will be different (since there are fewer realizations). In such a case, the mapping using the math above will break. |
||
|
||
.. figure:: images/Everest_vs_Ert_03.png | ||
:align: center | ||
:width: 700px | ||
:alt: Other `batch` configurations EVEREST | ||
|
||
Three other possible configurations of EVEREST `batches` in the context of gradient-based (e.g., `optpp_q_newton`) | ||
and gradient-free (i.e., **WHICH ONE DO WE SUPPORT?**) optimization algorithms. |
Review comment:
Would compress the intro a bit more and focus a bit more on the technical ERT side, since Everest mostly uses ERT to submit a forward model and store its results. I'm not sure if the history matching vs optimization is relevant in this developer-facing context.
The main point I think is that Everest uses ERT by (1) mapping controls to parameters which are forward model inputs, (2) running the forward model and having the results stored by ERT, (3) reading the results of the forward model run and mapping them to objectives/constraints. In this context, a single realization run corresponds to one run of the forward model, one set of inputs and outputs.
We can say this part without introducing history matching vs optimization, ensemble vs batch, etc. beforehand.
Then afterwards we can introduce batches vs ensembles as you do.