Add dev documentation on everest vs ert data models #9820
base: main
Conversation
CodSpeed Performance Report: merging #9820 will not alter performance.
In principle correct, but with some amendments.
docs/everest/development.rst
Outdated
The mapping from data models in EVEREST to ERT is the same as flattening a 2D array (i.e., from a `<GEO_ID>` and `perturbation` based index in EVEREST to
`realization` in ERT).

Explicitly this means:

.. math::

   r(g, p) = g,

if the `batch` only has `unperturbed controls`,

.. math::

   r(g, p) = p + g \cdot P,

if the `batch` only has `perturbed controls`, and

.. math::

   r(g, p) = g \cdot [p < 0] + (p + g \cdot P + G) \cdot [p \geq 0],

if the `batch` has both `unperturbed` and `perturbed controls`. Here `r` is the ERT `realization_id` (0, ..., `R` - 1), `g` is the `<GEO_ID>` (0, ..., `G` - 1), `p` is the `perturbation_id` (-1, 0, ..., `P` - 1), `R`
is the total number of ERT `realizations`, `G` is the total number of static `model_realizations`, and `P` is the total number of perturbations.
NOTE: `p = -1` for `unperturbed controls`, and `p = 0, ..., P - 1` for `perturbed controls`.
**THIS IS MY SUGGESTION AND CURRENTLY NOT HOW IT WORKS, AND ONLY VALID FOR GRADIENT-BASED OPTIMIZATION ALGORITHMS, I GUESS?
If we don't want `p` to be negative we need to use a flag (e.g., `is_perturbation`)**
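The piecewise mapping above can be sketched as a small Python helper. This is only an illustrative sketch of the suggested mapping; `ert_realization_id` and its arguments are hypothetical names, not part of the actual ert or everest code.

```python
def ert_realization_id(
    g: int, p: int, G: int, P: int, has_unperturbed: bool, has_perturbed: bool
) -> int:
    """Map an EVEREST (<GEO_ID>, perturbation) index to an ERT realization id.

    Hypothetical helper illustrating the suggested mapping, not the real
    ert/everest API.

    g: <GEO_ID>, in 0, ..., G - 1
    p: perturbation_id; -1 for unperturbed controls, else 0, ..., P - 1
    G: total number of static model realizations
    P: total number of perturbations
    has_unperturbed / has_perturbed: composition of the batch
    """
    if has_unperturbed and not has_perturbed:
        # batch only has unperturbed controls: r(g, p) = g
        return g
    if has_perturbed and not has_unperturbed:
        # batch only has perturbed controls: r(g, p) = p + g * P
        return p + g * P
    # mixed batch: unperturbed members come first, one per <GEO_ID>,
    # followed by all perturbed members offset by G
    return g if p < 0 else p + g * P + G
```

For example, in a mixed batch with `G = 5` and `P = 3`, the unperturbed run for `<GEO_ID> = 1` lands at realization 1, while its first perturbation lands at realization `0 + 1 * 3 + 5 = 8`.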
This is correct for gradient-based algorithms, where we assume there may be either a single control vector for function evaluation, a set of perturbed controls for gradient evaluation, or both.
Discrete optimizers will not use perturbed controls, but may pass multiple unperturbed controls if they support parallel evaluation (which they usually do, and which is very important).
So we do need to support identifying multiple function evaluations. I don't see a use case for supporting multiple gradient evaluations concurrently, but who knows...
Another thing to note is that continuity for `realizations` between `ensembles` exists; however, this is not the case for `simulations` in `batches`.
A `batch` can contain several different configurations (Fig 5), and `simulation 0` for `<GEO_ID> = 0` can be either `unperturbed`
or `perturbed controls`. `<GEO_ID>` is continuous from one `batch` to the next, since the model realizations do not change at all over the course of the optimization.
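The lack of continuity between batches can be made concrete with a toy enumeration. This is illustrative only, assuming a mixed batch (unperturbed members first) followed by a perturbed-only batch; none of the names below come from the actual code.

```python
# Two model realizations, two perturbations (illustrative numbers only).
G, P = 2, 2

# Batch 0: mixed, with unperturbed members first, then perturbed ones.
batch0 = [("unperturbed", g, -1) for g in range(G)] + [
    ("perturbed", g, p) for g in range(G) for p in range(P)
]

# Batch 1: perturbed controls only.
batch1 = [("perturbed", g, p) for g in range(G) for p in range(P)]

# "simulation 0" is an unperturbed run in batch 0, but a perturbed run
# in batch 1; only <GEO_ID> keeps its meaning across batches.
assert batch0[0] == ("unperturbed", 0, -1)
assert batch1[0] == ("perturbed", 0, 0)
```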
Note, however, that in some cases the optimizer may decide that some realizations are not needed. These will then be skipped, and the simulation numbering will be different (since there are fewer realizations). In such a case, the mapping given by the math above will break.
EVEREST vs. ERT data models
===========================

EVEREST uses ERT for running an experiment, but instead of submitting an `ensemble` (ERT) to the queue we submit
Would compress the intro a bit more and focus a bit more on the technical ERT side, since Everest mostly uses ERT to submit a forward model and store its results. I'm not sure if the history matching vs. optimization distinction is relevant in this developer-facing context.
The main point, I think, is that Everest uses ERT by (1) mapping controls to parameters, which are forward model inputs, (2) running the forward model and having the results stored by ERT, and (3) reading the results of the forward model run and mapping them to objectives/constraints. In this context, a single realization run corresponds to one run of the forward model: one set of inputs and outputs.
We can say this part without introducing history matching vs. optimization, ensemble vs. batch, etc. beforehand.
Then afterwards we can introduce batches vs. ensembles as you do.
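The three steps described in this comment could be sketched roughly as follows. All function names here are hypothetical stand-ins for illustration; the real ert/everest interfaces look different, and the forward model is stubbed with a toy response.

```python
def map_controls_to_parameters(control_vector):
    """(1) Optimizer controls become forward model input parameters.
    Hypothetical mapping for illustration only."""
    return {f"param_{i}": v for i, v in enumerate(control_vector)}

def run_forward_model(parameters):
    """(2) One realization run = one forward model run, with results
    stored by ERT. Stubbed here as a toy response summing the inputs."""
    return {"response": sum(parameters.values())}

def map_results_to_objectives(results):
    """(3) Forward model outputs become objectives/constraints."""
    return results["response"]

def evaluate(control_vector):
    """One function evaluation: controls in, objective value out."""
    params = map_controls_to_parameters(control_vector)
    results = run_forward_model(params)
    return map_results_to_objectives(results)
```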
but they have some hierarchical differences in terms of the meaning behind the data.
ERT history-matches `realizations` (i.e., `model parameters`) to data, hence an `ensemble` contains a number of `realizations`.
EVEREST optimizes a set of `controls` and assumes static (i.e., unchanging) `realizations`.
In terms of collecting the results of forward model runs, there is a distinction between `unperturbed controls`
I think a brief summary of the control flow of Everest wrt running batches would be good to lead into this, i.e., what Everest decides to put in a batch and why. AFAIK it is either to evaluate an objective and/or to generate a gradient. I think you are explaining it quite nicely below.
Different meaning of `realization` and `simulation`.

As is evident from the image above, in terms of execution in the queue, `realization` (ERT) and `simulation` (EVEREST) are synonymous.
This means that the ERT queue system is agnostic about the meaning of each run; only when the data is collected back in EVEREST (`GEN_DATA`) is meaning
I think wrt my first comment, this info may be stated at the start and thus be redundant here.
should reflect this (i.e., fewer `<GEO_ID>` entries in the batch results than expected).
I didn't find Fig 5? I'm not exactly sure what this sentence is trying to say: that there are different mappings between (`<GEO_ID>`, perturbation) and (ERT realization)? I think wrt #9767 we will receive this mapping from ROPT, so it might be sufficient to say that ROPT will provide that mapping and we will attach that info to the stored forward model results in ERT.
Issue
Resolves #9619
Approach
Attempts to clarify the differences between ERT and EVEREST data models. There are things in here that need revision (like the mapping: these equations work, but I am not sure they make sense in all cases of optimization, i.e., gradient-based vs. derivative-free methods, etc.). Also, with Yngve we suggest a couple of name changes:

- `simulation` -> `ert_realization` (at least in the code, since they are synonymous)
- `geo_realization` -> `model_realization` (or just `realization` towards the user; we are not just concerned with GEO apps?)
- `<GEO_ID>` -> `<STATIC_MODEL_ID>` or `<MODEL_REALIZATION_ID>`