Merged
Changes from 8 commits
18 changes: 8 additions & 10 deletions docs/user_guide/config_options.md
@@ -68,7 +68,6 @@ fluxsite:
walltime: 06:00:00
storage: [scratch/a00, gdata/xy11]
multiprocess: True
meorg_model_output_id: XXXXXXXX
```

### [experiment](#experiment)
@@ -164,14 +163,6 @@ fluxsites:

```

### [meorg_model_output_id](#meorg_model_output_id)

: **Default:** False, _optional key_. :octicons-dash-24: The unique Model Output ID from modelevaluation.org to which output files will be automatically uploaded for analysis.

A separate upload job will be submitted on successful completion of benchcab tasks if this key is present; however, benchcab does not check its validity at this stage.

Note: It is the user's responsibility to ensure the model output is configured on modelevaluation.org.

## spatial

Contains settings specific to spatial tests.
@@ -381,6 +372,13 @@ realisations:

: **Default:** _required key, no default_. :octicons-dash-24: Specify the local checkout path of CABLE branch.

### [meorg_model_output_name](#meorg_model_output_name)

: **Default:** False, _optional key_. :octicons-dash-24: When set to `true` on a realisation, that realisation's name is used as the name of the Model Output on modelevaluation.org to which output files will be automatically uploaded for analysis. The user must set this key to `true` on only one realisation for the name to be chosen.

Note: It is the user's responsibility to ensure the model output name does not clash with existing names belonging to other users on modelevaluation.org.
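
The rule that only one realisation may set the key can be sketched as a small check. This is a minimal illustration under stated assumptions, not benchcab's implementation: the function name `find_model_output_realisation` is hypothetical, and the key is spelled `model_output_name` as in the schema and example configs in this change.

```python
from typing import Optional


def find_model_output_realisation(realisations: list[dict]) -> Optional[int]:
    """Return the index of the single realisation flagged with `model_output_name`.

    Returns None when no realisation is flagged and raises ValueError when
    more than one is flagged. (Hypothetical helper, for illustration only.)
    """
    flagged = [i for i, r in enumerate(realisations) if r.get("model_output_name")]
    if len(flagged) > 1:
        raise ValueError(
            f"model_output_name is set on realisations {flagged}; set it on only one"
        )
    return flagged[0] if flagged else None


realisations = [
    {"repo": {"git": {"branch": "main"}}, "model_output_name": True},
    {"repo": {"git": {"branch": "XXXXX"}}},
]
print(find_model_output_realisation(realisations))
```

Raising on a second flagged realisation, rather than silently taking the first, makes a misconfigured file fail early instead of uploading under an unexpected name.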


### [name](#name)

: **Default:** base name of [branch_path](#+repo.svn.branch_path) if an SVN repository is given; the branch name if a git repository is given; the folder name if a local path is given, _optional key_. :octicons-dash-24: An alias name used internally by `benchcab` for the branch. The `name` key also specifies the directory name of the source code when retrieving from SVN, GitHub or local.
@@ -506,7 +504,7 @@ codecov:

## meorg_bin

: **Default:** False, _optional key_. :octicons-dash-24: Specifies the absolute system path to the ME.org client executable. In the absence of this key it will be inferred from the same directory as benchcab should `meorg_model_output_id` be set in `fluxsite` above.
: **Default:** False, _optional key_. :octicons-dash-24: Specifies the absolute system path to the ME.org client executable. In the absence of this key it will be inferred from the same directory as benchcab should `meorg_model_output_name` be set in `realisations` above.

``` yaml

40 changes: 17 additions & 23 deletions docs/user_guide/use_cases.md
@@ -35,26 +35,24 @@ realisations:
- repo:
git:
branch: main
model_output_name: True # (1)
- repo:
git:
branch: XXXXX
patch: # (1)
patch: # (2)
cable:
cable_user:
existing_feature: YYYY

fluxsite:
meorg_model_output_id: ZZZZ # (2)

modules: [
intel-compiler/2021.1.1,
netcdf/4.7.4,
openmpi/4.1.0
]
```

1. Use the option names and values as implemented in the cable namelist file.
2. You need to set up your environment for meorg_client before using this feature.
1. You need to set up your environment for meorg_client before using this feature.
2. Use the option names and values as implemented in the cable namelist file.

The evaluation results will be available on modelevaluation.org, accessible from the Model Output page you've specified.

@@ -74,17 +72,15 @@ realisations:
cable:
cable_user:
existing_feature: YYYY
model_output_name: True # (2)
- repo:
git:
branch: XXXXX
patch: # (2)
patch: # (3)
cable:
cable_user:
existing_feature: YYYY

fluxsite:
meorg_model_output_id: ZZZZ # (3)

modules: [
intel-compiler/2021.1.1,
netcdf/4.7.4,
@@ -93,8 +89,8 @@ modules: [
```

1. Use the option names and values as implemented in the cable namelist file.
2. Use the option names and values as implemented in the cable namelist file.
3. You need to set up your environment for meorg_client before using this feature.
2. You need to set up your environment for meorg_client before using this feature.
3. Use the option names and values as implemented in the cable namelist file.

The evaluation results will be available on modelevaluation.org, accessible from the Model Output page you've specified.

@@ -110,13 +106,11 @@ realisations:
- repo:
git:
branch: main
model_output_name: True # (2)
- repo:
git:
branch: XXXXX

fluxsite:
meorg_model_output_id: ZZZZ # (1)

modules: [
intel-compiler/2021.1.1,
netcdf/4.7.4,
@@ -141,21 +135,21 @@ realisations:
- repo:
git:
branch: main
model_output_name: True # (2)
- repo:
name: my-feature-off # (2)
name: my-feature-off # (3)
local:
path: XXXXX # (3)
path: XXXXX # (4)
- repo:
name: my-feature-on
local:
path: XXXXX
patch: # (4)
patch: # (5)
cable:
cable_user:
new_feature: YYYY

fluxsite:
meorg_model_output_id: ZZZZ # (5)
pbs: # (6)
ncpus: 8
mem: 16GB
@@ -169,10 +163,10 @@ modules: [
```

1. Testing at one flux site only to save time and resources.
2. We are using the same branch twice so we need to name each occurrence differently.
3. Give the full path to your local CABLE repository with your code changes.
4. Use the option names and values as implemented in the cable namelist file.
5. You need to set up your environment for meorg_client before using this feature.
2. You need to set up your environment for meorg_client before using this feature.
3. We are using the same branch twice so we need to name each occurrence differently.
4. Give the full path to your local CABLE repository with your code changes.
5. Use the option names and values as implemented in the cable namelist file.
6. You can reduce the requested resources to reduce the cost of the test.

Comparisons of R0 and R1 should show bitwise agreement. R2 and R0 (and R1) comparison on modelevaluation.org shows the impact of the changes.
44 changes: 38 additions & 6 deletions src/benchcab/config.py
@@ -6,10 +6,13 @@
from pathlib import Path

import yaml
import copy
from cerberus import Validator

import benchcab.utils as bu
from benchcab import internal
from benchcab.utils.repo import create_repo
from benchcab.model import Model


class ConfigValidationError(Exception):
@@ -82,7 +85,7 @@ def read_optional_key(config: dict):
Parameters
----------
config : dict
The configuration file with with/without optional keys
The configuration file with without optional keys
Member

Neither "with without optional keys" nor "with/without optional keys" makes any sense. Any idea what we are trying to say here?

Author

I believe the idea is that the user may already have set the optional key in the config file, in which case it is not replaced. Otherwise, if the config file lacks the optional key, it is filled in with a value computed in `read_optional_key`.

"""
if "project" not in config:
@@ -119,12 +122,38 @@ def read_optional_key(config: dict):
config["fluxsite"]["pbs"] = internal.FLUXSITE_DEFAULT_PBS | config["fluxsite"].get(
"pbs", {}
)
config["fluxsite"]["meorg_model_output_id"] = config["fluxsite"].get(
"meorg_model_output_id", internal.FLUXSITE_DEFAULT_MEORG_MODEL_OUTPUT_ID
)

config["codecov"] = config.get("codecov", False)

return config
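
The fill-in behaviour of `read_optional_key` shown above (user-supplied values win, missing keys get defaults) can be sketched in isolation. This is a simplified illustration rather than the function itself: `fill_optional_keys` and the values in `DEFAULT_PBS` are hypothetical stand-ins for the defaults that live in benchcab's `internal` module.

```python
import copy

# Hypothetical defaults, for illustration only; the real values are defined
# in benchcab's internal module and may differ.
DEFAULT_PBS = {"ncpus": 18, "mem": "30GB"}


def fill_optional_keys(config: dict) -> dict:
    """Return a copy of config in which missing optional keys get defaults."""
    config = copy.deepcopy(config)
    fluxsite = config.setdefault("fluxsite", {})
    # Dict union (Python 3.9+): keys supplied by the user override the defaults.
    fluxsite["pbs"] = DEFAULT_PBS | fluxsite.get("pbs", {})
    # A key the user already set is kept; otherwise the default is used.
    config["codecov"] = config.get("codecov", False)
    return config


user_config = {"fluxsite": {"pbs": {"ncpus": 8}}}
filled = fill_optional_keys(user_config)
print(filled)
```

Working on a deep copy keeps the caller's dictionary untouched, which matches the "pure function" intent noted in `add_model_output_name`.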


def add_model_output_name(config: dict):
    """Determine model output name from realisations.

    Parameters
    ----------
    config : dict
        The configuration file with optional keys

    """
    # pure function
    config = copy.deepcopy(config)

    is_model_output_name = False
    for r in config["realisations"]:
        assert not is_model_output_name
        if r.pop("model_output_name", None):
            is_model_output_name = True
            repo = create_repo(
                spec=r["repo"],
                path=internal.SRC_DIR / (r["name"] if r.get("name") else Path()),
            )
            config["model_output_name"] = Model(repo).name
            break
    assert is_model_output_name
    return config


def read_config_file(config_path: str) -> dict:
"""Load the config file in a dict.
@@ -154,6 +183,8 @@ def read_config(config_path: str) -> dict:
----------
config_path : str
Path to the configuration file.
is_meorg: str
Whether workflow includes meorg job submission. If true, determine the model output name

Returns
-------
@@ -169,7 +200,8 @@ def read_config(config_path: str) -> dict:
# Read configuration file
config = read_config_file(config_path)
# Populate configuration dict with optional keys
read_optional_key(config)
# Validate and return.
config = read_optional_key(config)
# Validate.
validate_config(config)
config = add_model_output_name(config)
return config
10 changes: 4 additions & 6 deletions src/benchcab/data/config-schema.yml
@@ -51,6 +51,10 @@ realisations:
path:
type: "string"
required: true
model_output_name:
nullable: true
type: "boolean"
required: false
name:
nullable: true
type: "string"
@@ -107,12 +111,6 @@ fluxsite:
schema:
type: "string"
required: false
meorg_model_output_id:
type:
- "boolean"
- "string"
required: false
default: false

spatial:
type: "dict"
36 changes: 29 additions & 7 deletions src/benchcab/data/meorg_jobscript.j2
@@ -18,26 +18,48 @@ set -ev
# Set some things
DATA_DIR={{data_dir}}
NUM_THREADS={{num_threads}}
MODEL_OUTPUT_ID={{model_output_id}}
CACHE_DELAY={{cache_delay}}
MEORG_BIN={{meorg_bin}}
MODEL_PROFILE_ID={{ model_prof_id }}
MODEL_OUTPUT_NAME={{ mo.name }}
MODEL_OUTPUT_ARGS=()

{% if purge_outputs %}
# Purge existing model outputs
echo "Purging existing outputs from $MODEL_OUTPUT_ID"
$MEORG_BIN file detach_all $MODEL_OUTPUT_ID
# Create new model output entity
MODEL_OUTPUT_ARGS+="--state-selection {{ mo.state_selection }}"
MODEL_OUTPUT_ARGS+=" --parameter-selection {{ mo.parameter_selection }}"
{% if mo.is_bundle %}
MODEL_OUTPUT_ARGS+=" --is-bundle"
{% endif %}

MODEL_OUTPUT_ID=$($MEORG_BIN output query $MODEL_OUTPUT_NAME | head -n 1 )
if [ ! -z "${MODEL_OUTPUT_ID}" ] ; then
echo "Deleting existing files from model output ID"
$MEORG_BIN file delete_all $MODEL_OUTPUT_ID
echo "Updated model output ID"
Member

I don't understand why we need this output here.

Author

If a model output with the given name already exists, we want to preserve its model output ID, since in that case we assume the user is re-running the same experiment. However, we want to clean up any existing files so that the experiment runs with only the files the user intends. (I have added a short comment explaining this.)

else
echo "Create new model output ID"
fi

MODEL_OUTPUT_ID="$($MEORG_BIN output create $MODEL_PROFILE_ID $MODEL_OUTPUT_NAME $MODEL_OUTPUT_ARGS | head -n 1 | awk '{print $NF}')"
echo "Add experiments to model output"
$MEORG_BIN experiment update $MODEL_OUTPUT_ID {{ model_exp_ids|join(',') }}

# Upload the data
echo "Uploading data to $MODEL_OUTPUT_ID"
$MEORG_BIN file upload $DATA_DIR/*.nc -n $NUM_THREADS --attach_to $MODEL_OUTPUT_ID
$MEORG_BIN file upload $DATA_DIR/*.nc -n $NUM_THREADS $MODEL_OUTPUT_ID

# Wait for the cache to transfer to the object store.
echo "Waiting for object store transfer ($CACHE_DELAY sec)"
sleep $CACHE_DELAY

{% for exp_id in model_exp_ids %}
echo "Replace benchmarks to model output"
Member

Using "Add benchmarks ..." in the output would make more sense to me than "Replace".

Author

I have updated the message to "Add". I chose "Replace" because, unlike adding experiments, using `meorg benchmark update` when a benchmark already exists overwrites the existing benchmarks. That said, I don't see a use case in this workflow where the user would not add benchmarks from scratch every time.

$MEORG_BIN benchmark update $MODEL_OUTPUT_ID {{ exp_id }} {{ model_benchmark_ids|join(',') }}

# Trigger the analysis
echo "Triggering analysis on $MODEL_OUTPUT_ID"
$MEORG_BIN analysis start $MODEL_OUTPUT_ID
$MEORG_BIN analysis start $MODEL_OUTPUT_ID {{ exp_id }}

{% endfor %}

echo "DONE"
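
The ID extraction used in the jobscript above (`head -n 1 | awk '{print $NF}'`) takes the last whitespace-separated field of the first line of the client's output. A rough Python equivalent is sketched below to make that parsing explicit. The helper names and the sample text in `create_output` are hypothetical, and the real message format of the ME.org client may differ.

```python
def first_line(text: str) -> str:
    """Mimic `head -n 1`: return the first line of text, or '' if it is empty."""
    lines = text.splitlines()
    return lines[0] if lines else ""


def last_field(line: str) -> str:
    """Mimic `awk '{print $NF}'`: return the last whitespace-separated field."""
    fields = line.split()
    return fields[-1] if fields else ""


# Hypothetical client output, for illustration only.
create_output = "Model output ID: abc123\nsecond line is ignored\n"
model_output_id = last_field(first_line(create_output))
print(model_output_id)
```

An empty result here corresponds to the empty `MODEL_OUTPUT_ID` case that the jobscript tests with `[ ! -z "${MODEL_OUTPUT_ID}" ]` after the query step.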
1 change: 1 addition & 0 deletions src/benchcab/data/test/config-basic.yml
@@ -21,6 +21,7 @@ realisations:
- repo:
svn:
branch_path: trunk
model_output_name: True
- repo:
svn:
branch_path: branches/Users/ccc561/v3.0-YP-changes
2 changes: 1 addition & 1 deletion src/benchcab/data/test/config-optional.yml
@@ -3,7 +3,6 @@ project: hh5

fluxsite:
experiment: AU-Tum
meorg_model_output_id: False
multiprocess: False
pbs:
ncpus: 6
@@ -31,6 +30,7 @@ realisations:
svn:
branch_path: trunk
name: svn_trunk
model_output_name: True
- repo:
svn:
branch_path: branches/Users/ccc561/v3.0-YP-changes
2 changes: 1 addition & 1 deletion src/benchcab/data/test/integration_meorg.sh
@@ -31,6 +31,7 @@ realisations:
- repo:
local:
path: $CABLE_DIR
model_output_name: true
- repo:
git:
branch: main
@@ -47,7 +48,6 @@ fluxsite:
- scratch/$PROJECT
- gdata/$PROJECT
# This ID is currently configured on the me.org server.
meorg_model_output_id: Sss7qupAHEZ8ovbCv
EOL

benchcab run -v