Partial fixes for multisite ensembles #3654

infotroph · 2025-10-20T17:57:49Z

Description

A couple of small fixes and a questionable hack for running ensemble and uncertainty analyses without database access.

- Make sure SA run directories contain site IDs, so that e.g. SA-median/ isn't overwritten for each site in turn.
- Only try to run SA on the PFTs actually present at a given site.
- If Bety isn't available and no ensemble ID is present in the settings object, assign a new one by hashing the settings object.

Note that the last step (assigning a hashed ensemble ID) is done independently for each call to write.*.configs, so in a multisite run this will effectively set up a separate ensemble/SA for each site. This was what I wanted today, but I suspect most people will want outputs aggregated across sites, which this PR does not implement.
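As a rough illustration of the hashing fallback (a minimal sketch, not the code in this PR): the helper name `ensemble_id_or_hash`, the `settings$ensemble$ensemble.id` field layout, and the toy settings lists are assumptions for the example. Only `rlang::hash()` is the call named in the notes below.

```r
# Hypothetical helper: keep an existing ensemble ID, otherwise fall back
# to a hash of the whole settings object.
ensemble_id_or_hash <- function(settings) {
  id <- settings$ensemble$ensemble.id  # assumed location of the ID
  if (is.null(id) || is.na(id)) {
    id <- rlang::hash(settings)
  }
  id
}

# Toy single-site settings that differ only in site ID (made-up values)
site_a <- list(run = list(site = list(id = "site-A")), ensemble = list(size = 10))
site_b <- list(run = list(site = list(id = "site-B")), ensemble = list(size = 10))

id_a <- ensemble_id_or_hash(site_a)
id_b <- ensemble_id_or_hash(site_b)

# Settings that already carry an ID keep it unchanged
with_id <- list(ensemble = list(ensemble.id = "42"))
id_kept <- ensemble_id_or_hash(with_id)
```

Because each site's settings differ at least in the site ID, each call gets its own hash, which is what produces one ensemble per site rather than one aggregated ensemble.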

Motivation and Context

For the MAGiC project I wanted to quickly evaluate AGB timeseries from many sites. The timeseries plots from the ensemble analysis would be perfect for this, except that I'm running with no Bety access and the existing code then sets the ensemble ID to NOENSEMBLEID, so each site overwrites the outputs from the previous one.

Since the issue applies to both ensemble and sensitivity analyses, I tried to implement a fix for both, but note that I focused on avoiding collisions between distinct ensembles -- there are still places where two sites with the same ensemble ID will overwrite each other.

I'm pasting my working notes below -- @divine7022 and @dlebauer will likely want to consider the unresolved issues in their work on multisite sensitivity.

write.configs fails if SA is requested in a settings with ensemble size > 1
- Workaround: run SA and full ensemble in separate settings files
  => unresolved
(minor): README.txt does not specify which met/IC/soil/event inputs were used
=> unresolved
rundir SA-<pft>-<var>-<quantile> contents are overwritten by each site in turn
=> Resolved by adding site id to the get.run.id call
rundir SA-median- tries to run analysis for "ALL PFT", fails on NAs from pfts not present at that site
=> Resolved by having run.sensitivity.analysis subset PFTs to those in run$site$site.pft. The PFT doesn't show up in rundir names, but since there is only one PFT per site this works.
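
The two resolved items above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `make_sa_run_id` is a hypothetical stand-in for the real run-ID naming (whose format may differ), and the shape of `site.pft` here is a guess at the settings layout.

```r
# Hypothetical stand-in for run-ID naming: including the site ID keeps
# per-site SA run directories from colliding.
make_sa_run_id <- function(site_id, trait = NULL, quantile = "median") {
  # c() drops NULL entries, so the median run has no trait component
  paste(c("SA", site_id, trait, quantile), collapse = "-")
}

# Restrict the analysis to PFTs actually present at this site
all_pfts <- c("grass", "conifer", "broadleaf")  # made-up PFT names
site <- list(id = "site-A", site.pft = list(pft.name = "conifer"))
site_pfts <- intersect(all_pfts, unlist(site$site.pft))

id_a <- make_sa_run_id("site-A")
id_b <- make_sa_run_id("site-B")
```

With the site ID in the name, the median runs for two sites land in different directories, and `site_pfts` contains only the PFT listed for the site, so the analysis never touches PFTs that have no values there.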
each site's call to run.write.configs overwrites sensitivity.samples
- Since run.write.configs only sees one site at a time, need to choose one of:
  - write separate samples file for each site, combine later
  - append samples to existing samples file
  - move entire SA sample generation to a step not wrapped in papply
  - ?stop saving sensitivity.samples if not strictly needed after write.sa.configs is finished
    (But I think this is where run IDs are taken from)
    => Unresolved
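
For discussion, the first option above (separate samples file per site, combined later) might look something like this. It is a sketch only: the function names, the per-site file naming, and the use of RDS files are all assumptions for the example, and the samples are made-up numbers.

```r
# Hypothetical: write one samples file per site instead of a shared one
write_site_samples <- function(samples, site_id, outdir) {
  path <- file.path(outdir, paste0("sensitivity.samples.", site_id, ".rds"))
  saveRDS(samples, path)
  invisible(path)
}

# Hypothetical: read every per-site file back into a list named by site ID
combine_site_samples <- function(outdir) {
  pattern <- "^sensitivity\\.samples\\.(.+)\\.rds$"
  files <- list.files(outdir, pattern = pattern, full.names = TRUE)
  site_ids <- sub(pattern, "\\1", basename(files))
  stats::setNames(lapply(files, readRDS), site_ids)
}

outdir <- tempfile("sa-samples-")
dir.create(outdir)
write_site_samples(list(SLA = c(10, 15, 20)), "site-A", outdir)
write_site_samples(list(SLA = c(12, 18, 24)), "site-B", outdir)
combined <- combine_site_samples(outdir)
```

This keeps each per-site call independent, so it would still work inside papply, at the cost of a separate combine step before anything reads run IDs from the samples.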
runModule.run.sensitivity.analysis overwrites outputs as it runs for each site
- Affects ensemble analysis too
- The correct fix for this will probably parallel the fix for run.write.configs
- Hacky workaround: Pass each site a different ensemble id to get n_sites separate outputs, then manually combine posthoc
  => This workaround is implemented by setting null ensemble.ids to rlang::hash(settings),
  but we might consider using settings$run$site$id instead. Are there cases where a multiSettings might contain multiple entries from the same site, or where the site ID would be unset?

Review Time Estimate

Immediately
Within one week
When possible

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My change requires a change to the documentation.
My name is in the list of CITATION.cff
I agree that PEcAn Project may distribute my contribution under any or all of
- the same license as the existing code,
- and/or the BSD 3-clause license.
I have updated the CHANGELOG.md.
I have updated the documentation accordingly.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.
All new and existing tests passed.

CHANGELOG.md

infotroph added 5 commits October 17, 2025 14:37

wording and whitespace

cc52beb

use settings hash as ensemble id if not provided

9ccf904

filter to pfts present at this site

91cc792

pass site id when naming rundirs

c56c125

changelog

ef6d42d

github-actions bot added Modules Base labels Oct 20, 2025

Update CHANGELOG.md

159284a
