Skip to content
Open
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
74 commits
Select commit Hold shift + click to select a range
8276e71
Bug fixes for the missing ensemble.size error.
Oct 3, 2025
05f82da
Implement joint input sampling in the SDA workflows.
Oct 3, 2025
884a69d
Update the joint inputs sampling function.
Oct 3, 2025
f7f7736
Merge branch 'PecanProject:develop' into SDA_data
DongchenZ Oct 5, 2025
447b08d
Update the ensemble configs function.
Oct 5, 2025
92cd3f1
Clean the multisite sda function.
Oct 5, 2025
f2a96d1
Revert the changes.
Oct 5, 2025
40701fe
Revert changes.
Oct 5, 2025
4893c11
revert changes.
Oct 5, 2025
f6d473a
Revert changes.
Oct 5, 2025
f787a17
Add documentation and bug fixes.
Oct 5, 2025
c82acac
Update document.
Oct 5, 2025
f6f2232
Update the function to work with the newest functions.
Oct 5, 2025
a30c9e8
Remove the samples argument from the function.
Oct 5, 2025
74d5a1e
Add changes to the sda workflow.
Oct 6, 2025
70cfcb1
Add namespace.
Oct 7, 2025
1f87740
Bug fix.
Oct 7, 2025
8b0a640
Apply Mike's comments.
Oct 8, 2025
fe6bdca
Add the merge nc feature.
Oct 8, 2025
660186b
Improve scripts.
Oct 8, 2025
dd0ee2e
Bug fix.
Oct 8, 2025
5ddcaa7
Update the parallel registration.
Oct 15, 2025
ea8332a
Fix the dimension error when YN equals one.
Oct 15, 2025
5cb5a38
Update the parallel registration. And add the else control when it's …
Oct 15, 2025
fdacd5f
Convert from SoilMoist to SoilMoistFrac.
Oct 15, 2025
abecd42
Add the recursive file deletion.
Oct 15, 2025
fa18299
Revert changes to resolve conflicts.
Oct 15, 2025
81f035b
revert change.
Oct 15, 2025
3fc6907
Pull from pecan develop.
Oct 15, 2025
6fba821
change by make.
Oct 15, 2025
02ca610
Combine previous changes with the debias implementation.
Oct 15, 2025
3575463
Merge the debiasing feature into the parallel SDA workflow.
Oct 15, 2025
bebf893
Update document.
Oct 15, 2025
1409564
Update document.
Oct 15, 2025
870593f
Add the debiasing feature to the parallel job submission.
Oct 15, 2025
10310bc
map forecast results to the variable boundary.
Oct 15, 2025
300a900
Update comments.
Oct 15, 2025
408a390
update observation paths.
Oct 15, 2025
5b3cc63
Update computation configuration.
Oct 15, 2025
393c16e
Bug fix.
Oct 15, 2025
4ac6250
Bug fix.
Oct 16, 2025
50c2324
Revert change.
Oct 16, 2025
993e98b
Revert change.
Oct 16, 2025
78e4702
Add debias code.
Oct 16, 2025
d6d06f3
Move the previous debias script to the inst folder.
Oct 18, 2025
460e0ee
remove previous debias functions.
Oct 18, 2025
6025d83
Add the main SDA debias workflow.
Oct 18, 2025
85c59fe
Add the new debias workflow to the multi-site version.
Oct 18, 2025
ec38802
Add the new bias correction workflow to the sda_local script.
Oct 18, 2025
f1d10a1
Add the new SDA debiasing workflow to the SDA job submission function.
Oct 18, 2025
77bda48
Move out of the if control.
Oct 19, 2025
9c3fd8e
Bug fix.
Oct 19, 2025
e6ef100
Bug fix
Oct 19, 2025
087f18a
Remove the condition for the wrapping where we have more than 20 char…
Oct 19, 2025
5a04563
Update the runner script.
Oct 19, 2025
7ba598e
remove arg.
Oct 20, 2025
19db026
Update script.
Oct 20, 2025
2f14acc
Revert "Move out of the if control."
Oct 20, 2025
caefddf
Resolve conflicts.
Oct 20, 2025
6da0cd6
Revert back.
Oct 20, 2025
cb2379a
Revert to using the new debias workflow.
Oct 21, 2025
668c4b1
Update document.
Oct 21, 2025
e543c23
Update document.
Oct 21, 2025
a44399d
Update the runner script.
Oct 21, 2025
4701b41
Update the bias correction function.
Oct 21, 2025
2754b91
Update the usage of the bias correction workflow.
Oct 21, 2025
459c945
Merge branch 'PecanProject:develop' into SDA_data
DongchenZ Oct 21, 2025
0e08b79
Update observation paths.
Oct 21, 2025
af5c081
Update namespace.
Oct 21, 2025
300dfae
Merge branch 'SDA_data' of https://github.com/DongchenZ/pecan into SD…
Oct 21, 2025
c8a8206
Modify the test to match the change in the logger.message function.
Oct 21, 2025
441f732
Change the default value of wrap in the logger message function.
Oct 21, 2025
9099aa0
Revert back.
Oct 21, 2025
986930b
Merge branch 'PecanProject:develop' into SDA_data
DongchenZ Oct 28, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
385 changes: 149 additions & 236 deletions modules/assim.sequential/R/sda.enkf_MultiSite.R

Large diffs are not rendered by default.

55 changes: 38 additions & 17 deletions modules/assim.sequential/R/sda.enkf_parallel.R
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
#' @param obs.cov Lists of date times named by time points, which contains lists of sites named by site ids, which contains observation covariances for all state variables of each site for each time point.
#' @param Q Process covariance matrix given if there is no data to estimate it.
#' @param pre_enkf_params Used for passing pre-existing time-series of process error into the current SDA runs to ignore the impact by the differences between process errors.
#' @param ensemble.samples Pass ensemble.samples from outside to avoid GitHub check issues.
#' @param ensemble.samples list of ensemble parameters across PFTs. Default is NULL.
#' @param outdir physical path to the folder that stores the SDA outputs. Default is NULL.
#' @param control List of flags controlling the behavior of the SDA.
#' `TimeseriesPlot` for post analysis examination;
Expand Down Expand Up @@ -218,6 +218,10 @@ sda.enkf_local <- function(settings,
parent_ids=NULL)
}
}
# get the joint input design.
input_design <- generate_joint_ensemble_design(settings = settings[[1]],
ensemble_samples = ensemble.samples,
ensemble_size = nens)[[1]]
###------------------------------------------------------------------------------------------------###
### loop over time ###
###------------------------------------------------------------------------------------------------###
Expand All @@ -226,7 +230,7 @@ sda.enkf_local <- function(settings,
sda.outputs <- FORECAST <- enkf.params <- ANALYSIS <- ens_weights <- list()
obs.t <- as.character(lubridate::date(obs.times[t]))
obs.year <- lubridate::year(obs.t)
PEcAn.logger::logger.info(paste("Processing Year:", obs.year))
PEcAn.logger::logger.info(paste("Processing date:", obs.t))
###-------------------------------------------------------------------------###
### Taking care of Forecast. Splitting / Writting / running / reading back###
###-------------------------------------------------------------------------###-----
Expand Down Expand Up @@ -286,22 +290,39 @@ sda.enkf_local <- function(settings,
# release memory.
gc()
# submit jobs for writing configs.
# writing configs for each settings
PEcAn.logger::logger.info("Writting configs!")
out.configs <- furrr::future_pmap(list(conf.settings %>% `class<-`(c("list")), restart.list, inputs), function(settings, restart.arg, inputs) {
# Loading the model package - this is required bc of the furrr
library(paste0("PEcAn.",settings$model$type), character.only = TRUE)
# wrtting configs for each settings - this does not make a difference with the old code
PEcAn.uncertainty::write.ensemble.configs(
defaults = settings$pfts,
ensemble.samples = ensemble.samples,
settings = settings,
model = settings$model$type,
write.to.db = settings$database$bety$write,
restart = restart.arg,
samples=inputs,
rename = F
)
}) %>% stats::setNames(site.ids)
# here we use the foreach instead of furrr
# because for some reason, the furrr has problem returning the sample paths.
cl <- parallel::makeCluster(parallel::detectCores())
doSNOW::registerDoSNOW(cl)
out.configs <- foreach::foreach(temp.settings = as.list(conf.settings),
restart.arg = restart.list,
.packages = c("Kendall",
"purrr",
"PEcAn.uncertainty",
paste0("PEcAn.", model),
"PEcAnAssimSequential")) %dopar% {
temp <- PEcAn.uncertainty::write.ensemble.configs(
input_design = input_design,
ensemble.size = nens,
defaults = temp.settings$pfts,
ensemble.samples = ensemble.samples,
settings = temp.settings,
model = temp.settings$model$type,
write.to.db = temp.settings$database$bety$write,
restart = restart.arg,
# samples=inputs,
rename = TRUE
)
return(temp)
} %>% stats::setNames(site.ids)
parallel::stopCluster(cl)
foreach::registerDoSEQ()
# update the file paths of different inputs when t = 1.
if (t == 1) {
inputs <- out.configs %>% purrr::map(~.x$samples)
}
# collect run info.
# get ensemble ids for each site.
ensemble.ids <- site.ids %>% furrr::future_map(function(i){
Expand Down
15 changes: 6 additions & 9 deletions modules/assim.sequential/man/sda.enkf.multisite.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion modules/assim.sequential/man/sda.enkf_local.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

33 changes: 15 additions & 18 deletions modules/uncertainty/R/ensemble.R
Original file line number Diff line number Diff line change
Expand Up @@ -211,7 +211,6 @@ get.ensemble.samples <- function( ensemble.size, pft.samples, env.samples,
##' @param clean remove old output first?
##' @param write.to.db logical: Record this run in BETY?
##' @param restart In case this is a continuation of an old simulation. restart needs to be a list with name tags of runid, inputs, new.params (parameters), new.state (initial condition), ensemble.id (ensemble id), start.time and stop.time.See Details.
##' @param samples Sampled inputs such as met and parameter files
##' @param rename Decide if we want to rename previous output files, for example convert from sipnet.out to sipnet.2020-07-16.out.
##'
##' @return list, containing $runs = data frame of runids, $ensemble.id = the ensemble ID for these runs and $samples with ids and samples used for each tag. Also writes sensitivity analysis configuration files as a side effect
Expand All @@ -226,7 +225,7 @@ get.ensemble.samples <- function( ensemble.size, pft.samples, env.samples,
##' @export
##' @author David LeBauer, Carl Davidson, Hamze Dokoohaki
write.ensemble.configs <- function(input_design , ensemble.size, defaults, ensemble.samples, settings, model,
clean = FALSE, write.to.db = TRUE, restart = NULL, samples = NULL, rename = FALSE) {
clean = FALSE, write.to.db = TRUE, restart = NULL, rename = FALSE) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the samples argument as it is no longer needed.



# Check if there are NO inputs
Expand Down Expand Up @@ -320,22 +319,18 @@ for (input_tag in names(settings$run$inputs)) {
}
#now looking into the xml
samp <- settings$ensemble$samplingspace
if(is.null(samples)){
#performing the sampling
samples <- list()
input_tags <- names(settings$run$inputs)

for (input_tag in input_tags) {
if (input_tag %in% colnames(input_design)) {
input_paths <- settings$run$inputs[[input_tag]]$path
input_indices <- input_design[[input_tag]]

samples[[input_tag]] <- list(
samples = lapply(input_indices, function(idx) input_paths[[idx]])
)
}

}
#performing the sampling
samples <- list()
input_tags <- names(settings$run$inputs)
for (input_tag in input_tags) {
if (input_tag %in% colnames(input_design)) {
input_paths <- settings$run$inputs[[input_tag]]$path
input_indices <- input_design[[input_tag]]

samples[[input_tag]] <- list(
samples = lapply(input_indices, function(idx) input_paths[[idx]])
)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the samples argument as it is no longer needed.

}
# if there is a tag required by the model but it is not specified in the xml then I replicate n times the first element
required_tags%>%
Expand Down Expand Up @@ -515,6 +510,8 @@ for (input_tag in names(settings$run$inputs)) {
for (i in seq_len(ensemble.size)) {
input_list <- list()
for (input_tag in names(inputs)) {
# if it's the parameter list, skip.
if (input_tag == "parameters") next
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure we are not sending parameters for the paths, as the new inputs_design does include the parameter field.

if (!is.null(inputs[[input_tag]]$samples[[i]]))
input_list[[input_tag]] <- list(path = inputs[[input_tag]]$samples[[i]])
}
Expand Down
50 changes: 27 additions & 23 deletions modules/uncertainty/R/generate_joint_ensemble_design.R
Original file line number Diff line number Diff line change
Expand Up @@ -4,19 +4,20 @@
#' are shared across sites to ensure consistent parameter sampling.
#'
#' @param settings A PEcAn settings object containing ensemble configuration
#' @param sobol for activating sobol
#' @param ensemble_size Integer specifying the number of ensemble members
#' @param ensemble_samples list of ensemble parameters across PFTs. The default is NULL.
#' @param sobol for activating sobol
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the ensemble_samples argument to ensure we can also pass the parameters directly.

#' @return A list containing ensemble samples and indices
#'
#' @export

generate_joint_ensemble_design <- function(settings,
ensemble_size,
ensemble_samples = NULL,
sobol = FALSE) {
if (sobol) {
ensemble_size <- as.numeric(ensemble_size) * 2
}

ens.sample.method <- settings$ensemble$samplingspace$parameters$method
design_list <- list()
sampled_inputs <- list()
Expand Down Expand Up @@ -51,29 +52,34 @@ generate_joint_ensemble_design <- function(settings,
sampled_inputs[[input_tag]] <- input_result$ids
design_list[[input_tag]] <- input_result$ids
}

# Sample parameters
PEcAn.uncertainty::get.parameter.samples(
settings,
ensemble.size = ensemble_size,
posterior.files,
ens.sample.method
)


# Sample parameters if we don't have it.
if (is.null(ensemble_samples)) {
PEcAn.uncertainty::get.parameter.samples(
settings,
ensemble.size = ensemble_size,
posterior.files,
ens.sample.method
)
samples.file <- file.path(settings$outdir, "samples.Rdata")
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add if-control for the case where we don't assign the ensemble_samples argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify the argument for when it makes sense to assume that we have ensemble_samples already? In other words, in what cases would you want/need to recycle a previous ensemble_samples rather than generate a new sample from the posteriors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


# Load samples from file
samples.file <- file.path(settings$outdir, "samples.Rdata")
samples <- new.env()
if (file.exists(samples.file)) {
load(samples.file, envir = samples)
if (!is.null(samples$ensemble.samples)) {
# Just a placeholder: extract representative trait index per ensemble member
# You may want to flatten or select indices per trait
design_list[["param"]] <- seq_len(ensemble_size)
# if we don't have the parameters from the outside.
if (is.null(ensemble_samples)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have to consecutive if(is.null(ensemble_samples) blocks? why can't these be combined into one block?

if (ile.exists(samples.file)) {
load(samples.file, envir = samples)
} else {
PEcAn.logger::logger.warn("ensemble.samples not found in samples.Rdata")
PEcAn.logger::logger.error(samples.file, "not found, this file is required")
}
}
if (!is.null(samples$ensemble.samples) | !is.null(ensemble_samples)) {
# Just a placeholder: extract representative trait index per ensemble member
# You may want to flatten or select indices per trait
design_list[["param"]] <- seq_len(ensemble_size)
} else {
PEcAn.logger::logger.error(samples.file, "not found, this file is required")
PEcAn.logger::logger.warn("ensemble.samples not found in samples.Rdata")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why drop this from error to warn?

}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add if-control for the case where we don't assign the ensemble_samples argument.

design_matrix <- data.frame(design_list)

Expand All @@ -85,7 +91,5 @@ generate_joint_ensemble_design <- function(settings,
return(sobol_obj)
}



return(list(X = design_matrix))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it important that this be a list? All the uses I've seen so far immediately subset the result to only store the X component, so would it make sense to return design_matrix unwrapped in the first place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is important so that the sobol version works correctly. This ensures that regardless of whether the sobol or non-sobol version is called that the output is a list that includes the design as X. In the sobol version the list includes additional info beyond just X that's required by the function that does the sobol index calculations, but not required to do the runs themselves

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Can you add this to the function docs, please? I think it'd be enough to list in the @return block the possible items for the Sobol case and mention that in the non-sobol case the list only contains X

}
}
9 changes: 8 additions & 1 deletion modules/uncertainty/man/generate_joint_ensemble_design.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 0 additions & 3 deletions modules/uncertainty/man/write.ensemble.configs.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading