diff --git a/.Rbuildignore b/.Rbuildignore
index cda1635..920b614 100644
--- a/.Rbuildignore
+++ b/.Rbuildignore
@@ -11,4 +11,9 @@
^RESEARCH-NOTICE\.md$
^vignettes/images
^vignettes/motorcycle.Rmd$
+^vignettes/classification.Rmd$
+^vignettes/large_scale_emulation.Rmd$
+^vignettes/linked_DGP.Rmd$
+^vignettes/seq_design.Rmd$
+^vignettes/seq_design_2.Rmd$
^LICENSE\.md$
diff --git a/NAMESPACE b/NAMESPACE
index b051357..6c5b51d 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -29,9 +29,6 @@ S3method(validate,lgp)
S3method(vigf,bundle)
S3method(vigf,dgp)
S3method(vigf,gp)
-export(Hetero)
-export(NegBin)
-export(Poisson)
export(alm)
export(combine)
export(continue)
@@ -42,7 +39,6 @@ export(draw)
export(get_thread_num)
export(gp)
export(init_py)
-export(kernel)
export(lgp)
export(mice)
export(nllik)
diff --git a/NEWS.md b/NEWS.md
index f252ac4..700633e 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -15,8 +15,8 @@
- The `plot()` function has been updated to generate validation plots for DGP classifiers (i.e., DGP emulators with categorical likelihoods) and linked emulators created by `lgp()` using the new data frame form for `struc`.
- The `summary()` function has been redesigned to provide both summary tables and visualizations of structure and model specifications for (D)GP and linked (D)GP emulators.
- A `sample_size` argument has been added to the `validate()` and `plot()` functions, allowing users to adjust the number of samples used for validation when the validation method is set to `sampling`.
-- The following functions are deprecated as of this version and will be removed in the next release: `combine()`, `set_linked_idx()`, `kernel()`, `Poisson()`, `Hetero()`, and `NegBin()`. These functions are no longer maintained. Please refer to the updated package documentation for alternative workflows.
-- The basic node functions `kernel()`, `Hetero()`, `Poisson()`, and `NegBin()`, along with the `struc` argument in the `gp()` and `dgp()` functions, have been deprecated as of this version and will be removed in the next release. Customization of (D)GP specifications can be achieved by modifying the other arguments in `gp()` and `dgp()`.
+- `combine()` and `set_linked_idx()` are deprecated as of this version and will be removed in the next release. These two functions are no longer maintained. Please refer to the updated package documentation for alternative workflows.
+- The basic node functions `kernel()`, `Hetero()`, `Poisson()`, and `NegBin()`, along with the `struc` argument in the `gp()` and `dgp()` functions, have been removed as of this version. Customization of (D)GP specifications can be achieved by modifying the other arguments in `gp()` and `dgp()`.
- The `draw()` function has been updated for instances of the `bundle` class to allow drawing of design and evaluation plots of all emulators in a single figure.
- The `plot()` function has been updated for linked emulators generated by `lgp()` using the new data frame form for `struc`.
- The `design()` function has been redesigned to allow new specifications of the user-supplied `method` function.
@@ -28,6 +28,8 @@
- The `write()` function now allows `light = TRUE` for both GP emulators and bundles of GP emulators.
- Two new functions, `serialize()` and `deserialize()`, have been added to allow users to export emulators to multi-session workers for parallel processing.
- Additional vignettes are available, showcasing large-scale DGP emulation and DGP classification.
+- Enhanced clarity and consistency across the documentation.
+- Improved examples and explanations in vignettes for better user guidance.
# dgpsi 2.4.0
- One can now use `design()` to implement sequential designs using `f` and a fixed candidate set passed to `x_cand` with `y_cand = NULL`.
diff --git a/R/alm.R b/R/alm.R
index e6aef4d..3f3878f 100644
--- a/R/alm.R
+++ b/R/alm.R
@@ -7,10 +7,10 @@
#' * the S3 class `gp`.
#' * the S3 class `dgp`.
#' * the S3 class `bundle`.
-#' @param x_cand a matrix (with each row containing a design point and column representing an input dimension) that gives a candidate set
-#' from which the next design point(s) are determined. If `object` is an instance of the `bundle` class and `aggregate` is not supplied, `x_cand` could also
-#' be a list with length equal to the number of emulators contained in `object`. In this case, each slot in `x_cand` should be a candidate set matrix
-#' for each emulator included in the bundle. Defaults to `NULL`.
+#' @param x_cand a matrix (with each row being a design point and each column being an input dimension) that gives a candidate set
+#' from which the next design point(s) are determined. If `object` is an instance of the `bundle` class and `aggregate` is not supplied, `x_cand` can also be a list.
+#' The list must have a length equal to the number of emulators in `object`, with each element being a matrix representing the candidate set for the corresponding
+#' emulator in the bundle. Defaults to `NULL`.
#' @param n_start an integer that gives the number of initial design points to be used to determine next design point(s). This argument
#' is only used when `x_cand` is `NULL`. Defaults to `20`.
#' @param batch_size an integer that gives the number of design points to be chosen. Defaults to `1`.
@@ -33,37 +33,40 @@
#' of the matrix is equal to:
#' - the emulator output dimension if `object` is an instance of the `dgp` class; or
#' - the number of emulators contained in `object` if `object` is an instance of the `bundle` class.
-#' * the output should be a vector that aggregates scores across outputs or emulators at different design points.
+#' * the output should be a vector that gives aggregate scores at different design points.
#'
-#' Set to `NULL` to disable the aggregation. Defaults to `NULL`.
+#' Set to `NULL` to disable aggregation. Defaults to `NULL`.
#' @param ... any arguments (with names different from those of arguments used in [alm()]) that are used by `aggregate`
#' can be passed here.
#'
#' @return
-#' 1. If `x_cand` is not `NULL` and:
-#' - `object` is an instance of the `gp` class, a vector is returned with length equal to `batch_size`, giving the positions (i.e., row numbers)
-#' of next design points from `x_cand`.
-#' - `object` is an instance of the `dgp` class, a vector is returned with length equal to `batch_size * D`, giving positions (i.e., row numbers)
-#' of next design points from `x_cand` to be added to the DGP emulator. `D` equals to the number of output dimensions of the DGP
-#' emulator if there is no likelihood layer in the hierarchy. If `object` is a DGP emulator with either `Hetero` or `NegBin` likelihood layer,
-#' `D = 2`. If `object` is a DGP emulator with a `Categorical` likelihood layer, `D` equals to one (for binary output) or `K` (for multi-class output with `K` classes).
-#' - `object` is an instance of the `bundle` class, a matrix is returned with row number equal to `batch_size` and column number equal to the number of
-#' emulators in the bundle, giving positions (i.e., row numbers) of next design points from `x_cand` to be added to individual emulators.
-#' 2. If `x_cand = NULL` and:
-#' - `object` is an instance of the `gp` class, a matrix is returned with row number equal to `batch_size`, giving the next design points to be evaluated.
-#' - `object` is an instance of the `dgp` class, a matrix is returned with row number equal to `batch_size * D` where `D` is the number of output dimensions of the DGP
-#' emulator if no likelihood layer is included. If `object` is a DGP emulator with either `Hetero` or `NegBin` likelihood layer, `D = 2`. If `object` is a DGP emulator
-#' with a `Categorical` likelihood layer, `D` equals to one (for binary output) or `K` (for multi-class output with `K` classes).
-#' - `object` is an instance of the `bundle` class, a list is returned with the length equal to the number of
-#' emulators in the bundle. Each element in the list is a matrix with row number equal to `batch_size`, giving next design points to be added to individual emulators.
+#' 1. If `x_cand` is not `NULL`:
+#' - When `object` is an instance of the `gp` class, a vector of length `batch_size` is returned, containing the positions
+#' (row numbers) of the next design points from `x_cand`.
+#' - When `object` is an instance of the `dgp` class, a vector of length `batch_size * D` is returned, containing the positions
+#' (row numbers) of the next design points from `x_cand` to be added to the DGP emulator.
+#' * `D` is the number of output dimensions of the DGP emulator if no likelihood layer is included.
+#' * For a DGP emulator with a `Hetero` or `NegBin` likelihood layer, `D = 2`.
+#' * For a DGP emulator with a `Categorical` likelihood layer, `D = 1` for binary output or `D = K` for multi-class output with `K` classes.
+#' - When `object` is an instance of the `bundle` class, a matrix is returned with `batch_size` rows and a column for each emulator in
+#' the bundle, containing the positions (row numbers) of the next design points from `x_cand` for individual emulators.
+#' 2. If `x_cand` is `NULL`:
+#' - When `object` is an instance of the `gp` class, a matrix with `batch_size` rows is returned, giving the next design points to be evaluated.
+#' - When `object` is an instance of the `dgp` class, a matrix with `batch_size * D` rows is returned, where:
+#' * `D` is the number of output dimensions of the DGP emulator if no likelihood layer is included.
+#' * For a DGP emulator with a `Hetero` or `NegBin` likelihood layer, `D = 2`.
+#' * For a DGP emulator with a `Categorical` likelihood layer, `D = 1` for binary output or `D = K` for multi-class output with `K` classes.
+#' - When `object` is an instance of the `bundle` class, a list is returned with a length equal to the number of emulators in the bundle. Each
+#' element of the list is a matrix with `batch_size` rows, where each row represents a design point to be added to the corresponding emulator.
#'
#' @note
-#' The column order of the first argument of `aggregate` must be consistent with the order of emulator output dimensions (if `object` is an instance of the
-#' `dgp` class), or the order of emulators placed in `object` if `object` is an instance of the `bundle` class.
+#' The first column of the matrix supplied to the first argument of `aggregate` must correspond to the first output dimension of the DGP emulator
+#' if `object` is an instance of the `dgp` class, and so on for subsequent columns and dimensions. If `object` is an instance of the `bundle` class,
+#' the first column must correspond to the first emulator in the bundle, and so on for subsequent columns and emulators.
#' @references
#' MacKay, D. J. (1992). Information-based objective functions for active data selection. *Neural Computation*, **4(4)**, 590-604.
#'
-#' @details See further examples and tutorials at .
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
diff --git a/R/design.R b/R/design.R
index 2ad2e2f..847a82b 100644
--- a/R/design.R
+++ b/R/design.R
@@ -51,9 +51,9 @@
#' * if `object` is an instance of the `bundle` class, `y_test` is a matrix with each row representing the outputs for the corresponding row of `x_test` and each column representing the output of the different emulators in the bundle.
#'
#' Set to `NULL` for LOO-based emulator validation. Defaults to `NULL`. This argument is only used if `eval = NULL`.
-#' @param reset A boolean or a vector of booleans indicating whether to reset the hyperparameters of the emulator(s) to their initial values (as set during initial construction) before re-fitting.
+#' @param reset A bool or a vector of bools indicating whether to reset the hyperparameters of the emulator(s) to their initial values (as set during initial construction) before re-fitting.
#' The re-fitting occurs based on the frequency specified by `freq[1]`. This option is useful when hyperparameters are suspected to have converged to a local optimum affecting validation performance.
-#' - If a single boolean is provided, it applies to every iteration of the sequential design.
+#' - If a single bool is provided, it applies to every iteration of the sequential design.
#' - If a vector is provided, its length must equal `N` (even if the re-fit frequency specified in `freq[1]` is not 1) and it will apply to the corresponding iterations of the sequential design.
#'
#' Defaults to `FALSE`.
@@ -91,18 +91,18 @@
#'
#' If no custom function is provided, a built-in evaluation metric (RMSE or log-loss, in the case of DGP emulators with categorical likelihoods) will be used.
#' Defaults to `NULL`. See the *Note* section below for additional details.
-#' @param verb a boolean indicating if trace information will be printed during the sequential design.
+#' @param verb a bool indicating if trace information will be printed during the sequential design.
#' Defaults to `TRUE`.
#' @param autosave a list that contains configuration settings for the automatic saving of the emulator:
-#' * `switch`: a boolean indicating whether to enable automatic saving of the emulator during sequential design. When set to `TRUE`,
+#' * `switch`: a bool indicating whether to enable automatic saving of the emulator during sequential design. When set to `TRUE`,
#' the emulator in the final iteration is always saved. Defaults to `FALSE`.
#' * `directory`: a string specifying the directory path where the emulators will be stored. Emulators will be stored in a sub-directory
#' of `directory` named 'emulator-`id`'. Defaults to './check_points'.
#' * `fname`: a string representing the base name for the saved emulator files. Defaults to 'check_point'.
#' * `save_freq`: an integer indicating the frequency of automatic saves, measured in the number of iterations. Defaults to `5`.
-#' * `overwrite`: a boolean value controlling the file saving behavior. When set to `TRUE`, each new automatic save overwrites the previous one,
+#' * `overwrite`: a bool value controlling the file saving behavior. When set to `TRUE`, each new automatic save overwrites the previous one,
#' keeping only the latest version. If `FALSE`, each automatic save creates a new file, preserving all previous versions. Defaults to `FALSE`.
-#' @param new_wave a boolean indicating whether the current call to [design()] will create a new wave of sequential designs or add the next sequence of designs to the most recent wave.
+#' @param new_wave a bool indicating whether the current call to [design()] will create a new wave of sequential designs or add the next sequence of designs to the most recent wave.
#' This argument is relevant only if waves already exist in the emulator. Creating new waves can improve the visualization of sequential design performance across different calls
#' to [design()] via [draw()], and allows for specifying a different evaluation frequency in `freq`. However, disabling this option can help limit the number of waves visualized
#' in [draw()] to avoid issues such as running out of distinct colors for large numbers of waves. Defaults to `TRUE`.
@@ -123,9 +123,9 @@
#' if the DGP emulator was constructed without the Vecchia approximation. Otherwise, the number of processes is set to `max physical cores available %/% 2`.
#' Only use multiple processes when there is a large number of GP components in different layers and optimization of GP components
#' is computationally expensive. Defaults to `1`.
-#' @param pruning a boolean indicating if dynamic pruning of DGP structures will be implemented during the sequential design after the total number of
+#' @param pruning a bool indicating if dynamic pruning of DGP structures will be implemented during the sequential design after the total number of
#' design points exceeds `min_size` in `control`. The argument is only applicable to DGP emulators (i.e., `object` is an instance of `dgp` class)
-#' produced by `dgp()` with `struc = NULL`. Defaults to `TRUE`.
+#' produced by `dgp()`. Defaults to `TRUE`.
#' @param control a list that can supply any of the following components to control the dynamic pruning of the DGP emulator:
#' * `min_size`, the minimum number of design points required to trigger dynamic pruning. Defaults to 10 times the number of input dimensions.
#' * `threshold`, the \eqn{R^2} value above which a GP node is considered redundant. Defaults to `0.97`.
@@ -156,8 +156,8 @@
#' If `target` is not `NULL`, the following additional elements are also included:
#' - `target`: the target evaluating metric computed by the `eval` or built-in function to stop the sequential design.
#' - `reached`: indicates whether the `target` was reached at the end of the sequential design:
-#' - a boolean if `object` is an instance of the `gp` or `dgp` class.
-#' - a vector of booleans if `object` is an instance of the `bundle` class, with its length determined as follows:
+#' - a bool if `object` is an instance of the `gp` or `dgp` class.
+#' - a vector of bools if `object` is an instance of the `bundle` class, with its length determined as follows:
#' - equal to the number of emulators in the bundle when `eval = NULL`.
#' - equal to the length of the output from `eval` when a custom `eval` function is provided.
#' - a slot called `type` that gives the type of validation:
@@ -201,7 +201,7 @@
#' within `f` are handled by appropriately returning `NA`s.
#' * When defining `eval`, the output metric needs to be positive if [draw()] is used with `log = T`. And one needs to ensure that a lower metric value indicates
#' a better emulation performance if `target` is set.
-#' @details See further examples and tutorials at .
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#'
#' @examples
#' \dontrun{
@@ -3237,10 +3237,6 @@ check_reset <- function(reset, N){
check_auto <- function(object){
auto_pruning <- T
# exclude user-defined structure
- if (!"internal_dims" %in% names(object[['specs']])) {
- auto_pruning <- F
- return(auto_pruning)
- } else {
n_layer <- object$constructor_obj$n_layer
if (object$constructor_obj$all_layer[[n_layer]][[1]]$type!='gp') {
n_layer <- n_layer - 1
@@ -3257,7 +3253,7 @@ check_auto <- function(object){
}
}
}
- }
+
return(auto_pruning)
}
@@ -3342,24 +3338,24 @@ reverse_minmax <- function(normalized_data, limits) {
return(original_data)
}
-generic_wrapper <- function(r_func) {
- function(...) {
- # Capture the arguments
- args <- list(...)
-
- # Convert Python-native arguments to R-native if necessary
- args <- lapply(args, function(arg) {
- if (inherits(arg, "python.builtin.object")) {
- reticulate::py_to_r(arg)
- } else {
- arg
- }
- })
-
- # Call the user-provided R function with converted arguments
- result <- do.call(r_func, args)
-
- # Convert the result back to Python-native types
- reticulate::r_to_py(result)
- }
-}
+#generic_wrapper <- function(r_func) {
+# function(...) {
+# # Capture the arguments
+# args <- list(...)
+#
+# # Convert Python-native arguments to R-native if necessary
+# args <- lapply(args, function(arg) {
+# if (inherits(arg, "python.builtin.object")) {
+# reticulate::py_to_r(arg)
+# } else {
+# arg
+# }
+# })
+#
+# # Call the user-provided R function with converted arguments
+# result <- do.call(r_func, args)
+#
+# # Convert the result back to Python-native types
+# reticulate::r_to_py(result)
+# }
+#}
diff --git a/R/dgp.R b/R/dgp.R
index a72d2aa..bca3321 100644
--- a/R/dgp.R
+++ b/R/dgp.R
@@ -5,44 +5,35 @@
#' @param X a matrix where each row is an input training data point and each column represents an input dimension.
#' @param Y a matrix containing observed training output data. The matrix has its rows being output data points and columns representing
#' output dimensions. When `likelihood` (see below) is not `NULL`, `Y` must be a matrix with a single column.
-#' @param struc `r lifecycle::badge("deprecated")` a list that specifies a user-defined DGP structure. It should contain *L* (the number of DGP layers) sub-lists,
-#' each of which represents a layer and contains a number of GP nodes (defined by [kernel()]) in the corresponding layer.
-#' The final layer of the DGP structure (i.e., the final sub-list in `struc`) can be a likelihood
-#' layer that contains a likelihood function (e.g., [Poisson()]). When `struc = NULL`,
-#' the DGP structure is automatically generated and can be checked by applying [summary()] to the output from [dgp()] with `training = FALSE`.
-#' If this argument is used (i.e., user provides a customized DGP structure), arguments `depth`, `node`, `name`, `lengthscale`, `bounds`, `prior`,
-#' `share`, `nugget_est`, `nugget`, `scale_est`, `scale`, `connect`, `likelihood`, and `internal_input_idx` will NOT be used. Defaults to `NULL`.
-#'
-#' **The argument will be removed in the next release. To customize DGP specifications, please adjust the other arguments in the [dgp()] function.**
#' @param depth number of layers (including the likelihood layer) for a DGP structure. `depth` must be at least `2`.
-#' Defaults to `2`.
+#' Defaults to `2`.
#' @param node number of GP nodes in each layer (except for the final layer or the layer feeding the likelihood node) of the DGP. Defaults to
-#' `ncol(X)`.
+#' `ncol(X)`.
#' @param name a character or a vector of characters that indicates the kernel functions (either `"sexp"` for squared exponential kernel or
#' `"matern2.5"` for Matérn-2.5 kernel) used in the DGP emulator:
#' 1. if a single character is supplied, the corresponding kernel function will be used for all GP nodes in the DGP hierarchy.
#' 2. if a vector of characters is supplied, each character of the vector specifies the kernel function that will be applied to all GP nodes in the corresponding layer.
#'
-#' Defaults to `"sexp"`.
+#' Defaults to `"sexp"`.
#' @param lengthscale initial lengthscales for GP nodes in the DGP emulator. It can be a single numeric value or a vector:
#' 1. if it is a single numeric value, the value will be applied as the initial lengthscales for all GP nodes in the DGP hierarchy.
#' 2. if it is a vector, each element of the vector specifies the initial lengthscales that will be applied to all GP nodes in the corresponding layer.
#' The vector should have a length of `depth` if `likelihood = NULL` or a length of `depth - 1` if `likelihood` is not `NULL`.
#'
-#' Defaults to a numeric value of `1.0`.
+#' Defaults to a numeric value of `1.0`.
#' @param bounds the lower and upper bounds of lengthscales in GP nodes. It can be a vector or a matrix:
#' 1. if it is a vector, the lower bound (the first element of the vector) and upper bound (the second element of the vector) will be applied to
#' lengthscales for all GP nodes in the DGP hierarchy.
#' 2. if it is a matrix, each row of the matrix specifies the lower and upper bounds of lengthscales for all GP nodes in the corresponding layer.
#' The matrix should have its row number equal to `depth` if `likelihood = NULL` or to `depth - 1` if `likelihood` is not `NULL`.
#'
-#' Defaults to `NULL` where no bounds are specified for the lengthscales.
+#' Defaults to `NULL` where no bounds are specified for the lengthscales.
#' @param prior prior to be used for MAP estimation of lengthscales and nuggets of all GP nodes in the DGP hierarchy:
#' * gamma prior (`"ga"`),
#' * inverse gamma prior (`"inv_ga"`), or
#' * jointly robust prior (`"ref"`).
#'
-#' Defaults to `"ga"`.
+#' Defaults to `"ga"`.
#' @param share a bool indicating if all input dimensions of a GP node share a common lengthscale. Defaults to `TRUE`.
#' @param nugget_est a bool or a bool vector that indicates if the nuggets of GP nodes (if any) in the final layer are to be estimated. If a single bool is
#' provided, it will be applied to all GP nodes (if any) in the final layer. If a bool vector (which must have a length of `ncol(Y)`) is provided, each
@@ -67,12 +58,12 @@
#' * `FALSE`: the variance of the corresponding GP in the final layer is fixed to the corresponding value defined in `scale` (see below).
#' * `TRUE`: the variance of the corresponding GP in the final layer will be estimated with the initial value given by the correspondence in `scale` (see below).
#'
-#' Defaults to `TRUE`. This argument is only used when `struc = NULL`.
+#' Defaults to `TRUE`.
#' @param scale the initial variance value(s) of GP nodes (if any) in the final layer. If it is a single numeric value, it will be applied to all GP nodes (if any)
#' in the final layer. If it is a vector (which must have a length of `ncol(Y)`), each numeric in the vector will be applied to the corresponding GP node
-#' (if any) in the final layer. Defaults to `1`. This argument is only used when `struc = NULL`.
+#' (if any) in the final layer. Defaults to `1`.
#' @param connect a bool indicating whether to implement global input connection to the DGP structure. Setting it to `FALSE` may produce a better emulator in some cases at
-#' the cost of slower training. Defaults to `TRUE`. This argument is only used when `struc = NULL`.
+#' the cost of slower training. Defaults to `TRUE`.
#' @param likelihood the likelihood type of a DGP emulator:
#' 1. `NULL`: no likelihood layer is included in the emulator.
#' 2. `"Hetero"`: a heteroskedastic Gaussian likelihood layer is added for stochastic emulation where the computer model outputs are assumed to follow a heteroskedastic Gaussian distribution
@@ -84,7 +75,7 @@
#' When `likelihood` is not `NULL`, the value of `nugget_est` is overridden by `FALSE`. Defaults to `NULL`.
#' @param training a bool indicating if the initialized DGP emulator will be trained.
#' When set to `FALSE`, [dgp()] returns an untrained DGP emulator, to which one can apply [summary()] to inspect its specifications
-#' (especially when a customized `struc` is provided) or apply [predict()] to check its emulation performance before training. Defaults to `TRUE`.
+#' or apply [predict()] to check its emulation performance before training. Defaults to `TRUE`.
#' @param verb a bool indicating if the trace information on DGP emulator construction and training will be printed during the function execution.
#' Defaults to `TRUE`.
#' @param check_rep a bool indicating whether to check for repetitions in the dataset, i.e., if one input
@@ -109,14 +100,14 @@
#' @param burnin the number of training iterations to be discarded for
#' point estimates of model parameters. Must be smaller than the training iterations `N`. If this is not specified, only the last 25% of iterations
#' are used. Defaults to `NULL`. This argument is only used when `training = TRUE`.
-#' @param B the number of imputations used to produce predictions. Increase the value to refine the representation of imputation uncertainty.
+#' @param B the number of imputations used to produce predictions. Increase the value to refine the representation of imputation uncertainty.
#' Defaults to `10`.
#' @param internal_input_idx `r lifecycle::badge("deprecated")` The argument will be removed in the next release. To set up connections of emulators for linked emulations,
#' please use the updated [lgp()] function instead.
#'
#' Column indices of `X` that are generated by the linked emulators in the preceding layers.
#' Set `internal_input_idx = NULL` if the DGP emulator is in the first layer of a system or all columns in `X` are
-#' generated by the linked emulators in the preceding layers. Defaults to `NULL`. This argument is only used when `struc = NULL`.
+#' generated by the linked emulators in the preceding layers. Defaults to `NULL`.
#' @param linked_idx `r lifecycle::badge("deprecated")` The argument will be removed in the next release. To set up connections of emulators for linked emulation,
#' please use the updated [lgp()] function instead.
#'
@@ -164,9 +155,7 @@
#' with the light option `light = TRUE`) is loaded back to R by [read()].
#' 6. `B`: the number of imputations used to generate the emulator.
#' 7. `r new_badge("new")` `vecchia`: whether the Vecchia approximation is used for the GP emulator training.
-#' 8. `r new_badge("new")` `M`: the size of the conditioning set for the Vecchia approximation in the DGP emulator training.
-#'
-#' `internal_dims` and `external_dims` are generated only when `struc = NULL`. `M` is generated only when `vecchia = TRUE`.
+#' 8. `r new_badge("new")` `M`: the size of the conditioning set for the Vecchia approximation in the DGP emulator training. `M` is generated only when `vecchia = TRUE`.
#' * `constructor_obj`: a 'python' object that stores the information of the constructed DGP emulator.
#' * `container_obj`: a 'python' object that stores the information for the linked emulation.
#' * `emulator_obj`: a 'python' object that stores the information for the predictions from the DGP emulator.
@@ -185,7 +174,7 @@
#' * [update()] to update the DGP emulator with new inputs and outputs.
#' * [alm()], [mice()], and [vigf()] to locate next design points.
#'
-#' @details See further examples and tutorials at and learn how to customize a DGP structure.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @note Any R vector detected in `X` and `Y` will be treated as a column vector and automatically converted into a single-column
#' R matrix. Thus, if `X` is a single data point with multiple dimensions, it must be given as a matrix.
#' @examples
@@ -241,7 +230,7 @@
#' }
#' @md
#' @export
-dgp <- function(X, Y, struc = NULL, depth = 2, node = ncol(X), name = 'sexp', lengthscale = 1.0, bounds = NULL, prior = 'ga', share = TRUE,
+dgp <- function(X, Y, depth = 2, node = ncol(X), name = 'sexp', lengthscale = 1.0, bounds = NULL, prior = 'ga', share = TRUE,
nugget_est = FALSE, nugget = NULL, scale_est = TRUE, scale = 1., connect = TRUE,
likelihood = NULL, training =TRUE, verb = TRUE, check_rep = TRUE, vecchia = FALSE, M = 25, ord = NULL, N = ifelse(vecchia, 200, 500), cores = 1, blocked_gibbs = TRUE,
ess_burn = 10, burnin = NULL, B = 10, internal_input_idx = NULL, linked_idx = NULL, id = NULL) {
@@ -250,23 +239,12 @@ dgp <- function(X, Y, struc = NULL, depth = 2, node = ncol(X), name = 'sexp', le
if (pkg.env$restart) return(invisible(NULL))
}
- if (!is.null(struc)) {
- # Display a combined warning message
- lifecycle::deprecate_warn(
- when = "2.5.0",
- what = "dgp(struc)",
- details = c(i = "The argument will be dropped in the next release.",
- i = "To customize DGP specification, please adjust the other arguments in the `dgp()` function."
- )
- )
- }
-
if (!is.null(internal_input_idx)) {
lifecycle::deprecate_warn(
when = "2.5.0",
what = "dgp(internal_input_idx)",
details = c(i = "The argument will be dropped in the next release.",
- i = "To set up connections of GPs for linked emulation, please use the updated `lgp()` function instead."
+ i = "To set up connections of DGPs for linked emulation, please use the updated `lgp()` function instead."
)
)
}
@@ -276,7 +254,7 @@ dgp <- function(X, Y, struc = NULL, depth = 2, node = ncol(X), name = 'sexp', le
when = "2.5.0",
what = "dgp(linked_idx)",
details = c(i = "The argument will be dropped in the next release.",
- i = "To set up connections of GPs for linked emulation, please use the updated `lgp()` function instead."
+ i = "To set up connections of DGPs for linked emulation, please use the updated `lgp()` function instead."
)
)
}
@@ -306,12 +284,6 @@ dgp <- function(X, Y, struc = NULL, depth = 2, node = ncol(X), name = 'sexp', le
rank_num <- pkg.env$np$linalg$matrix_rank(X)
if (rank_num < n_dim_X) stop("The input matrix is not full rank. This indicates perfect multicollinearity and redundant information. We recommend identifying and removing redundant columns.")
- if ( is.null(struc) ) {
- is.null.struc <- TRUE
- } else {
- is.null.struc <- FALSE
- }
-
if ( !is.null(likelihood) ){
if (likelihood!='Hetero' & likelihood!='Poisson' & likelihood!='NegBin' & likelihood!='Categorical' ) stop("The provided 'likelihood' is not supported.", call. = FALSE)
}
@@ -341,7 +313,6 @@ dgp <- function(X, Y, struc = NULL, depth = 2, node = ncol(X), name = 'sexp', le
linked_idx_py <- linked_idx_r_to_py(linked_idx)
#If struc is NULL
- if ( is.null.struc ) {
depth <- as.integer(depth)
if ( depth < 2 ) stop("'depth' must >= 2. Use gp() if you want a single-layered DGP.", call. = FALSE)
@@ -623,7 +594,7 @@ dgp <- function(X, Y, struc = NULL, depth = 2, node = ncol(X), name = 'sexp', le
message(" done")
Sys.sleep(0.5)
}
- }
+
if ( isTRUE(verb) ) message("Initializing the DGP emulator ...", appendLF = FALSE)
@@ -658,10 +629,8 @@ dgp <- function(X, Y, struc = NULL, depth = 2, node = ncol(X), name = 'sexp', le
res[['data']][['X']] <- unname(X)
res[['data']][['Y']] <- unname(Y)
res[['specs']] <- extract_specs(est_obj, "dgp")
- if ( is.null.struc ) {
- res[['specs']][['internal_dims']] <- if( is.null(internal_input_idx) ) 1:n_dim_X else as.integer(reticulate::py_to_r(internal_input_idx)+1)
- res[['specs']][['external_dims']] <- if( is.null(internal_input_idx) ) FALSE else as.integer(reticulate::py_to_r(external_input_idx)+1)
- }
+ res[['specs']][['internal_dims']] <- if( is.null(internal_input_idx) ) 1:n_dim_X else as.integer(reticulate::py_to_r(internal_input_idx)+1)
+ res[['specs']][['external_dims']] <- if( is.null(internal_input_idx) ) FALSE else as.integer(reticulate::py_to_r(external_input_idx)+1)
res[['specs']][['linked_idx']] <- if ( is.null(linked_idx) ) FALSE else linked_idx
res[['specs']][['vecchia']] <- vecchia
res[['specs']][['M']] <- M
@@ -692,9 +661,7 @@ dgp <- function(X, Y, struc = NULL, depth = 2, node = ncol(X), name = 'sexp', le
#' GP components in different layers and optimization of GP components is computationally expensive. Defaults to `1`.
#' @param ess_burn number of burnin steps for ESS-within-Gibbs
#' at each I-step of the training. Defaults to `10`.
-#' @param verb a bool indicating if a progress bar will be printed during training:
-#'
-#' Defaults to `TRUE`.
+#' @param verb a bool indicating if a progress bar will be printed during training. Defaults to `TRUE`.
#' @param burnin the number of training iterations to be discarded for
#' point estimates calculation. Must be smaller than the overall training iterations
#' so-far implemented. If this is not specified, only the last 25% of iterations
@@ -705,7 +672,7 @@ dgp <- function(X, Y, struc = NULL, depth = 2, node = ncol(X), name = 'sexp', le
#'
#' @return An updated `object`.
#'
-#' @details See further examples and tutorials at .
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @note
#' * One can also use this function to fit an untrained DGP emulator constructed by [dgp()] with `training = FALSE`.
#' * The following slots:
@@ -740,7 +707,7 @@ continue <- function(object, N = NULL, cores = 1, ess_burn = 10, verb = TRUE, bu
if( !is.null(cores) ) {
cores <- as.integer(cores)
- if ( cores < 1 ) stop("cores must be >= 1.", call. = FALSE)
+ if ( cores < 1 ) stop("'cores' must be >= 1.", call. = FALSE)
}
if ( is.null(B) ){
@@ -777,10 +744,8 @@ continue <- function(object, N = NULL, cores = 1, ess_burn = 10, verb = TRUE, bu
new_object[['data']][['X']] <- object$data$X
new_object[['data']][['Y']] <- object$data$Y
new_object[['specs']] <- extract_specs(est_obj, "dgp")
- if ("internal_dims" %in% names(object[['specs']])){
- new_object[['specs']][['internal_dims']] <- object[['specs']][['internal_dims']]
- new_object[['specs']][['external_dims']] <- object[['specs']][['external_dims']]
- }
+ new_object[['specs']][['internal_dims']] <- object[['specs']][['internal_dims']]
+ new_object[['specs']][['external_dims']] <- object[['specs']][['external_dims']]
new_object[['specs']][['linked_idx']] <- if ( is.null(linked_idx) ) FALSE else linked_idx_py_to_r(linked_idx)
new_object[['specs']][['vecchia']] <- object[['specs']][['vecchia']]
new_object[['specs']][['M']] <- object[['specs']][['M']]
diff --git a/R/draw.R b/R/draw.R
index 74d4238..6b12d43 100644
--- a/R/draw.R
+++ b/R/draw.R
@@ -11,7 +11,7 @@
#' - `"design"`: shows visualizations of input designs created by the sequential design procedure.
#'
#' Defaults to `"rmse"`.
-#' @param log a boolean indicating whether to plot RMSEs, log-losses (for DGP emulators with categorical likelihoods), or custom evaluation metrics on a log scale when `type = "rmse"`.
+#' @param log a bool indicating whether to plot RMSEs, log-losses (for DGP emulators with categorical likelihoods), or custom evaluation metrics on a log scale when `type = "rmse"`.
#' Defaults to `FALSE`.
#' @param emulator an index or vector of indices of emulators packed in `object`. This argument is only used if `object` is an instance of the `bundle` class. When set to `NULL`, all
#' emulators in the bundle are drawn. Defaults to `NULL`.
@@ -19,7 +19,7 @@
#'
#' @return A `patchwork` object.
#'
-#' @details See further examples and tutorials at .
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
diff --git a/R/gp.R b/R/gp.R
index 0e9749d..08803da 100644
--- a/R/gp.R
+++ b/R/gp.R
@@ -4,11 +4,6 @@
#'
#' @param X a matrix where each row is an input data point and each column is an input dimension.
#' @param Y a matrix with only one column and each row being an output data point.
-#' @param struc `r lifecycle::badge("deprecated")` an object produced by [kernel()] that gives a user-defined GP specification. When `struc = NULL`,
-#' the GP specifications are automatically generated using information provided in `name`, `lengthscale`,
-#' `nugget_est`, `nugget`, `scale_est`, `scale`,and `internal_input_idx`. Defaults to `NULL`.
-#'
-#' **The argument will be removed in the next release. To customize GP specifications, please adjust the other arguments in the [gp()] function.**
#' @param name kernel function to be used. Either `"sexp"` for squared exponential kernel or
#' `"matern2.5"` for Matérn-2.5 kernel. Defaults to `"sexp"`.
#' @param lengthscale initial values of lengthscales in the kernel function. It can be a single numeric value or a vector of length `ncol(X)`:
@@ -26,12 +21,12 @@
#' 1. `FALSE`: the nugget term is fixed to `nugget`.
#' 2. `TRUE`: the nugget term will be estimated.
#'
-#' Defaults to `FALSE`. This argument is only used when `struc = NULL`.
+#' Defaults to `FALSE`.
#' @param nugget the initial nugget value. If `nugget_est = FALSE`, the assigned value is fixed during the training.
#' Set `nugget` to a small value (e.g., `1e-8`) and the corresponding bool in `nugget_est` to `FALSE` for deterministic computer models where the emulator
#' should interpolate the training data points. Set `nugget` to a larger value and the corresponding bool in `nugget_est` to `TRUE` for stochastic
#' emulation where the computer model outputs are assumed to follow a homogeneous Gaussian distribution. Defaults to `1e-8` if `nugget_est = FALSE` and
-#' `0.01` if `nugget_est = TRUE`. This argument is only used when `struc = NULL`.
+#' `0.01` if `nugget_est = TRUE`.
#' @param scale_est a bool indicating if the variance is to be estimated:
#' 1. `FALSE`: the variance is fixed to `scale`.
#' 2. `TRUE`: the variance term will be estimated.
@@ -94,8 +89,6 @@
#' **The slot will be removed in the next release**.
#' 8. `r new_badge("new")` `vecchia`: whether the Vecchia approximation is used for the GP emulator training.
#' 9. `r new_badge("new")` `M`: the size of the conditioning set for the Vecchia approximation in the GP emulator training.
-#'
-#' `internal_dims` and `external_dims` are generated only when `struc = NULL`.
#' * `constructor_obj`: a 'python' object that stores the information of the constructed GP emulator.
#' * `container_obj`: a 'python' object that stores the information for the linked emulation.
#' * `emulator_obj`: a 'python' object that stores the information for the predictions from the GP emulator.
@@ -114,7 +107,7 @@
#' @references
#' - Gu, M. (2019). Jointly robust prior for Gaussian stochastic process in emulation, calibration and variable selection. *Bayesian Analysis*, **14(3)**, 857-885.
#' - Katzfuss, M., Guinness, J., & Lawrence, E. (2022). Scaled Vecchia approximation for fast computer-model emulation. *SIAM/ASA Journal on Uncertainty Quantification*, **10(2)**, 537-554.
-#' @details See further examples and tutorials at .
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @note Any R vector detected in `X` and `Y` will be treated as a column vector and automatically converted into a single-column
#' R matrix. Thus, if `X` is a single data point with multiple dimensions, it must be given as a matrix.
#' @examples
@@ -158,23 +151,12 @@
#'
#' @md
#' @export
-gp <- function(X, Y, struc = NULL, name = 'sexp', lengthscale = rep(0.1, ncol(X)), bounds = NULL, prior = 'ref', nugget_est = FALSE, nugget = ifelse(nugget_est, 0.01, 1e-8), scale_est = TRUE, scale = 1., training = TRUE, verb = TRUE, vecchia = FALSE, M = 25, ord = NULL, internal_input_idx = NULL, linked_idx = NULL, id = NULL) {
+gp <- function(X, Y, name = 'sexp', lengthscale = rep(0.1, ncol(X)), bounds = NULL, prior = 'ref', nugget_est = FALSE, nugget = ifelse(nugget_est, 0.01, 1e-8), scale_est = TRUE, scale = 1., training = TRUE, verb = TRUE, vecchia = FALSE, M = 25, ord = NULL, internal_input_idx = NULL, linked_idx = NULL, id = NULL) {
if ( is.null(pkg.env$dgpsi) ) {
init_py(verb = F)
if (pkg.env$restart) return(invisible(NULL))
}
- if (!is.null(struc)) {
- # Display a combined warning message
- lifecycle::deprecate_warn(
- when = "2.5.0",
- what = "gp(struc)",
- details = c(i = "The argument will be dropped in the next release.",
- i = "To customize GP specification, please adjust the other arguments in the `gp()` function."
- )
- )
- }
-
if (!is.null(internal_input_idx)) {
# Display a combined warning message
lifecycle::deprecate_warn(
@@ -220,57 +202,50 @@ gp <- function(X, Y, struc = NULL, name = 'sexp', lengthscale = rep(0.1, ncol(X)
ord_wrapper <- NULL
}
- if ( is.null(struc) ) {
- is.null.struc <- TRUE
- } else {
- is.null.struc <- FALSE
- }
-
if ( name!='sexp' & name!='matern2.5' ) stop("'name' can only be either 'sexp' or 'matern2.5'.", call. = FALSE)
if ( prior!='ga' & prior!='inv_ga' & prior!='ref') stop("'prior' can only be 'ga', 'inv_ga', or 'ref'.", call. = FALSE)
linked_idx_py <- linked_idx_r_to_py(linked_idx)
- if ( is.null.struc ) {
- if ( verb ) message("Auto-generating a GP structure ...", appendLF = FALSE)
+ if ( verb ) message("Auto-generating a GP structure ...", appendLF = FALSE)
- if ( length(lengthscale) != 1 & length(lengthscale) != n_dim_X) {
- stop("length(lengthscale) must be 1 or ncol(X).", call. = FALSE)
- }
+ if ( length(lengthscale) != 1 & length(lengthscale) != n_dim_X) {
+ stop("length(lengthscale) must be 1 or ncol(X).", call. = FALSE)
+ }
- if ( !is.null(bounds) ){
- if ( !is.vector(bounds) ) {
- bounds <- as.vector(bounds)
- }
- if ( length(bounds)!=2 ) {
- stop(sprintf("length(bounds) must equal to %i.", 2), call. = FALSE)
- }
- if ( bounds[1]>bounds[2] ) stop("The second element of 'bounds' must be greater than the first.", call. = FALSE)
- bounds <- reticulate::np_array(bounds)
+ if ( !is.null(bounds) ){
+ if ( !is.vector(bounds) ) {
+ bounds <- as.vector(bounds)
+ }
+ if ( length(bounds)!=2 ) {
+ stop(sprintf("length(bounds) must be equal to %i.", 2), call. = FALSE)
}
+ if ( bounds[1]>bounds[2] ) stop("The second element of 'bounds' must be greater than the first.", call. = FALSE)
+ bounds <- reticulate::np_array(bounds)
+ }
- if( !is.null(internal_input_idx) ) {
- external_input_idx <- setdiff(1:n_dim_X, internal_input_idx)
- if ( length(external_input_idx) == 0) {
- internal_input_idx = NULL
- external_input_idx = NULL
- } else {
- internal_input_idx <- reticulate::np_array(as.integer(internal_input_idx - 1))
- external_input_idx <- reticulate::np_array(as.integer(external_input_idx - 1))
- }
- } else {
+ if( !is.null(internal_input_idx) ) {
+ external_input_idx <- setdiff(1:n_dim_X, internal_input_idx)
+ if ( length(external_input_idx) == 0) {
+ internal_input_idx = NULL
external_input_idx = NULL
+ } else {
+ internal_input_idx <- reticulate::np_array(as.integer(internal_input_idx - 1))
+ external_input_idx <- reticulate::np_array(as.integer(external_input_idx - 1))
}
+ } else {
+ external_input_idx = NULL
+ }
- struc <- pkg.env$dgpsi$kernel(length = reticulate::np_array(lengthscale), name = name, prior_name = prior, bds = bounds, scale = scale, scale_est = scale_est, nugget = nugget, nugget_est = nugget_est,
- input_dim = internal_input_idx, connect = external_input_idx)
+ struc <- pkg.env$dgpsi$kernel(length = reticulate::np_array(lengthscale), name = name, prior_name = prior, bds = bounds, scale = scale, scale_est = scale_est, nugget = nugget, nugget_est = nugget_est,
+ input_dim = internal_input_idx, connect = external_input_idx)
- if ( verb ) {
- message(" done")
- Sys.sleep(0.5)
- }
+ if ( verb ) {
+ message(" done")
+ Sys.sleep(0.5)
}
+
if ( verb ) message("Initializing the GP emulator ...", appendLF = FALSE)
obj <- pkg.env$dgpsi$gp(X, Y, struc, vecchia, M, ord_wrapper)
@@ -294,10 +269,8 @@ gp <- function(X, Y, struc = NULL, name = 'sexp', lengthscale = rep(0.1, ncol(X)
res[['data']][['X']] <- unname(X)
res[['data']][['Y']] <- unname(Y)
res[['specs']] <- extract_specs(obj, "gp")
- if ( is.null.struc ) {
- res[['specs']][['internal_dims']] <- if( is.null(internal_input_idx) ) 1:n_dim_X else as.integer(reticulate::py_to_r(internal_input_idx)+1)
- res[['specs']][['external_dims']] <- if( is.null(internal_input_idx) ) FALSE else as.integer(reticulate::py_to_r(external_input_idx)+1)
- }
+ res[['specs']][['internal_dims']] <- if( is.null(internal_input_idx) ) 1:n_dim_X else as.integer(reticulate::py_to_r(internal_input_idx)+1)
+ res[['specs']][['external_dims']] <- if( is.null(internal_input_idx) ) FALSE else as.integer(reticulate::py_to_r(external_input_idx)+1)
res[['specs']][['linked_idx']] <- if ( is.null(linked_idx) ) FALSE else linked_idx
res[['specs']][['vecchia']] <- vecchia
res[['specs']][['M']] <- M
diff --git a/R/initi_py.R b/R/initi_py.R
index cbe196b..302032d 100644
--- a/R/initi_py.R
+++ b/R/initi_py.R
@@ -30,7 +30,7 @@ pkg.env$dill <- NULL
#'
#' @return No return value, called to install required 'python' environment.
#'
-#' @details See further examples and tutorials at .
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
diff --git a/R/kernel.R b/R/kernel.R
deleted file mode 100644
index 9ab9500..0000000
--- a/R/kernel.R
+++ /dev/null
@@ -1,104 +0,0 @@
-#' @title Initialize a Gaussian process node
-#'
-#' @description
-#'
-#' `r lifecycle::badge("deprecated")`
-#'
-#' This function is deprecated and will be removed in the next release. To customize
-#' DGP specifications, adjust the other arguments in the `dgp()` function instead.
-#'
-#' @param length a vector of lengthscales. The length of the vector equals to:
-#' 1. either one if the lengthscales in the kernel function are assumed same across input dimensions; or
-#' 2. the total number of input dimensions, which is the sum of the number of feeding GP nodes
-#' in the last layer (defined by the argument `input_dim`) and the number of connected global
-#' input dimensions (defined by the argument `connect`), if the lengthscales in the kernel function
-#' are assumed different across input dimensions.
-#' @param scale the variance of a GP node. Defaults to `1`.
-#' @param nugget the nugget term of a GP node. Defaults to `1e-6`.
-#' @param name kernel function to be used. Either `"sexp"` for squared exponential kernel or
-#' `"matern2.5"` for Matérn-2.5 kernel. Defaults to `"sexp"`.
-#' @param prior_name prior options for the lengthscales and nugget term: gamma prior (`"ga"`), inverse gamma prior (`"inv_ga"`),
-#' or jointly robust prior (`"ref"`) for the lengthscales and nugget term. Set `NULL` to disable the prior. Defaults to `"ga"`.
-#' @param prior_coef a vector that contains the coefficients for different priors:
-#' * for the gamma prior, it is a vector of two values specifying the shape and rate parameters of the gamma distribution. Set to `NULL` for the
-#' default value `c(1.6,0.3)`.
-#' * for the inverse gamma prior, it is a vector of two values specifying the shape and scale parameters of the inverse gamma distribution. Set
-#' to `NULL` for the default value `c(1.6,0.3)`.
-#' * for the jointly robust prior, it is a vector of a single value specifying the `a` parameter in the prior. Set to `NULL` for the
-#' default value `c(0.2)`. See the reference below for the jointly robust prior.
-#'
-#' Defaults to `NULL`.
-#' @param bounds a vector of length two that gives the lower bound (the first element of the vector) and the upper bound (the second element of the
-#' vector) of all lengthscales of the GP node. Defaults to `NULL` where no bounds are specified for the lengthscales.
-#' @param nugget_est set to `TRUE` to estimate the nugget term or to `FALSE` to fix the nugget term as specified
-#' by the argument `nugget`. If set to `TRUE`, the value set to the argument `nugget` is used as the initial
-#' value. Defaults to `FALSE`.
-#' @param scale_est set to `TRUE` to estimate the variance (i.e., scale) or to `FALSE` to fix the variance (i.e., scale) as specified
-#' by the argument `scale`. Defaults to `FALSE`.
-#' @param input_dim a vector that contains either
-#' 1. the indices of GP nodes in the feeding layer whose outputs feed into this GP node; or
-#' 2. the indices of global input dimensions that are linked to the outputs of some feeding emulators,
-#' if this GP node is in the first layer of a GP or DGP, which will be used for the linked emulation.
-#'
-#' When set to `NULL`,
-#' 1. all outputs from the GP nodes in the feeding layer feed into this GP node; or
-#' 2. all global input dimensions feed into this GP node.
-#'
-#' Defaults to `NULL`.
-#' @param connect a vector that contains the indices of dimensions in the global
-#' input connecting to this GP node as additional input dimensions. When set to `NULL`, no global input
-#' connection is implemented. Defaults to `NULL`. When this GP node is in the first layer of a GP or DGP emulator,
-#' which will consequently be used for linked emulation, `connect` gives the indices of global input dimensions
-#' that are not connected to some feeding emulators. In such a case, set `input_dim` to a vector of indices of
-#' the remaining input dimensions that are connected to the feeding emulators.
-#'
-#' @return A 'python' object to represent a GP node.
-#' @references
-#' Gu, M. (2019). Jointly robust prior for Gaussian stochastic process in emulation, calibration and variable selection. *Bayesian Analysis*, **14(3)**, 857-885.
-#' @details See further examples and tutorials at .
-#' @examples
-#' \dontrun{
-#'
-#' # Check https://mingdeyu.github.io/dgpsi-R/ for examples
-#' # on how to customize DGP structures using kernel().
-#' }
-#' @md
-#' @keywords internal
-#' @export
-kernel <- function(length, scale = 1., nugget = 1e-6, name = 'sexp', prior_name = 'ga', prior_coef = NULL, bounds = NULL, nugget_est = FALSE, scale_est = FALSE, input_dim = NULL, connect = NULL) {
- if ( is.null(pkg.env$dgpsi) ) {
- init_py(verb = F)
- if (pkg.env$restart) return(invisible(NULL))
- }
-
- lifecycle::deprecate_warn(
- when = "2.5.0",
- what = "kernel()",
- details = c(i = "The function will be removed in the next release.",
- i= "It may not be compatible with other functions in this version.",
- i = "Please adjust the other arguments in `dgp()` function to customize DGP specifications."
- )
- )
-
- if ( name!='sexp' & name!='matern2.5' ) stop("'name' can only be either 'sexp' or 'matern2.5'.", call. = FALSE)
- if ( !is.null(prior_name) & prior_name!='ga' & prior_name!='inv_ga' ) stop("The provided 'prior_name' is not supported.", call. = FALSE)
-
- if(!is.null(input_dim)){
- input_dim <- reticulate::np_array(as.integer(input_dim - 1))
- }
-
- if(!is.null(connect)){
- connect <- reticulate::np_array(as.integer(connect - 1))
- }
-
- if(!is.null(bounds)){
- bounds <- reticulate::np_array(bounds)
- }
-
- if(!is.null(prior_coef)){
- prior_coef <- reticulate::np_array(prior_coef)
- }
-
- res <- pkg.env$dgpsi$kernel(reticulate::np_array(length), scale, nugget, name, prior_name, prior_coef, bounds, nugget_est, scale_est, input_dim, connect)
- return(res)
-}
diff --git a/R/lgp.R b/R/lgp.R
index e802681..c4f91c2 100644
--- a/R/lgp.R
+++ b/R/lgp.R
@@ -32,9 +32,9 @@
#' If the same emulator is used multiple times within the linked system, the list must contain distinct copies
#' of that emulator, each with a unique ID stored in their `id` slot. Use the [set_id()] function to produce copies with different IDs
#' to ensure each instance can be uniquely referenced.
-#' @param Bthe number of imputations used for prediction. Increase the value to refine representation of
+#' @param B the number of imputations used for prediction. Increase the value to refine representation of
#' imputation uncertainty. If the system consists of only GP emulators, `B` is set to `1` automatically. Defaults to `10`.
-#' @param activate `r new_badge("new")` a boolean indicating whether the initialized linked emulator should be activated:
+#' @param activate `r new_badge("new")` a bool indicating whether the initialized linked emulator should be activated:
#' - If `activate = FALSE`, [lgp()] returns an inactive linked emulator, allowing inspection of its structure using [summary()].
#' - If `activate = TRUE`, [lgp()] returns an active linked emulator, ready for prediction and validation using [predict()] and [validate()], respectively.
#'
@@ -72,7 +72,7 @@
#' * [summary()] to summarize the constructed linked (D)GP emulator.
#' * [write()] to save the linked (D)GP emulator to a `.pkl` file.
#'
-#' @details See further examples and tutorials at .
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
diff --git a/R/likelihood.R b/R/likelihood.R
deleted file mode 100644
index 7f27f54..0000000
--- a/R/likelihood.R
+++ /dev/null
@@ -1,150 +0,0 @@
-#' @title Initialize a Poisson likelihood node
-#'
-#' @description
-#'
-#' `r lifecycle::badge("deprecated")`
-#'
-#' This function is deprecated and will be removed in the next release.
-#' To incorporate a Poisson likelihood node into a DGP structure,
-#' use the `likelihood` argument in the `dgp()` function instead.
-#'
-#' @param input_dim a vector of length one that contains the indices of one GP node in the feeding
-#' layer whose outputs feed into this likelihood node. When set to `NULL`,
-#' all outputs from GP nodes in the feeding layer feed into this likelihood node, and in such a case
-#' one needs to ensure that only one GP node is specified in the feeding layer.
-#' Defaults to `NULL`.
-#'
-#' @return A 'python' object to represent a Poisson likelihood node.
-#' @note The Poisson likelihood node can only be linked to one feeding GP node.
-#' @details See further examples and tutorials at .
-#' @examples
-#' \dontrun{
-#'
-#' # Check https://mingdeyu.github.io/dgpsi-R/ for examples
-#' # on how to customize DGP structures using Poisson().
-#' }
-#' @md
-#' @keywords internal
-#' @export
-Poisson <- function(input_dim = NULL) {
- if ( is.null(pkg.env$dgpsi) ) {
- init_py(verb = F)
- if (pkg.env$restart) return(invisible(NULL))
- }
-
- lifecycle::deprecate_warn(
- when = "2.5.0",
- what = "kernel()",
- details = c(i = "The function will be removed in the next release.",
- i= "It may not be compatible with other functions in this version.",
- i = "Please use the `likelihood` argument in `dgp()` function to incorporate a Poisson likelihood node into a DGP structure."
- )
- )
-
- if(!is.null(input_dim)){
- input_dim <- reticulate::np_array(as.integer(input_dim - 1))
- }
- res <- pkg.env$dgpsi$Poisson(input_dim)
- return(res)
-}
-
-
-#' @title Initialize a heteroskedastic Gaussian likelihood node
-#'
-#' @description
-#'
-#' `r lifecycle::badge("deprecated")`
-#'
-#' This function is deprecated and will be removed in the next release.
-#' To incorporate a heteroskedastic Gaussian likelihood node into a DGP structure,
-#' use the `likelihood` argument in the `dgp()` function instead.
-#'
-#' @param input_dim a vector of length two that contains the indices of two GP nodes in the feeding
-#' layer whose outputs feed into this likelihood node. When set to `NULL`,
-#' all outputs from GP nodes in the feeding layer feed into this likelihood node, and in such a case
-#' one needs to ensure that only two GP nodes are specified in the feeding layer.
-#' Defaults to `NULL`.
-#'
-#' @return A 'python' object to represent a heteroskedastic Gaussian likelihood node.
-#' @note The heteroskedastic Gaussian likelihood node can only be linked to two feeding GP nodes.
-#' @details See further examples and tutorials at .
-#' @examples
-#' \dontrun{
-#'
-#' # Check https://mingdeyu.github.io/dgpsi-R/ for examples
-#' # on how to customize DGP structures using Hetero().
-#' }
-#' @md
-#' @keywords internal
-#' @export
-Hetero <- function(input_dim = NULL) {
- if ( is.null(pkg.env$dgpsi) ) {
- init_py(verb = F)
- if (pkg.env$restart) return(invisible(NULL))
- }
-
- lifecycle::deprecate_warn(
- when = "2.5.0",
- what = "kernel()",
- details = c(i = "The function will be removed in the next release.",
- i= "It may not be compatible with other functions in this version.",
- i = "Please use the `likelihood` argument in `dgp()` function to incorporate a heteroskedastic Gaussian likelihood node into a DGP structure."
- )
- )
-
- if(!is.null(input_dim)){
- input_dim <- reticulate::np_array(as.integer(input_dim - 1))
- }
- res <- pkg.env$dgpsi$Hetero(input_dim)
- return(res)
-}
-
-#' @title Initialize a negative Binomial likelihood node
-#'
-#' @description
-#'
-#' `r lifecycle::badge("deprecated")`
-#'
-#' This function is deprecated and will be removed in the next release.
-#' To incorporate a negative Binomial likelihood node into a DGP structure,
-#' use the `likelihood` argument in the `dgp()` function instead.
-#'
-#' @param input_dim a vector of length two that contains the indices of two GP nodes in the feeding
-#' layer whose outputs feed into this likelihood node. When set to `NULL`,
-#' all outputs from GP nodes in the feeding layer feed into this likelihood node, and in such a case
-#' one needs to ensure that only two GP nodes are specified in the feeding layer.
-#' Defaults to `NULL`.
-#'
-#' @return A 'python' object to represent a negative Binomial likelihood node.
-#' @note The negative Binomial likelihood node can only be linked to two feeding GP nodes.
-#' @details See further examples and tutorials at .
-#' @examples
-#' \dontrun{
-#'
-#' # Check https://mingdeyu.github.io/dgpsi-R/ for examples
-#' # on how to customize DGP structures using NegBin().
-#' }
-#' @md
-#' @keywords internal
-#' @export
-NegBin <- function(input_dim = NULL) {
- if ( is.null(pkg.env$dgpsi) ) {
- init_py(verb = F)
- if (pkg.env$restart) return(invisible(NULL))
- }
-
- lifecycle::deprecate_warn(
- when = "2.5.0",
- what = "kernel()",
- details = c(i = "The function will be removed in the next release.",
- i= "It may not be compatible with other functions in this version.",
- i = "Please use the `likelihood` argument in `dgp()` function to incorporate a negative Binomial likelihood node into a DGP structure."
- )
- )
-
- if(!is.null(input_dim)){
- input_dim <- reticulate::np_array(as.integer(input_dim - 1))
- }
- res <- pkg.env$dgpsi$NegBin(input_dim)
- return(res)
-}
diff --git a/R/mice.R b/R/mice.R
index 91a0478..45e4f3c 100644
--- a/R/mice.R
+++ b/R/mice.R
@@ -68,7 +68,7 @@
#' Beck, J., & Guillas, S. (2016). Sequential design with mutual information for computer experiments (MICE): emulation of a tsunami model.
#' *SIAM/ASA Journal on Uncertainty Quantification*, **4(1)**, 739-766.
#'
-#' @details See further examples and tutorials at .
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
diff --git a/R/plot.R b/R/plot.R
index 2e8b095..ba65f95 100644
--- a/R/plot.R
+++ b/R/plot.R
@@ -59,9 +59,9 @@
#' it is recommended to first run [validate()] to obtain and store validation results in the emulator object, and then supply the
#' object to [plot()]. [plot()] checks the object's `loo` and `oos` slots prior to calling [validate()] and will not perform further calculation if the required information is already stored.
#' * [plot()] will only use stored OOS validation if `x_test` and `y_test` are identical to those used by [validate()] to produce the data contained in the object's `oos` slot, otherwise [plot()] will re-evaluate OOS validation before plotting.
-#' * The returned `patchwork` object contains the `ggplot2` objects. One can modify the included individual ggplots
+#' * The returned [patchwork] object contains the [ggplot2] objects. One can modify the included individual ggplots
#' by accessing them with double-bracket indexing. See for further information.
-#' @details See further examples and tutorials at .
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
diff --git a/R/prediction.R b/R/prediction.R
index 6c2f764..0118d0f 100644
--- a/R/prediction.R
+++ b/R/prediction.R
@@ -121,7 +121,7 @@
#' * `r new_badge("new")` the value of `M`, which represents the size of the conditioning set for the Vecchia approximation, if used, in the emulator prediction.
#' * the value of `sample_size` if `method = "sampling"`.
#'
-#' @details See further examples and tutorials at .
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
diff --git a/R/serialization.R b/R/serialization.R
index 3c379cc..bd5bf02 100644
--- a/R/serialization.R
+++ b/R/serialization.R
@@ -8,7 +8,7 @@
#'
#' @return A serialized version of `object`.
#'
-#' @details See further examples and tutorials at .
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @note Since the constructed emulators are 'python' objects, they cannot be directly exported to other R processes for parallel
#' processing in multi-session workers. This function provides a way to convert the emulators into serialized objects, which can be
#' restored using [deserialize()] for multi-session processing.
@@ -104,7 +104,7 @@ serialize <- function(object, light = TRUE) {
#'
#' @return The S3 class of a GP emulator, a DGP emulator, a linked (D)GP emulator, or a bundle of (D)GP emulators.
#'
-#' @details See further examples and tutorials at .
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
diff --git a/R/update.R b/R/update.R
index 0672b53..d93e645 100644
--- a/R/update.R
+++ b/R/update.R
@@ -38,7 +38,7 @@
#' - `design` created by [design()]
#'
#' in `object` will be removed and not contained in the returned object.
-#' @details See further examples and tutorials at .
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
@@ -145,10 +145,8 @@ update.dgp <- function(object, X, Y, refit = TRUE, reset = FALSE, verb = TRUE, N
new_object[['data']][['X']] <- unname(X)
new_object[['data']][['Y']] <- unname(Y)
new_object[['specs']] <- extract_specs(est_obj, "dgp")
- if ("internal_dims" %in% names(object[['specs']])){
- new_object[['specs']][['internal_dims']] <- object[['specs']][['internal_dims']]
- new_object[['specs']][['external_dims']] <- object[['specs']][['external_dims']]
- }
+ new_object[['specs']][['internal_dims']] <- object[['specs']][['internal_dims']]
+ new_object[['specs']][['external_dims']] <- object[['specs']][['external_dims']]
new_object[['specs']][['linked_idx']] <- if ( is.null(linked_idx) ) FALSE else linked_idx_py_to_r(linked_idx)
new_object[['specs']][['vecchia']] <- object[['specs']][['vecchia']]
new_object[['specs']][['M']] <- object[['specs']][['M']]
@@ -222,10 +220,8 @@ update.gp <- function(object, X, Y, refit = TRUE, reset = FALSE, verb = TRUE, ..
new_object[['data']][['X']] <- unname(X)
new_object[['data']][['Y']] <- unname(Y)
new_object[['specs']] <- extract_specs(constructor_obj_cp, "gp")
- if ("internal_dims" %in% names(object[['specs']])){
- new_object[['specs']][['internal_dims']] <- object[['specs']][['internal_dims']]
- new_object[['specs']][['external_dims']] <- object[['specs']][['external_dims']]
- }
+ new_object[['specs']][['internal_dims']] <- object[['specs']][['internal_dims']]
+ new_object[['specs']][['external_dims']] <- object[['specs']][['external_dims']]
new_object[['specs']][['linked_idx']] <- if ( is.null(linked_idx) ) FALSE else linked_idx_py_to_r(linked_idx)
new_object[['specs']][['vecchia']] <- object[['specs']][['vecchia']]
new_object[['specs']][['M']] <- object[['specs']][['M']]
diff --git a/R/utils.R b/R/utils.R
index 7cbf9e7..f00713a 100644
--- a/R/utils.R
+++ b/R/utils.R
@@ -9,21 +9,12 @@
#' please use the updated [lgp()] function, which provides a simpler and more efficient
#' approach to building (D)GP emulators.
#'
-#' @param ... a sequence of lists:
-#' 1. For DGP emulations, each list represents a DGP layer and contains GP nodes (produced by [kernel()]), or
-#' likelihood nodes (produced by [Poisson()], [Hetero()], or [NegBin()].
-#' 2. For linked (D)GP emulations, each list represents a system layer and contains emulators (produced by [gp()] or
+#' @param ... a sequence of lists. Each list represents a system layer and contains emulators (produced by [gp()] or
#' [dgp()]) in that layer.
#'
-#' @return A list defining a DGP structure (for `struc` of [dgp()]) or a linked (D)GP structure
-#' (for `struc` for [lgp()]).
+#' @return A list defining a linked (D)GP structure to be passed to `struc` of [lgp()].
#'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
-#' @examples
-#' \dontrun{
-#'
-#' # See lgp() for an example.
-#' }
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @md
#' @keywords internal
#' @export
@@ -56,7 +47,7 @@ combine <- function(...) {
#' training input data for different emulators. `Y` contains *N* single-column matrices named `emulator1,...,emulatorN` that are
#' training output data for different emulators.
#'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
@@ -171,7 +162,7 @@ pack <- function(..., id = NULL) {
#' @return A named list that contains individual emulators (named `emulator1,...,emulatorS`) packed in `object`,
#' where `S` is the number of emulators in `object`.
#'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
@@ -215,7 +206,7 @@ unpack <- function(object) {
#'
#' @return No return value. `object` will be saved to a local `.pkl` file specified by `pkl_file`.
#'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @note Since emulators built from the package are 'python' objects, [save()] from R will not work as it would for R objects. If `object`
#' was processed by [set_vecchia()] to add or remove the Vecchia approximation, `light` should be set to `FALSE` to ensure
#' reproducibility after the saved emulator is reloaded by [read()].
@@ -272,7 +263,7 @@ write <- function(object, pkl_file, light = TRUE) {
#'
#' @return No return value.
#'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
@@ -306,7 +297,7 @@ set_seed <- function(seed) {
#'
#' @return The updated `object`, with the assigned ID stored in its `id` slot.
#'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
@@ -337,7 +328,7 @@ set_id <- function(object, id) {
#'
#' @return No return value.
#'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @md
#' @export
set_thread_num <- function(num) {
@@ -361,7 +352,7 @@ set_thread_num <- function(num) {
#'
#' @return the number of threads.
#'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @md
#' @export
get_thread_num <- function() {
@@ -381,7 +372,7 @@ get_thread_num <- function() {
#'
#' @return The S3 class of a GP emulator, a DGP emulator, a linked (D)GP emulator, or a bundle of (D)GP emulators.
#'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
@@ -619,7 +610,7 @@ read <- function(pkl_file) {
#' documents and the RStudio Viewer. The summary table can be further customized by [kableExtra] package.
#' The resulting [visNetwork] object can be saved as an HTML file using [visNetwork::visSave()].
#'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
@@ -1461,13 +1452,13 @@ summary.lgp <- function(object, type = "plot", group_size = 1, ...) {
#' constructed by [gp()], [dgp()] or [lgp()].
#'
#' @param object an instance of the S3 class `gp`, `dgp`, or `lgp`.
-#' @param vecchia a boolean or a list of booleans to indicate the addition or removal of the Vecchia approximation:
-#' * if `object` is an instance of the `gp` or `dgp` class, `vecchia` is a boolean that indicates
+#' @param vecchia a bool or a list of bools to indicate the addition or removal of the Vecchia approximation:
+#' * if `object` is an instance of the `gp` or `dgp` class, `vecchia` is a bool that indicates
#' either addition (`vecchia = TRUE`) or removal (`vecchia = FALSE`) of the Vecchia approximation from `object`.
-#' * if `object` is an instance of the `lgp` class, `x` can be a boolean or a list of booleans:
-#' - if `vecchia` is a boolean, it indicates either addition (`vecchia = TRUE`) or removal (`vecchia = FALSE`) of
+#' * if `object` is an instance of the `lgp` class, `vecchia` can be a bool or a list of bools:
+#' - if `vecchia` is a bool, it indicates either addition (`vecchia = TRUE`) or removal (`vecchia = FALSE`) of
#' the Vecchia approximation from all individual (D)GP emulators contained in `object`.
-#' - if `vecchia` is a list of booleans, it should have same shape as `struc` that was supplied to [lgp()]. Each boolean
+#' - if `vecchia` is a list of bools, it should have the same shape as `struc` that was supplied to [lgp()]. Each bool
#' in the list indicates if the corresponding (D)GP emulator contained in `object` shall have the Vecchia approximation
#' added or removed.
#' @param M the size of the conditioning set for the Vecchia approximation in the (D)GP emulator training. Defaults to `25`.
@@ -1485,7 +1476,7 @@ summary.lgp <- function(object, type = "plot", group_size = 1, ...) {
#' without the need to reconstruct the emulator. If the emulator was built without the Vecchia approximation, the function
#' can add it, and if the emulator was built with the Vecchia approximation, the function can remove it. If the current
#' state already matches the requested state, the emulator remains unchanged.
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @md
#' @export
set_vecchia <- function(object, vecchia = TRUE, M = 25, ord = NULL) {
@@ -1579,7 +1570,7 @@ set_vecchia <- function(object, vecchia = TRUE, M = 25, ord = NULL) {
#' even without knowing how different emulators are connected together. When this information is available and
#' different emulators are collected, the connection information between emulators can then be assigned to
#' individual emulators with this function.
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
@@ -1624,7 +1615,7 @@ set_linked_idx <- function(object, idx) {
#' - `loo` and `oos` created by [validate()]; and
#' - `results` created by [predict()]
#' in `object` will be removed and not contained in the returned object.
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
@@ -1653,10 +1644,8 @@ set_imp <- function(object, B = 5) {
new_object[['data']][['X']] <- object$data$X
new_object[['data']][['Y']] <- object$data$Y
new_object[['specs']] <- extract_specs(est_obj, "dgp")
- if ("internal_dims" %in% names(object[['specs']])){
- new_object[['specs']][['internal_dims']] <- object[['specs']][['internal_dims']]
- new_object[['specs']][['external_dims']] <- object[['specs']][['external_dims']]
- }
+ new_object[['specs']][['internal_dims']] <- object[['specs']][['internal_dims']]
+ new_object[['specs']][['external_dims']] <- object[['specs']][['external_dims']]
new_object[['specs']][['linked_idx']] <- if ( is.null(linked_idx) ) FALSE else linked_idx_py_to_r(linked_idx)
new_object[['specs']][['vecchia']] <- object[['specs']][['vecchia']]
new_object[['specs']][['M']] <- object[['specs']][['M']]
@@ -1697,7 +1686,7 @@ set_imp <- function(object, B = 5) {
#' - `loo` and `oos` created by [validate()]; and
#' - `results` created by [predict()]
#' in `object` will be removed and not contained in the returned object.
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
@@ -1753,10 +1742,8 @@ window <- function(object, start, end = NULL, thin = 1) {
new_object[['data']][['X']] <- object$data$X
new_object[['data']][['Y']] <- object$data$Y
new_object[['specs']] <- extract_specs(est_obj, "dgp")
- if ("internal_dims" %in% names(object[['specs']])){
- new_object[['specs']][['internal_dims']] <- object[['specs']][['internal_dims']]
- new_object[['specs']][['external_dims']] <- object[['specs']][['external_dims']]
- }
+ new_object[['specs']][['internal_dims']] <- object[['specs']][['internal_dims']]
+ new_object[['specs']][['external_dims']] <- object[['specs']][['external_dims']]
new_object[['specs']][['linked_idx']] <- if ( is.null(linked_idx) ) FALSE else linked_idx_py_to_r(linked_idx)
new_object[['specs']][['vecchia']] <- object[['specs']][['vecchia']]
new_object[['specs']][['M']] <- object[['specs']][['M']]
@@ -1775,7 +1762,7 @@ window <- function(object, start, end = NULL, thin = 1) {
}
-#' @title Calculate the negative log-likelihood
+#' @title Calculate the predictive negative log-likelihood
#'
#' @description This function computes the predictive negative log-likelihood from a
#' DGP emulator with a likelihood layer.
@@ -1789,14 +1776,7 @@ window <- function(object, start, end = NULL, thin = 1) {
#' across all testing data points. The second one, named `allNLL`, is a vector that gives the negative predicted
#' log-likelihood for each testing data point.
#'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
-#' @examples
-#' \dontrun{
-#'
-#' # Check https://mingdeyu.github.io/dgpsi-R/ for examples
-#' # on how to compute the negative predicted log-likelihood
-#' # using nllik().
-#' }
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @md
#' @export
nllik <- function(object, x, y) {
@@ -1846,7 +1826,7 @@ nllik <- function(object, x, y) {
#'
#' @return A `ggplot` object.
#'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
@@ -1907,7 +1887,7 @@ trace_plot <- function(object, layer = NULL, node = 1) {
#' @param object an instance of the `dgp` class that is generated by `dgp()`.
#' @param control a list that can supply the following two components to control static pruning of the DGP emulator:
#' * `min_size`, the minimum number of design points required to trigger pruning. Defaults to 10 times of the input dimensions.
-#' * `threshold`, the R^2 value above which a GP node is considered redundant and removable. Defaults to `0.97`.
+#' * `threshold`, the \eqn{R^2} value above which a GP node is considered redundant and removable. Defaults to `0.97`.
#' @param verb a bool indicating if trace information will be printed during the function execution. Defaults to `TRUE`.
#'
#' @return An updated `object` that could be an instance of `gp`, `dgp`, or `bundle` (of GP emulators) class.
@@ -1923,7 +1903,7 @@ trace_plot <- function(object, layer = NULL, node = 1) {
#'
#' in `object` will be removed and not contained in the returned object.
#'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
@@ -1993,9 +1973,6 @@ prune <- function(object, control = list(), verb = TRUE) {
stop("To prune, 'object' needs to be trained with a dataset comprising a size at least equal to 'min_size' in 'control'. Use design() to enrich the training set.", call. = FALSE)
}
- if (!"internal_dims" %in% names(object[['specs']])) {
- stop("'object' must be an instance of the 'dgp' class generated by dgp() with 'struc = NULL'.", call. = FALSE)
- } else {
n_layer <- object$constructor_obj$n_layer
if (object$constructor_obj$all_layer[[n_layer]][[1]]$type!='gp') {
n_layer <- n_layer - 1
@@ -2010,7 +1987,6 @@ prune <- function(object, control = list(), verb = TRUE) {
}
}
}
- }
is.finish <- FALSE
cropping_times <- 0
@@ -2271,3 +2247,16 @@ upcase2 <- function(x) {
substr(x, 1, 1) <- toupper(substr(x, 1, 1))
x
}
+
+get_docs_url <- function() {
+ pkg_version <- as.character(utils::packageVersion("dgpsi"))
+
+ is_dev <- grepl("\\.9000$", pkg_version)
+
+ if (is_dev) {
+ "https://mingdeyu.github.io/dgpsi-R/dev/"
+ } else {
+ "https://mingdeyu.github.io/dgpsi-R/"
+ }
+}
+
diff --git a/R/validation.R b/R/validation.R
index a8cd285..d16bc35 100644
--- a/R/validation.R
+++ b/R/validation.R
@@ -25,7 +25,7 @@
#' * `r new_badge("new")` If `object` is an instance of the `lgp` class created by [lgp()] with argument `struc` in data frame form,
#' `x_test` must be a matrix representing the global input, where each row corresponds to a test data point and each column represents a global input dimension.
#' The column indices in `x_test` must align with the indices specified in the `From_Output` column of the `struc` data frame (used in [lgp()]),
-#' corresponding to rows where the `From_Emulator` column is `"Global"`.
+#' corresponding to rows where the `From_Emulator` column is `"Global"`.
#'
#' `x_test` must be provided if `object` is an instance of the `lgp`. `x_test` must also be provided if `y_test` is provided. Defaults to `NULL`, in which case LOO validation is performed.
#' @param y_test the OOS output data corresponding to `x_test`:
@@ -39,7 +39,7 @@
#' in the final layer.
#'
#' `y_test` must be provided if `object` is an instance of the `lgp`. `y_test` must also be provided if `x_test` is provided. Defaults to `NULL`, in which case LOO validation is performed.
-#' @param method `r new_badge("updated")` the prediction approach to use for validation: either the mean-variance approach (`"mean_var"`) or the sampling approach (`"sampling"`). For details see [prediction()].
+#' @param method `r new_badge("updated")` the prediction approach to use for validation: either the mean-variance approach (`"mean_var"`) or the sampling approach (`"sampling"`). For details see [predict()].
#' For DGP emulators with a categorical likelihood (`likelihood = "Categorical"` in [dgp()]), only the sampling approach is supported.
#' By default, the method is set to `"sampling"` for DGP emulators with Poisson, Negative Binomial, and Categorical likelihoods and `"mean_var"` otherwise.
#' @param sample_size the number of samples to draw for each given imputation if `method = "sampling"`. Defaults to `50`.
@@ -81,8 +81,7 @@
#' - a vector called `rmse` that contains the root mean/median squared errors of the DGP emulator across different output
#' dimensions.
#' - a vector called `nrmse` that contains the (max-min) normalized root mean/median squared errors of the DGP emulator across different output
-#' dimensions. The max-min normalization
-#' uses the maximum and minimum values of the validation outputs contained in `y_train` (or `y_test`).
+#' dimensions. The max-min normalization uses the maximum and minimum values of the validation outputs contained in `y_train` (or `y_test`).
#' - `r new_badge("new")` an integer called `M` that contains size of the conditioning set used for the Vecchia approximation, if used, for emulator validation.
#' - an integer called `sample_size` that contains the number of samples used for validation if `method = "sampling"`.
#'
@@ -121,7 +120,7 @@
#' be implemented. LOO validation is only applicable to a GP or DGP emulator (i.e., `object` is an instance of the `gp` or `dgp`
#' class). If a linked (D)GP emulator (i.e., `object` is an instance of the `lgp` class) is provided, `x_test` and `y_test` must
#' also be provided for OOS validation.
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
@@ -340,7 +339,7 @@ validate.dgp <- function(object, x_test = NULL, y_test = NULL, method = NULL, sa
#check core number
if( !is.null(cores) ) {
cores <- as.integer(cores)
- if ( cores < 1 ) stop("cores must be >= 1.", call. = FALSE)
+ if ( cores < 1 ) stop("'cores' must be >= 1.", call. = FALSE)
}
M <- as.integer(M)
@@ -619,13 +618,13 @@ validate.lgp <- function(object, x_test = NULL, y_test = NULL, method = NULL, sa
if ( "metadata" %in% names(object$specs) ){
if ( !("emulator_obj" %in% names(object)) ){
- stop("'object' is not in activation mode for validation. Please set `mode = 'activate'` in `lgp()` to build the emulator.", call. = FALSE)
+ stop("'object' is not activated for predictions. Please set `activate = TRUE` in `lgp()` to activate the emulator.", call. = FALSE)
}
}
#check core number
if( !is.null(cores) ) {
cores <- as.integer(cores)
- if ( cores < 1 ) stop("cores must be >= 1.", call. = FALSE)
+ if ( cores < 1 ) stop("'cores' must be >= 1.", call. = FALSE)
}
M <- as.integer(M)
diff --git a/R/vigf.R b/R/vigf.R
index ab38503..156edea 100644
--- a/R/vigf.R
+++ b/R/vigf.R
@@ -12,7 +12,7 @@
#' The list must have a length equal to the number of emulators in `object`, with each element being a matrix representing the candidate set for a corresponding
#' emulator in the bundle. Defaults to `NULL`.
#' @param n_start an integer that gives the number of initial design points to be used to determine next design point(s). This argument
-#' is only used when `x_cand` is `NULL`. Defaults to `20`.
+#' is only used when `x_cand` is `NULL`. Defaults to `10`.
#' @param batch_size an integer that gives the number of design points to be chosen.
#' Defaults to `1`.
#' @param M `r new_badge("new")` the size of the conditioning set for the Vecchia approximation in the criterion calculation. This argument is only used if the emulator `object`
@@ -67,7 +67,7 @@
#' @references
#' Mohammadi, H., & Challenor, P. (2022). Sequential adaptive design for emulating costly computer codes. *arXiv:2206.12113*.
#'
-#' @details See further examples and tutorials at <https://mingdeyu.github.io/dgpsi-R/>.
+#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
@@ -116,7 +116,7 @@ vigf <- function(object, ...){
#' @rdname vigf
#' @method vigf gp
#' @export
-vigf.gp <- function(object, x_cand = NULL, n_start = 20, batch_size = 1, M = 50, workers = 1, limits = NULL, int = FALSE, ...) {
+vigf.gp <- function(object, x_cand = NULL, n_start = 10, batch_size = 1, M = 50, workers = 1, limits = NULL, int = FALSE, ...) {
if ( is.null(pkg.env$dgpsi) ) {
init_py(verb = F)
if (pkg.env$restart) return(invisible(NULL))
@@ -237,7 +237,7 @@ vigf.gp <- function(object, x_cand = NULL, n_start = 20, batch_size = 1, M = 50,
#' @rdname vigf
#' @method vigf dgp
#' @export
-vigf.dgp <- function(object, x_cand = NULL, n_start = 20, batch_size = 1, M = 50, workers = 1, limits = NULL, int = FALSE, aggregate = NULL, ...) {
+vigf.dgp <- function(object, x_cand = NULL, n_start = 10, batch_size = 1, M = 50, workers = 1, limits = NULL, int = FALSE, aggregate = NULL, ...) {
if ( is.null(pkg.env$dgpsi) ) {
init_py(verb = F)
if (pkg.env$restart) return(invisible(NULL))
@@ -585,7 +585,7 @@ vigf.dgp <- function(object, x_cand = NULL, n_start = 20, batch_size = 1, M = 50
#' @rdname vigf
#' @method vigf bundle
#' @export
-vigf.bundle <- function(object, x_cand = NULL, n_start = 20, batch_size = 1, M = 50, workers = 1, limits = NULL, int = FALSE, aggregate = NULL, ...) {
+vigf.bundle <- function(object, x_cand = NULL, n_start = 10, batch_size = 1, M = 50, workers = 1, limits = NULL, int = FALSE, aggregate = NULL, ...) {
if ( is.null(pkg.env$dgpsi) ) {
init_py(verb = F)
if (pkg.env$restart) return(invisible(NULL))
diff --git a/_pkgdown.yml b/_pkgdown.yml
index e14d34b..9f35b2b 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -48,18 +48,18 @@ articles:
- seq_design
- title: Sequential Design II
- desc: Sequential design for a bundle of DGP emulators with the stopping rule.
+ desc: Sequential design for a bundle of DGP emulators with a stopping rule.
navbar: ~
contents:
- seq_design_2
-- title: Large-scale DGP Emulation
+- title: Large-scale Emulation with the Vecchia Approximation
desc: Large-scale DGP emulation using a Vecchia implementation under the SI.
navbar: ~
contents:
- large_scale_emulation
-- title: DGP Classification using Stochastic Imputation
+- title: DGP Classification using dgpsi
desc: DGP classification of the iris data set.
navbar: ~
contents:
diff --git a/docs/categorical_summary.html b/docs/categorical_summary.html
index e646227..9e31543 100644
--- a/docs/categorical_summary.html
+++ b/docs/categorical_summary.html
@@ -1,8 +1,8 @@
-
+
-
+visNetwork
-
+
-
-
+
+
diff --git a/inst/WORDLIST b/inst/WORDLIST
index a9a274a..ac5f86c 100644
--- a/inst/WORDLIST
+++ b/inst/WORDLIST
@@ -3,6 +3,7 @@ ASA
CMD
Challenor
DGP
+DGPs
ESS
GPs
Gu
@@ -38,6 +39,8 @@ lifecycle
maximin
oos
reproducibility
+scalable
+softmax
suboptimal
th
’s
diff --git a/man/Hetero.Rd b/man/Hetero.Rd
deleted file mode 100644
index 2c1efa4..0000000
--- a/man/Hetero.Rd
+++ /dev/null
@@ -1,39 +0,0 @@
-% Generated by roxygen2: do not edit by hand
-% Please edit documentation in R/likelihood.R
-\name{Hetero}
-\alias{Hetero}
-\title{Initialize a heteroskedastic Gaussian likelihood node}
-\usage{
-Hetero(input_dim = NULL)
-}
-\arguments{
-\item{input_dim}{a vector of length two that contains the indices of two GP nodes in the feeding
-layer whose outputs feed into this likelihood node. When set to \code{NULL},
-all outputs from GP nodes in the feeding layer feed into this likelihood node, and in such a case
-one needs to ensure that only two GP nodes are specified in the feeding layer.
-Defaults to \code{NULL}.}
-}
-\value{
-A 'python' object to represent a heteroskedastic Gaussian likelihood node.
-}
-\description{
-\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}
-
-This function is deprecated and will be removed in the next release.
-To incorporate a heteroskedastic Gaussian likelihood node into a DGP structure,
-use the \code{likelihood} argument in the \code{dgp()} function instead.
-}
-\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
-}
-\note{
-The heteroskedastic Gaussian likelihood node can only be linked to two feeding GP nodes.
-}
-\examples{
-\dontrun{
-
-# Check https://mingdeyu.github.io/dgpsi-R/ for examples
-# on how to customize DGP structures using Hetero().
-}
-}
-\keyword{internal}
diff --git a/man/NegBin.Rd b/man/NegBin.Rd
deleted file mode 100644
index 46bea6e..0000000
--- a/man/NegBin.Rd
+++ /dev/null
@@ -1,39 +0,0 @@
-% Generated by roxygen2: do not edit by hand
-% Please edit documentation in R/likelihood.R
-\name{NegBin}
-\alias{NegBin}
-\title{Initialize a negative Binomial likelihood node}
-\usage{
-NegBin(input_dim = NULL)
-}
-\arguments{
-\item{input_dim}{a vector of length two that contains the indices of two GP nodes in the feeding
-layer whose outputs feed into this likelihood node. When set to \code{NULL},
-all outputs from GP nodes in the feeding layer feed into this likelihood node, and in such a case
-one needs to ensure that only two GP nodes are specified in the feeding layer.
-Defaults to \code{NULL}.}
-}
-\value{
-A 'python' object to represent a negative Binomial likelihood node.
-}
-\description{
-\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}
-
-This function is deprecated and will be removed in the next release.
-To incorporate a negative Binomial likelihood node into a DGP structure,
-use the \code{likelihood} argument in the \code{dgp()} function instead.
-}
-\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
-}
-\note{
-The negative Binomial likelihood node can only be linked to two feeding GP nodes.
-}
-\examples{
-\dontrun{
-
-# Check https://mingdeyu.github.io/dgpsi-R/ for examples
-# on how to customize DGP structures using NegBin().
-}
-}
-\keyword{internal}
diff --git a/man/Poisson.Rd b/man/Poisson.Rd
deleted file mode 100644
index f9b884b..0000000
--- a/man/Poisson.Rd
+++ /dev/null
@@ -1,39 +0,0 @@
-% Generated by roxygen2: do not edit by hand
-% Please edit documentation in R/likelihood.R
-\name{Poisson}
-\alias{Poisson}
-\title{Initialize a Poisson likelihood node}
-\usage{
-Poisson(input_dim = NULL)
-}
-\arguments{
-\item{input_dim}{a vector of length one that contains the indices of one GP node in the feeding
-layer whose outputs feed into this likelihood node. When set to \code{NULL},
-all outputs from GP nodes in the feeding layer feed into this likelihood node, and in such a case
-one needs to ensure that only one GP node is specified in the feeding layer.
-Defaults to \code{NULL}.}
-}
-\value{
-A 'python' object to represent a Poisson likelihood node.
-}
-\description{
-\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}
-
-This function is deprecated and will be removed in the next release.
-To incorporate a Poisson likelihood node into a DGP structure,
-use the \code{likelihood} argument in the \code{dgp()} function instead.
-}
-\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
-}
-\note{
-The Poisson likelihood node can only be linked to one feeding GP node.
-}
-\examples{
-\dontrun{
-
-# Check https://mingdeyu.github.io/dgpsi-R/ for examples
-# on how to customize DGP structures using Poisson().
-}
-}
-\keyword{internal}
diff --git a/man/alm.Rd b/man/alm.Rd
index cb8ff2d..20cb135 100644
--- a/man/alm.Rd
+++ b/man/alm.Rd
@@ -58,10 +58,10 @@ alm(object, ...)
\item{...}{any arguments (with names different from those of arguments used in \code{\link[=alm]{alm()}}) that are used by \code{aggregate}
can be passed here.}
-\item{x_cand}{a matrix (with each row containing a design point and column representing an input dimension) that gives a candidate set
-from which the next design point(s) are determined. If \code{object} is an instance of the \code{bundle} class and \code{aggregate} is not supplied, \code{x_cand} could also
-be a list with length equal to the number of emulators contained in \code{object}. In this case, each slot in \code{x_cand} should be a candidate set matrix
-for each emulator included in the bundle. Defaults to \code{NULL}.}
+\item{x_cand}{a matrix (with each row being a design point and column being an input dimension) that gives a candidate set
+from which the next design point(s) are determined. If \code{object} is an instance of the \code{bundle} class and \code{aggregate} is not supplied, \code{x_cand} can also be a list.
+The list must have a length equal to the number of emulators in \code{object}, with each element being a matrix representing the candidate set for a corresponding
+emulator in the bundle. Defaults to \code{NULL}.}
\item{n_start}{an integer that gives the number of initial design points to be used to determine next design point(s). This argument
is only used when \code{x_cand} is \code{NULL}. Defaults to \code{20}.}
@@ -94,32 +94,38 @@ of the matrix is equal to:
\item the emulator output dimension if \code{object} is an instance of the \code{dgp} class; or
\item the number of emulators contained in \code{object} if \code{object} is an instance of the \code{bundle} class.
}
-\item the output should be a vector that aggregates scores across outputs or emulators at different design points.
+\item the output should be a vector that gives aggregate scores at different design points.
}
-Set to \code{NULL} to disable the aggregation. Defaults to \code{NULL}.}
+Set to \code{NULL} to disable aggregation. Defaults to \code{NULL}.}
}
\value{
\enumerate{
-\item If \code{x_cand} is not \code{NULL} and:
+\item If \code{x_cand} is not \code{NULL}:
\itemize{
-\item \code{object} is an instance of the \code{gp} class, a vector is returned with length equal to \code{batch_size}, giving the positions (i.e., row numbers)
-of next design points from \code{x_cand}.
-\item \code{object} is an instance of the \code{dgp} class, a vector is returned with length equal to \code{batch_size * D}, giving positions (i.e., row numbers)
-of next design points from \code{x_cand} to be added to the DGP emulator. \code{D} equals to the number of output dimensions of the DGP
-emulator if there is no likelihood layer in the hierarchy. If \code{object} is a DGP emulator with either \code{Hetero} or \code{NegBin} likelihood layer,
-\code{D = 2}. If \code{object} is a DGP emulator with a \code{Categorical} likelihood layer, \code{D} equals to one (for binary output) or \code{K} (for multi-class output with \code{K} classes).
-\item \code{object} is an instance of the \code{bundle} class, a matrix is returned with row number equal to \code{batch_size} and column number equal to the number of
-emulators in the bundle, giving positions (i.e., row numbers) of next design points from \code{x_cand} to be added to individual emulators.
-}
-\item If \code{x_cand = NULL} and:
+\item When \code{object} is an instance of the \code{gp} class, a vector of length \code{batch_size} is returned, containing the positions
+(row numbers) of the next design points from \code{x_cand}.
+\item When \code{object} is an instance of the \code{dgp} class, a vector of length \code{batch_size * D} is returned, containing the positions
+(row numbers) of the next design points from \code{x_cand} to be added to the DGP emulator.
\itemize{
-\item \code{object} is an instance of the \code{gp} class, a matrix is returned with row number equal to \code{batch_size}, giving the next design points to be evaluated.
-\item \code{object} is an instance of the \code{dgp} class, a matrix is returned with row number equal to \code{batch_size * D} where \code{D} is the number of output dimensions of the DGP
-emulator if no likelihood layer is included. If \code{object} is a DGP emulator with either \code{Hetero} or \code{NegBin} likelihood layer, \code{D = 2}. If \code{object} is a DGP emulator
-with a \code{Categorical} likelihood layer, \code{D} equals to one (for binary output) or \code{K} (for multi-class output with \code{K} classes).
-\item \code{object} is an instance of the \code{bundle} class, a list is returned with the length equal to the number of
-emulators in the bundle. Each element in the list is a matrix with row number equal to \code{batch_size}, giving next design points to be added to individual emulators.
+\item \code{D} is the number of output dimensions of the DGP emulator if no likelihood layer is included.
+\item For a DGP emulator with a \code{Hetero} or \code{NegBin} likelihood layer, \code{D = 2}.
+\item For a DGP emulator with a \code{Categorical} likelihood layer, \code{D = 1} for binary output or \code{D = K} for multi-class output with \code{K} classes.
+}
+\item When \code{object} is an instance of the \code{bundle} class, a matrix is returned with \code{batch_size} rows and a column for each emulator in
+the bundle, containing the positions (row numbers) of the next design points from \code{x_cand} for individual emulators.
+}
+\item If \code{x_cand} is \code{NULL}:
+\itemize{
+\item When \code{object} is an instance of the \code{gp} class, a matrix with \code{batch_size} rows is returned, giving the next design points to be evaluated.
+\item When \code{object} is an instance of the \code{dgp} class, a matrix with \code{batch_size * D} rows is returned, where:
+\itemize{
+\item \code{D} is the number of output dimensions of the DGP emulator if no likelihood layer is included.
+\item For a DGP emulator with a \code{Hetero} or \code{NegBin} likelihood layer, \code{D = 2}.
+\item For a DGP emulator with a \code{Categorical} likelihood layer, \code{D = 1} for binary output or \code{D = K} for multi-class output with \code{K} classes.
+}
+\item When \code{object} is an instance of the \code{bundle} class, a list is returned with a length equal to the number of emulators in the bundle. Each
+element of the list is a matrix with \code{batch_size} rows, where each row represents a design point to be added to the corresponding emulator.
}
}
}
@@ -128,11 +134,12 @@ This function searches from a candidate set to locate the next design point(s) t
or a bundle of (D)GP emulators using the Active Learning MacKay (ALM) criterion (see the reference below).
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
-The column order of the first argument of \code{aggregate} must be consistent with the order of emulator output dimensions (if \code{object} is an instance of the
-\code{dgp} class), or the order of emulators placed in \code{object} if \code{object} is an instance of the \code{bundle} class.
+The first column of the matrix supplied to the first argument of \code{aggregate} must correspond to the first output dimension of the DGP emulator
+if \code{object} is an instance of the \code{dgp} class, and so on for subsequent columns and dimensions. If \code{object} is an instance of the \code{bundle} class,
+the first column must correspond to the first emulator in the bundle, and so on for subsequent columns and emulators.
}
\examples{
\dontrun{
diff --git a/man/combine.Rd b/man/combine.Rd
index 3fea7a6..02ed8a1 100644
--- a/man/combine.Rd
+++ b/man/combine.Rd
@@ -7,17 +7,11 @@
combine(...)
}
\arguments{
-\item{...}{a sequence of lists:
-\enumerate{
-\item For DGP emulations, each list represents a DGP layer and contains GP nodes (produced by \code{\link[=kernel]{kernel()}}), or
-likelihood nodes (produced by \code{\link[=Poisson]{Poisson()}}, \code{\link[=Hetero]{Hetero()}}, or \code{\link[=NegBin]{NegBin()}}.
-\item For linked (D)GP emulations, each list represents a system layer and contains emulators (produced by \code{\link[=gp]{gp()}} or
-\code{\link[=dgp]{dgp()}}) in that layer.
-}}
+\item{...}{a sequence of lists. Each list represents a system layer and contains emulators (produced by \code{\link[=gp]{gp()}} or
+\code{\link[=dgp]{dgp()}}) in that layer.}
}
\value{
-A list defining a DGP structure (for \code{struc} of \code{\link[=dgp]{dgp()}}) or a linked (D)GP structure
-(for \code{struc} for \code{\link[=lgp]{lgp()}}).
+A list defining a linked (D)GP structure to be passed to \code{struc} of \code{\link[=lgp]{lgp()}}.
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}
@@ -28,12 +22,6 @@ please use the updated \code{\link[=lgp]{lgp()}} function, which provides a simp
approach to building (D)GP emulators.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
-}
-\examples{
-\dontrun{
-
-# See lgp() for an example.
-}
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\keyword{internal}
diff --git a/man/continue.Rd b/man/continue.Rd
index 6136bcb..a4602bb 100644
--- a/man/continue.Rd
+++ b/man/continue.Rd
@@ -2,7 +2,7 @@
% Please edit documentation in R/dgp.R
\name{continue}
\alias{continue}
-\title{Continue the training of a DGP emulator}
+\title{Continue training a DGP emulator}
\usage{
continue(
object,
@@ -17,7 +17,7 @@ continue(
\arguments{
\item{object}{an instance of the \code{dgp} class.}
-\item{N}{additional number of iterations for the DGP emulator training. If set to \code{NULL}, the number of iterations is set to \code{500} if the DGP emulator
+\item{N}{additional number of iterations to train the DGP emulator. If set to \code{NULL}, the number of iterations is set to \code{500} if the DGP emulator
was constructed without the Vecchia approximation, and is set to \code{200} if Vecchia approximation was used. Defaults to \code{NULL}.}
\item{cores}{the number of processes to be used to optimize GP components (in the same layer) at each M-step of the training. If set to \code{NULL},
@@ -25,24 +25,18 @@ the number of processes is set to \verb{(max physical cores available - 1)} if t
Otherwise, the number of processes is set to \verb{max physical cores available \%/\% 2}. Only use multiple processes when there is a large number of
GP components in different layers and optimization of GP components is computationally expensive. Defaults to \code{1}.}
-\item{ess_burn}{number of burnin steps for the ESS-within-Gibbs
+\item{ess_burn}{number of burn-in steps for ESS-within-Gibbs
at each I-step of the training. Defaults to \code{10}.}
-\item{verb}{a bool indicating if the progress bar will be printed during the training:
-\enumerate{
-\item \code{FALSE}: the training progress bar will not be displayed.
-\item \code{TRUE}: the training progress bar will be displayed.
-}
-
-Defaults to \code{TRUE}.}
+\item{verb}{a bool indicating if a progress bar will be printed during training. Defaults to \code{TRUE}.}
\item{burnin}{the number of training iterations to be discarded for
point estimates calculation. Must be smaller than the overall training iterations
so-far implemented. If this is not specified, only the last 25\% of iterations
are used. This overrides the value of \code{burnin} set in \code{\link[=dgp]{dgp()}}. Defaults to \code{NULL}.}
-\item{B}{the number of imputations to produce the predictions. Increase the value to account for
-more imputation uncertainties. This overrides the value of \code{B} set in \code{\link[=dgp]{dgp()}} if \code{B} is not
+\item{B}{the number of imputations to produce predictions. Increase the value to account for
+more imputation uncertainty. This overrides the value of \code{B} set in \code{\link[=dgp]{dgp()}} if \code{B} is not
\code{NULL}. Defaults to \code{NULL}.}
}
\value{
@@ -52,7 +46,7 @@ An updated \code{object}.
This function implements additional training iterations for a DGP emulator.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
\itemize{
diff --git a/man/deserialize.Rd b/man/deserialize.Rd
index 62c0cce..4a08332 100644
--- a/man/deserialize.Rd
+++ b/man/deserialize.Rd
@@ -16,7 +16,7 @@ The S3 class of a GP emulator, a DGP emulator, a linked (D)GP emulator, or a bun
This function restores the serialized emulator created by \code{\link[=serialize]{serialize()}}.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\examples{
\dontrun{
diff --git a/man/design.Rd b/man/design.Rd
index 36ac302..e3238de 100644
--- a/man/design.Rd
+++ b/man/design.Rd
@@ -125,7 +125,7 @@ design(
\item the S3 class \code{bundle}.
}}
-\item{N}{the number of steps for the sequential design.}
+\item{N}{the number of iterations for the sequential design.}
\item{x_cand}{a matrix (with each row being a design point and column being an input dimension) that gives a candidate set
from which the next design points are determined. Defaults to \code{NULL}.}
@@ -146,134 +146,148 @@ dimensions, and its first and second columns correspond to the minimum and maxim
\code{limits = NULL} if \code{x_cand} is supplied. This argument is only used when \code{x_cand} is not supplied, i.e., \code{x_cand = NULL}. Defaults to \code{NULL}. If you provide
a custom \code{method} function with an argument called \code{limits}, the value of \code{limits} will be passed to your function.}
-\item{f}{an R function that represents the simulator. \code{f} needs to be specified with the following basic rules:
+\item{f}{an R function representing the simulator. \code{f} must adhere to the following rules:
\itemize{
-\item the first argument of the function should be a matrix with rows being different design points and columns being input dimensions.
-\item the output of the function can either
+\item \strong{First argument}: a matrix where rows correspond to different design points, and columns represent input dimensions.
+\item \strong{Function output}:
\itemize{
-\item a matrix with rows being different outputs (corresponding to the input design points) and columns being output dimensions. If there is
-only one output dimension, the matrix still needs to be returned with a single column.
-\item a list with the first element being the output matrix described above and, optionally, additional named elements which will update values
-of any arguments with the same names passed via \code{...}. The list output can be useful if some additional arguments of \code{f} and \code{aggregate}
-need to be updated after each step of the sequential design.
+\item a matrix where rows correspond to different outputs (matching the input design points) and columns represent output dimensions.
+If there is only one output dimension, the function should return a matrix with a single column.
+\item alternatively, a list where:
+\itemize{
+\item the first element is the output matrix as described above.
+\item additional named elements can optionally update values of arguments with matching names passed via \code{...}. This list output is
+useful if additional arguments to \code{f}, \code{method}, or \code{eval} need to be updated after each sequential design iteration.
+}
}
}
-See \emph{Note} section below for further information. This argument is used when \code{y_cand = NULL}. Defaults to \code{NULL}.}
+See the \emph{Note} section below for additional details. This argument must be supplied when \code{y_cand = NULL}. Defaults to \code{NULL}.}
\item{reps}{an integer that gives the number of repetitions of the located design points to be created and used for evaluations of \code{f}. Set the
-argument to an integer greater than \code{1} if \code{f} is a stochastic function that can generate different responses given a same input and the
+argument to an integer greater than \code{1} only if \code{f} is a stochastic function that can generate different responses for the same input and the
supplied emulator \code{object} can deal with stochastic responses, e.g., a (D)GP emulator with \code{nugget_est = TRUE} or a DGP emulator with a
likelihood layer. The argument is only used when \code{f} is supplied. Defaults to \code{1}.}
-\item{freq}{a vector of two integers with the first element giving the frequency (in number of steps) to re-fit the
-emulator, and the second element giving the frequency to implement the emulator validation (for RMSE). Defaults to \code{c(1, 1)}.}
+\item{freq}{a vector of two integers with the first element indicating the number of iterations between re-estimations of
+the emulator hyperparameters, and the second element indicating the number of iterations between re-calculations of the evaluation metrics
+on the validation set (see \code{x_test} below) via the \code{eval} function. Defaults to \code{c(1, 1)}.}
\item{x_test}{a matrix (with each row being an input testing data point and each column being an input dimension) that gives the testing
-input data to evaluate the emulator after each step of the sequential design. Set to \code{NULL} for the LOO-based emulator validation.
+input data to evaluate the emulator after every \code{freq[2]} iterations of the sequential design. Set to \code{NULL} for LOO-based emulator validation.
Defaults to \code{NULL}. This argument is only used if \code{eval = NULL}.}
-\item{y_test}{the testing output data that correspond to \code{x_test} for the emulator validation after each step of the sequential design:
+\item{y_test}{the testing output data corresponding to \code{x_test} for emulator validation after every \code{freq[2]} iterations of the sequential design:
\itemize{
-\item if \code{object} is an instance of the \code{gp} class, \code{y_test} is a matrix with only one column and each row being an testing output data point.
-\item if \code{object} is an instance of the \code{dgp} class, \code{y_test} is a matrix with its rows being testing output data points and columns being
+\item if \code{object} is an instance of the \code{gp} class, \code{y_test} is a matrix with a single column, where each row contains the testing output data point for the corresponding row of \code{x_test}.
+\item if \code{object} is an instance of the \code{dgp} class, \code{y_test} is a matrix with its rows containing testing output data points corresponding to the same rows of \code{x_test} and columns representing the
output dimensions.
+\item if \code{object} is an instance of the \code{bundle} class, \code{y_test} is a matrix with each row representing the outputs for the corresponding row of \code{x_test} and each column representing the output of a different emulator in the bundle.
}
-Set to \code{NULL} for the LOO-based emulator validation. Defaults to \code{NULL}. This argument is only used if \code{eval = NULL}.}
+Set to \code{NULL} for LOO-based emulator validation. Defaults to \code{NULL}. This argument is only used if \code{eval = NULL}.}
+
+\item{reset}{a bool or a vector of bools indicating whether to reset the hyperparameters of the emulator(s) to their initial values (as set during initial construction) before re-fitting.
+The re-fitting occurs based on the frequency specified by \code{freq[1]}. This option is useful when hyperparameters are suspected to have converged to a local optimum affecting validation performance.
+\itemize{
+\item If a single bool is provided, it applies to every iteration of the sequential design.
+\item If a vector is provided, its length must equal \code{N} (even if the re-fit frequency specified in \code{freq[1]} is not 1) and it will apply to the corresponding iterations of the sequential design.
+}
-\item{reset}{a bool or a vector of bools indicating whether to reset hyperparameters of the emulator to their initial values when it was initially
-constructed after the input-output update and before the re-fit. If a bool is given, it will be applied to
-every step of the sequential design. If a vector is provided, its length should be equal to \code{N} and will be applied to individual
-steps of the sequential design. Defaults to \code{FALSE}.}
+Defaults to \code{FALSE}.}
-\item{target}{a numeric or a vector that gives the target RMSEs at which the sequential design is terminated. Defaults to \code{NULL}, in which
-case the sequential design stops after \code{N} steps. See \emph{Note} section below for further information about \code{target}.}
+\item{target}{a number or vector specifying the target evaluation metric value(s) at which the sequential design should terminate.
+Defaults to \code{NULL}, in which case the sequential design stops after \code{N} iterations. See the \emph{Note} section below for further details about \code{target}.}
-\item{method}{an R function that give indices of designs points in a candidate set. The function must satisfy the following basic rules:
+\item{method}{an R function that determines the next design points to be evaluated by \code{f}. The function must adhere to the following rules:
\itemize{
-\item the first argument is an emulator object that can be either an instance of
+\item \strong{First argument}: an emulator object, which can be one of the following:
\itemize{
-\item the \code{gp} class (produced by \code{\link[=gp]{gp()}});
-\item the \code{dgp} class (produced by \code{\link[=dgp]{dgp()}});
-\item the \code{bundle} class (produced by \code{\link[=pack]{pack()}}).
+\item an instance of the \code{gp} class (produced by \code{\link[=gp]{gp()}});
+\item an instance of the \code{dgp} class (produced by \code{\link[=dgp]{dgp()}});
+\item an instance of the \code{bundle} class (produced by \code{\link[=pack]{pack()}}).
}
-\item if \code{x_cand} is not \code{NULL}, the second argument is a matrix with rows representing a set of different design points.
-\item if \code{x_cand} is not \code{NULL}, the output of the function
+\item \strong{Second argument} (if \code{x_cand} is not \code{NULL}): a \emph{candidate matrix} representing a set of potential design points from which the \code{method} function selects the next points.
+\item \strong{Function output}:
+\itemize{
+\item If \code{x_cand} is not \code{NULL}:
\itemize{
-\item is a vector that gives the row indices of chosen design points from the matrix supplied to the second argument, if the first argument is an instance of the \code{gp} or \code{dgp} class;
-\item is a matrix that row indices of chosen design points from the matrix supplied to the second argument, if the first argument is an instance of the \code{bundle} class. Each column of the matrix gives the indices of the design
-points to be added to individual emulators in the bundle.
+\item for \code{gp} or \code{dgp} objects, the output must be a vector of row indices corresponding to the selected design points from the \emph{candidate matrix} (the second argument).
+\item for \code{bundle} objects, the output must be a matrix containing the row indices of the selected design points from the \emph{candidate matrix}. Each column corresponds to
+the indices for an individual emulator in the bundle.
}
-\item if \code{x_cand} is \code{NULL}, the output of the function
+\item If \code{x_cand} is \code{NULL}:
\itemize{
-\item is a matrix that gives with each row representing a new design point to be added, if the first argument is an instance of the \code{gp} or \code{dgp} class;
-\item is a list with the length equal to the number of emulators in the bundle, if the first argument is an instance of the \code{bundle} class. Each element in the list is a matrix with same number of row. The rows of the matrix
-represent next design points to be added to the corresponding emulator.
+\item for \code{gp} or \code{dgp} objects, the output must be a matrix where each row represents a new design point to be added.
+\item for \code{bundle} objects, the output must be a list with a length equal to the number of emulators in the bundle. Each element in the list is a matrix where rows
+represent the new design points for the corresponding emulator.
+}
}
}
-See \code{\link[=alm]{alm()}}, \code{\link[=mice]{mice()}}, and \code{\link[=vigf]{vigf()}} for examples on customizing \code{method}. Defaults to \code{\link[=vigf]{vigf()}}.}
+See \code{\link[=alm]{alm()}}, \code{\link[=mice]{mice()}}, and \code{\link[=vigf]{vigf()}} for examples of built-in \code{method} functions. Defaults to \code{\link[=vigf]{vigf()}}.}
-\item{batch_size}{an integer specifying the number of design points to select in a single iteration. Defaults to \code{1}. This argument is used by
-the built-in \code{method} functions \code{\link[=alm]{alm()}}, \code{\link[=mice]{mice()}}, and \code{\link[=vigf]{vigf()}}. If you provide a custom \code{method} function with an argument called \code{batch_size}, the value of \code{batch_size}
-will be passed to your function.}
+\item{batch_size}{an integer specifying the number of design points to select in a single iteration. Defaults to \code{1}.
+This argument is used by the built-in \code{method} functions \code{\link[=alm]{alm()}}, \code{\link[=mice]{mice()}}, and \code{\link[=vigf]{vigf()}}.
+If you provide a custom \code{method} function with an argument named \code{batch_size}, the value of \code{batch_size} will be passed to your function.}
-\item{eval}{an R function that calculates the customized evaluating metric of the emulator. The function must satisfy the following basic rules:
+\item{eval}{an R function that computes a customized metric for evaluating emulator performance. The function must adhere to the following rules:
\itemize{
-\item the first argument is an emulator object that can be either an instance of
+\item \strong{First argument}: an emulator object, which can be one of the following:
\itemize{
-\item the \code{gp} class (produced by \code{\link[=gp]{gp()}});
-\item the \code{dgp} class (produced by \code{\link[=dgp]{dgp()}});
-\item the \code{bundle} class (produced by \code{\link[=pack]{pack()}}).
+\item an instance of the \code{gp} class (produced by \code{\link[=gp]{gp()}});
+\item an instance of the \code{dgp} class (produced by \code{\link[=dgp]{dgp()}});
+\item an instance of the \code{bundle} class (produced by \code{\link[=pack]{pack()}}).
}
-\item the output of the function can be
+\item \strong{Function output}:
\itemize{
-\item a single metric value, if the first argument is an instance of the \code{gp} class;
-\item a single metric value or a vector of metric values with the length equal to the number of output dimensions, if the first argument is an
-instance of the \code{dgp} class;
-\item a single metric value metric or a vector of metric values with the length equal to the number of emulators in the bundle, if the first
-argument is an instance of the \code{bundle} class.
+\item for \code{gp} objects, the output must be a single metric value.
+\item for \code{dgp} objects, the output can be a single metric value or a vector of metric values with a length equal to the number of output dimensions.
+\item for \code{bundle} objects, the output can be a single metric value or a vector of metric values with a length equal to the number of emulators in the bundle.
}
}
-If no customized function is provided, the built-in evaluation metric, RMSE, will be calculated. Defaults to \code{NULL}. See \emph{Note} section below for further information.}
+If no custom function is provided, a built-in evaluation metric (RMSE or log-loss, in the case of DGP emulators with categorical likelihoods) will be used.
+Defaults to \code{NULL}. See the \emph{Note} section below for additional details.}
-\item{verb}{a bool indicating if the trace information will be printed during the sequential design.
+\item{verb}{a bool indicating if trace information will be printed during the sequential design.
Defaults to \code{TRUE}.}
\item{autosave}{a list that contains configuration settings for the automatic saving of the emulator:
\itemize{
-\item \code{switch}: a bool indicating whether to enable the automatic saving of the emulator during the sequential design. When set to \code{TRUE},
+\item \code{switch}: a bool indicating whether to enable automatic saving of the emulator during sequential design. When set to \code{TRUE},
the emulator in the final iteration is always saved. Defaults to \code{FALSE}.
\item \code{directory}: a string specifying the directory path where the emulators will be stored. Emulators will be stored in a sub-directory
of \code{directory} named 'emulator-\code{id}'. Defaults to './check_points'.
\item \code{fname}: a string representing the base name for the saved emulator files. Defaults to 'check_point'.
-\item \code{save_freq}: an integer indicating the frequency of automatic savings, measured in the number of iterations. Defaults to \code{5}.
-\item \code{overwrite}: a bool value controlling the file saving behavior. When set to \code{TRUE}, each new automatic saving overwrites the previous one,
-keeping only the latest version. If \code{FALSE}, each automatic saving creates a new file, preserving all previous versions. Defaults to \code{FALSE}.
+\item \code{save_freq}: an integer indicating the frequency of automatic saves, measured in the number of iterations. Defaults to \code{5}.
+\item \code{overwrite}: a bool value controlling the file saving behavior. When set to \code{TRUE}, each new automatic save overwrites the previous one,
+keeping only the latest version. If \code{FALSE}, each automatic save creates a new file, preserving all previous versions. Defaults to \code{FALSE}.
}}
-\item{new_wave}{a bool indicating if the current execution of \code{\link[=design]{design()}} will create a new wave of sequential designs or add the sequential designs to
-the last existing wave. This argument is only used if there are waves existing in the emulator. By creating new waves, one can better visualize the performance
-of the sequential designs in different executions of \code{\link[=design]{design()}} in \code{\link[=draw]{draw()}} and can specify a different evaluation frequency in \code{freq}. However, it can be
-beneficiary to turn this option off to restrict a large number of waves to be visualized in \code{\link[=draw]{draw()}} that could run out of colors. Defaults to \code{TRUE}.}
+\item{new_wave}{a bool indicating whether the current call to \code{\link[=design]{design()}} will create a new wave of sequential designs or add the next sequence of designs to the most recent wave.
+This argument is relevant only if waves already exist in the emulator. Creating new waves can improve the visualization of sequential design performance across different calls
+to \code{\link[=design]{design()}} via \code{\link[=draw]{draw()}}, and allows for specifying a different evaluation frequency in \code{freq}. However, disabling this option can help limit the number of waves visualized
+in \code{\link[=draw]{draw()}} to avoid issues such as running out of distinct colors for large numbers of waves. Defaults to \code{TRUE}.}
\item{M_val}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} an integer that gives the size of the conditioning set for the Vecchia approximation in emulator validations. This argument is only used if the emulator \code{object}
was constructed under the Vecchia approximation. Defaults to \code{50}.}
-\item{cores}{an integer that gives the number of processes to be used for emulator validations. If set to \code{NULL}, the number of processes is set to
+\item{cores}{an integer that gives the number of processes to be used for emulator validation. If set to \code{NULL}, the number of processes is set to
\verb{max physical cores available \%/\% 2}. Defaults to \code{1}. This argument is only used if \code{eval = NULL}.}
\item{...}{Any arguments with names that differ from those used in \code{\link[=design]{design()}} but are required by \code{f}, \code{method}, or \code{eval} can be passed here.
\code{\link[=design]{design()}} will forward relevant arguments to \code{f}, \code{method}, and \code{eval} based on the names of the additional arguments provided.}
-\item{train_N}{the number of training iterations to be used to re-fit the DGP emulator at each step of the sequential design:
+\item{train_N}{the number of training iterations to be used for re-fitting the DGP emulator at each step of the sequential design:
+\itemize{
+\item If \code{train_N} is an integer, the DGP emulator will be re-fitted at each step (based on the re-fit frequency specified in \code{freq[1]}) using \code{train_N} iterations.
+\item If \code{train_N} is a vector, its length must be \code{N}, even if the re-fit frequency specified in \code{freq[1]} is not 1.
+\item If \code{train_N} is \code{NULL}, the DGP emulator will be re-fitted at each step (based on the re-fit frequency specified in \code{freq[1]}) using:
\itemize{
-\item If \code{train_N} is an integer, then at each step the DGP emulator will be re-fitted (based on the frequency of re-fit specified in \code{freq}) with \code{train_N} iterations.
-\item If \code{train_N} is a vector, then its size must be \code{N} even the re-fit frequency specified in \code{freq} is not one.
-\item If \code{train_N} is \code{NULL}, then at each step the DGP emulator will be re-fitted (based on the frequency of re-fit specified in \code{freq}) with \code{100} iterations
-if the DGP emulator was constructed without the Vecchia approximation, and with \code{50} iterations if Vecchia approximation was used.
+\item \code{100} iterations if the DGP emulator was constructed without the Vecchia approximation, or
+\item \code{50} iterations if the Vecchia approximation was used.
+}
}
Defaults to \code{NULL}.}
@@ -286,13 +300,13 @@ is computationally expensive. Defaults to \code{1}.}
\item{pruning}{a bool indicating if dynamic pruning of DGP structures will be implemented during the sequential design after the total number of
design points exceeds \code{min_size} in \code{control}. The argument is only applicable to DGP emulators (i.e., \code{object} is an instance of \code{dgp} class)
-produced by \code{dgp()} with \code{struc = NULL}. Defaults to \code{TRUE}.}
+produced by \code{dgp()}. Defaults to \code{TRUE}.}
\item{control}{a list that can supply any of the following components to control the dynamic pruning of the DGP emulator:
\itemize{
-\item \code{min_size}, the minimum number of design points required to trigger the dynamic pruning. Defaults to 10 times of the input dimensions.
-\item \code{threshold}, the R2 value above which a GP node is considered redundant. Defaults to \code{0.97}.
-\item \code{nexceed}, the minimum number of consecutive iterations that the R2 value of a GP node must exceed \code{threshold} to trigger the removal of that node from
+\item \code{min_size}, the minimum number of design points required to trigger dynamic pruning. Defaults to 10 times the number of input dimensions.
+\item \code{threshold}, the \eqn{R^2} value above which a GP node is considered redundant. Defaults to \code{0.97}.
+\item \code{nexceed}, the minimum number of consecutive iterations that the \eqn{R^2} value of a GP node must exceed \code{threshold} to trigger the removal of that node from
the DGP structure. Defaults to \code{3}.
}
@@ -301,17 +315,19 @@ The argument is only used when \code{pruning = TRUE}.}
\value{
An updated \code{object} is returned with a slot called \code{design} that contains:
\itemize{
-\item \emph{S} slots, named \verb{wave1, wave2,..., waveS}, that contain information of \emph{S} waves of sequential designs that have been applied to the emulator.
+\item \emph{S} slots, named \verb{wave1, wave2,..., waveS}, that contain information on the \emph{S} waves of sequential design that have been applied to the emulator.
Each slot contains the following elements:
\itemize{
-\item \code{N}, an integer that gives the numbers of steps implemented in the corresponding wave;
-\item \code{rmse}, a matrix that gives the values of evaluation metrics of emulators constructed during the corresponding wave, if \code{eval = NULL}. Each
-row of the matrix represents an iteration. If \code{object} is an instance of \code{gp} class, the matrix has a single columns of RMSEs. If \code{object} is
-an instance of \code{dgp} class, the elements in a row give the RMSEs corresponding to different output dimensions. If \code{object} is an instance of
-\code{dgp} class with categorical likelihood, the matrix has a single column of log-losses. If \code{object} is an instance of \code{bundle} class, the elements in
-a row give either RMSEs or log-losses of emulators contained in the bundle.
-\item \code{metric}, a matrix that gives the customized evaluating metric values of emulators constructed during the corresponding wave,
-if a customized function is supplied to \code{eval};
+\item \code{N}, an integer that gives the number of iterations implemented in the corresponding wave;
+\item \code{rmse}, a matrix providing the evaluation metric values for emulators constructed during the corresponding wave, when \code{eval = NULL}.
+Each row of the matrix represents an iteration.
+\itemize{
+\item for an \code{object} of class \code{gp}, the matrix contains a single column of RMSE values.
+\item for an \code{object} of class \code{dgp} without a categorical likelihood, each row contains mean/median squared errors corresponding to different output dimensions.
+\item for an \code{object} of class \code{dgp} with a categorical likelihood, the matrix contains a single column of log-loss values.
+\item for an \code{object} of class \code{bundle}, each row contains either mean/median squared errors or log-loss values for the emulators in the bundle.
+}
+\item \code{metric}: a matrix providing the values of custom evaluation metrics, as computed by the user-supplied \code{eval} function, for emulators constructed during the corresponding wave.
\item \code{freq}, an integer that gives the frequency that the emulator validations are implemented during the corresponding wave.
\item \code{enrichment}, a vector of size \code{N} that gives the number of new design points added after each step of the sequential design (if \code{object} is
an instance of the \code{gp} or \code{dgp} class), or a matrix that gives the number of new design points added to emulators in a bundle after each step of
@@ -320,11 +336,18 @@ the sequential design (if \code{object} is an instance of the \code{bundle} clas
If \code{target} is not \code{NULL}, the following additional elements are also included:
\itemize{
-\item \code{target}, the target evaluating metric computed by the \code{eval} or built-in function to stop the sequential design.
-\item \code{reached}, a bool (if \code{object} is an instance of the \code{gp} or \code{dgp} class) or a vector of bools (if \code{object} is an instance of the \code{bundle}
-class) that indicate if the target RMSEs are reached at the end of the sequential design.
+\item \code{target}: the target value of the evaluation metric (computed by \code{eval} or the built-in function) used to stop the sequential design.
+\item \code{reached}: indicates whether the \code{target} was reached at the end of the sequential design:
+\itemize{
+\item a bool if \code{object} is an instance of the \code{gp} or \code{dgp} class.
+\item a vector of bools if \code{object} is an instance of the \code{bundle} class, with its length determined as follows:
+\itemize{
+\item equal to the number of emulators in the bundle when \code{eval = NULL}.
+\item equal to the length of the output from \code{eval} when a custom \code{eval} function is provided.
+}
+}
}
-\item a slot called \code{type} that gives the type of validations:
+\item a slot called \code{type} that gives the type of validation:
\itemize{
\item either LOO ('loo') or OOS ('oos') if \code{eval = NULL}. See \code{\link[=validate]{validate()}} for more information about LOO and OOS.
\item 'customized' if a customized R function is provided to \code{eval}.
@@ -338,18 +361,20 @@ avoid re-visiting the same locations in later runs of \code{design()}.
See \emph{Note} section below for further information.
}
\description{
-This function implements the sequential design of a (D)GP emulator or a bundle of (D)GP emulators.
+This function implements sequential design and active learning for a (D)GP emulator or
+a bundle of (D)GP emulators, supporting an array of popular methods as well as user-specified approaches.
+It can also be used as a wrapper for Bayesian optimization methods.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
\itemize{
-\item The validation of an emulator is forced after the final step of a sequential design even \code{N} is not multiples of the second element in \code{freq}.
+\item Validation of an emulator is forced after the final step of a sequential design even if \code{N} is not a multiple of the second element in \code{freq}.
\item Any \code{loo} or \code{oos} slot that already exists in \code{object} will be cleaned, and a new slot called \code{loo} or \code{oos} will be created in the returned object
depending on whether \code{x_test} and \code{y_test} are provided. The new slot gives the validation information of the emulator constructed in the final step of
the sequential design. See \code{\link[=validate]{validate()}} for more information about the slots \code{loo} and \code{oos}.
-\item If \code{object} has previously been used by \code{\link[=design]{design()}} for sequential designs, the information of the current wave of the sequential design will replace
+\item If \code{object} has previously been used by \code{\link[=design]{design()}} for sequential design, the information of the current wave of the sequential design will replace
those of old waves and be contained in the returned object, unless
\itemize{
\item the validation type (LOO or OOS depending on whether \code{x_test} and \code{y_test} are supplied or not) of the current wave of the sequential design is the
@@ -360,15 +385,15 @@ functions are consistent among different waves. Otherwise, the trace plot of RMS
different waves.
}
-In above two cases, the information of the current wave of the sequential design will be added to the \code{design} slot of the returned object under the name \code{waveS}.
+For the above two cases, the information of the current wave of the sequential design will be added to the \code{design} slot of the returned object under the name \code{waveS}.
\item If \code{object} is an instance of the \code{gp} class and \code{eval = NULL}, the matrix in the \code{rmse} slot is single-columned. If \code{object} is an instance of
the \code{dgp} or \code{bundle} class and \code{eval = NULL}, the matrix in the \code{rmse} slot can have multiple columns that correspond to different output dimensions
or different emulators in the bundle.
\item If \code{object} is an instance of the \code{gp} class and \code{eval = NULL}, \code{target} needs to be a single value giving the RMSE threshold. If \code{object} is an instance
-of the \code{dgp} or \code{bundle} class and \code{eval = NULL}, \code{target} can be a vector of values that gives the RMSE thresholds for different output dimensions or
-different emulators. If a single value is provided, it will be used as the RMSE threshold for all output dimensions (if \code{object} is an instance of the \code{dgp}) or all emulators
-(if \code{object} is an instance of the \code{bundle}). If a customized function is supplied to \code{eval}, the user needs to ensure that the length of \code{target} is equal
-to that of the output from \code{eval} if \code{target} is given as a vector.
+of the \code{dgp} or \code{bundle} class and \code{eval = NULL}, \code{target} can be a vector of values that gives the evaluation metric thresholds for different output dimensions or
+different emulators. If a single value is provided, it will be used as the threshold for all output dimensions (if \code{object} is an instance of the \code{dgp} class) or all emulators
+(if \code{object} is an instance of the \code{bundle} class). If a customized function is supplied to \code{eval} and \code{target} is given as a vector, the user needs to ensure that the length
+of \code{target} is equal to that of the output from \code{eval}.
\item When defining \code{f}, it is important to ensure that:
\itemize{
\item the column order of the first argument of \code{f} is consistent with the training input used for the emulator;
@@ -376,8 +401,8 @@ to that of the output from \code{eval} if \code{target} is given as a vector.
or the order of emulators placed in \code{object} (if \code{object} is an instance of the \code{bundle} class).
}
\item The output matrix produced by \code{f} may include \code{NA}s. This is especially beneficial as it allows the sequential design process to continue without interruption,
-even if errors or \code{NA} outputs are encountered from \code{f} at certain input locations identified by the sequential designs. Users should ensure to handle any errors
-within \code{f} by appropriately returning \code{NA}s.
+even if errors or \code{NA} outputs are encountered from \code{f} at certain input locations identified by the sequential design. Users should ensure that any errors
+within \code{f} are handled by appropriately returning \code{NA}s.
\item When defining \code{eval}, the output metric needs to be positive if \code{\link[=draw]{draw()}} is used with \code{log = TRUE}. One also needs to ensure that a lower metric value indicates
a better emulation performance if \code{target} is set.
}
diff --git a/man/dgp.Rd b/man/dgp.Rd
index 79c4f84..c886949 100644
--- a/man/dgp.Rd
+++ b/man/dgp.Rd
@@ -7,7 +7,6 @@
dgp(
X,
Y,
- struc = NULL,
depth = 2,
node = ncol(X),
name = "sexp",
@@ -39,26 +38,16 @@ dgp(
)
}
\arguments{
-\item{X}{a matrix where each row is an input training data point and each column is an input dimension.}
+\item{X}{a matrix where each row is an input training data point and each column represents an input dimension.}
-\item{Y}{a matrix containing observed training output data. The matrix has its rows being output data points and columns being
-output dimensions. When \code{likelihood} (see below) is not \code{NULL}, \code{Y} must be a matrix with only one column.}
-
-\item{struc}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} a list that specifies a user-defined DGP structure. It should contain \emph{L} (the number of DGP layers) sub-lists,
-each of which represents a layer and contains a number of GP nodes (defined by \code{\link[=kernel]{kernel()}}) in the corresponding layer.
-The final layer of the DGP structure (i.e., the final sub-list in \code{struc}) can be a likelihood
-layer that contains a likelihood function (e.g., \code{\link[=Poisson]{Poisson()}}). When \code{struc = NULL},
-the DGP structure is automatically generated and can be checked by applying \code{\link[=summary]{summary()}} to the output from \code{\link[=dgp]{dgp()}} with \code{training = FALSE}.
-If this argument is used (i.e., user provides a customized DGP structure), arguments \code{depth}, \code{node}, \code{name}, \code{lengthscale}, \code{bounds}, \code{prior},
-\code{share}, \code{nugget_est}, \code{nugget}, \code{scale_est}, \code{scale}, \code{connect}, \code{likelihood}, and \code{internal_input_idx} will NOT be used. Defaults to \code{NULL}.
-
-\strong{The argument will be removed in the next release. To customize DGP specifications, please adjust the other arguments in the \code{\link[=dgp]{dgp()}} function.}}
+\item{Y}{a matrix containing observed training output data. Each row is an output data point and each column represents
+an output dimension. When \code{likelihood} (see below) is not \code{NULL}, \code{Y} must be a matrix with a single column.}
\item{depth}{number of layers (including the likelihood layer) for a DGP structure. \code{depth} must be at least \code{2}.
-Defaults to \code{2}. This argument is only used when \code{struc = NULL}.}
+Defaults to \code{2}.}
\item{node}{number of GP nodes in each layer (except for the final layer or the layer feeding the likelihood node) of the DGP. Defaults to
-\code{ncol(X)}. This argument is only used when \code{struc = NULL}.}
+\code{ncol(X)}.}
\item{name}{a character or a vector of characters that indicates the kernel functions (either \code{"sexp"} for squared exponential kernel or
\code{"matern2.5"} for Matérn-2.5 kernel) used in the DGP emulator:
@@ -67,7 +56,7 @@ Defaults to \code{2}. This argument is only used when \code{struc = NULL}.}
\item if a vector of characters is supplied, each character of the vector specifies the kernel function that will be applied to all GP nodes in the corresponding layer.
}
-Defaults to \code{"sexp"}. This argument is only used when \code{struc = NULL}.}
+Defaults to \code{"sexp"}.}
\item{lengthscale}{initial lengthscales for GP nodes in the DGP emulator. It can be a single numeric value or a vector:
\enumerate{
@@ -76,7 +65,7 @@ Defaults to \code{"sexp"}. This argument is only used when \code{struc = NULL}.}
The vector should have a length of \code{depth} if \code{likelihood = NULL} or a length of \code{depth - 1} if \code{likelihood} is not \code{NULL}.
}
-Defaults to a numeric value of \code{1.0}. This argument is only used when \code{struc = NULL}.}
+Defaults to a numeric value of \code{1.0}.}
\item{bounds}{the lower and upper bounds of lengthscales in GP nodes. It can be a vector or a matrix:
\enumerate{
@@ -86,18 +75,18 @@ lengthscales for all GP nodes in the DGP hierarchy.
The matrix should have its row number equal to \code{depth} if \code{likelihood = NULL} or to \code{depth - 1} if \code{likelihood} is not \code{NULL}.
}
-Defaults to \code{NULL} where no bounds are specified for the lengthscales. This argument is only used when \code{struc = NULL}.}
+Defaults to \code{NULL} where no bounds are specified for the lengthscales.}
-\item{prior}{prior to be used for Maximum a Posterior for lengthscales and nuggets of all GP nodes in the DGP hierarchy:
+\item{prior}{prior to be used for MAP estimation of lengthscales and nuggets of all GP nodes in the DGP hierarchy:
\itemize{
\item gamma prior (\code{"ga"}),
\item inverse gamma prior (\code{"inv_ga"}), or
\item jointly robust prior (\code{"ref"}).
}
-Defaults to \code{"ga"}. This argument is only used when \code{struc = NULL}.}
+Defaults to \code{"ga"}.}
-\item{share}{a bool indicating if all input dimensions of a GP node share a common lengthscale. Defaults to \code{TRUE}. This argument is only used when \code{struc = NULL}.}
+\item{share}{a bool indicating if all input dimensions of a GP node share a common lengthscale. Defaults to \code{TRUE}.}
\item{nugget_est}{a bool or a bool vector that indicates if the nuggets of GP nodes (if any) in the final layer are to be estimated. If a single bool is
provided, it will be applied to all GP nodes (if any) in the final layer. If a bool vector (which must have a length of \code{ncol(Y)}) is provided, each
@@ -107,7 +96,7 @@ bool element in the vector will be applied to the corresponding GP node (if any)
\item \code{TRUE}: the nugget of the corresponding GP in the final layer will be estimated with the initial value given by the correspondence in \code{nugget} (see below).
}
-Defaults to \code{FALSE}. This argument is only used when \code{struc = NULL}.}
+Defaults to \code{FALSE}.}
\item{nugget}{the initial nugget value(s) of GP nodes (if any) in each layer:
\enumerate{
@@ -116,13 +105,13 @@ Defaults to \code{FALSE}. This argument is only used when \code{struc = NULL}.}
The vector should have a length of \code{depth} if \code{likelihood = NULL} or a length of \code{depth - 1} if \code{likelihood} is not \code{NULL}.
}
-Set \code{nugget} to a small value and the bools in \code{nugget_est} to \code{FASLE} for deterministic emulations where the emulator
-interpolates the training data points. Set \code{nugget} to a reasonable larger value and the bools in \code{nugget_est} to \code{TRUE} for stochastic emulations where
+Set \code{nugget} to a small value and the bools in \code{nugget_est} to \code{FALSE} for deterministic emulation, where the emulator
+interpolates the training data points. Set \code{nugget} to a larger value and the bools in \code{nugget_est} to \code{TRUE} for stochastic emulation, where
the computer model outputs are assumed to follow a homogeneous Gaussian distribution. Defaults to \code{1e-6} if \code{nugget_est = FALSE} and
\code{0.01} if \code{nugget_est = TRUE}. If \code{likelihood} is not \code{NULL} and \code{nugget_est = FALSE}, the nuggets of GPs that feed into the likelihood layer default to
-\code{1e-4}. This argument is only used when \code{struc = NULL}.}
+\code{1e-4}.}
-\item{scale_est}{a bool or a bool vector that indicates if variance of GP nodes (if any) in the final layer are to be estimated. If a single bool is
+\item{scale_est}{a bool or a bool vector that indicates if the variances of GP nodes (if any) in the final layer are to be estimated. If a single bool is
provided, it will be applied to all GP nodes (if any) in the final layer. If a bool vector (which must have a length of \code{ncol(Y)}) is provided, each
bool element in the vector will be applied to the corresponding GP node (if any) in the final layer. The value of a bool has following effects:
\itemize{
@@ -130,35 +119,35 @@ bool element in the vector will be applied to the corresponding GP node (if any)
\item \code{TRUE}: the variance of the corresponding GP in the final layer will be estimated with the initial value given by the correspondence in \code{scale} (see below).
}
-Defaults to \code{TRUE}. This argument is only used when \code{struc = NULL}.}
+Defaults to \code{TRUE}.}
\item{scale}{the initial variance value(s) of GP nodes (if any) in the final layer. If it is a single numeric value, it will be applied to all GP nodes (if any)
in the final layer. If it is a vector (which must have a length of \code{ncol(Y)}), each numeric in the vector will be applied to the corresponding GP node
-(if any) in the final layer. Defaults to \code{1}. This argument is only used when \code{struc = NULL}.}
+(if any) in the final layer. Defaults to \code{1}.}
\item{connect}{a bool indicating whether to implement global input connection to the DGP structure. Setting it to \code{FALSE} may produce a better emulator in some cases at
-the cost of slower training. Defaults to \code{TRUE}. This argument is only used when \code{struc = NULL}.}
+the cost of slower training. Defaults to \code{TRUE}.}
\item{likelihood}{the likelihood type of a DGP emulator:
\enumerate{
\item \code{NULL}: no likelihood layer is included in the emulator.
\item \code{"Hetero"}: a heteroskedastic Gaussian likelihood layer is added for stochastic emulation where the computer model outputs are assumed to follow a heteroskedastic Gaussian distribution
-(i.e., the computer model outputs have varying noises).
-\item \code{"Poisson"}: a Poisson likelihood layer is added for stochastic emulation where the computer model outputs are assumed to a Poisson distribution.
-\item \code{"NegBin"}: a negative Binomial likelihood layer is added for stochastic emulation where the computer model outputs are assumed to follow a negative Binomial distribution.
-\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} \code{"Categorical"}: a categorical likelihood layer is added for stochastic emulation (i.e., classification), where the computer model outputs are assumed to follow a categorical distribution.
+(i.e., the computer model outputs have input-dependent noise).
+\item \code{"Poisson"}: a Poisson likelihood layer is added for emulation where the computer model outputs are counts and a Poisson distribution is used to model them.
+\item \code{"NegBin"}: a negative Binomial likelihood layer is added for emulation where the computer model outputs are counts and a negative Binomial distribution is used to capture dispersion variability in input space.
+\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} \code{"Categorical"}: a categorical likelihood layer is added for emulation (classification), where the computer model output is categorical.
}
-When \code{likelihood} is not \code{NULL}, the value of \code{nugget_est} is overridden by \code{FALSE}. Defaults to \code{NULL}. This argument is only used when \code{struc = NULL}.}
+When \code{likelihood} is not \code{NULL}, the value of \code{nugget_est} is overridden by \code{FALSE}. Defaults to \code{NULL}.}
\item{training}{a bool indicating if the initialized DGP emulator will be trained.
When set to \code{FALSE}, \code{\link[=dgp]{dgp()}} returns an untrained DGP emulator, to which one can apply \code{\link[=summary]{summary()}} to inspect its specifications
-(especially when a customized \code{struc} is provided) or apply \code{\link[=predict]{predict()}} to check its emulation performance before the training. Defaults to \code{TRUE}.}
+or apply \code{\link[=predict]{predict()}} to check its emulation performance before training. Defaults to \code{TRUE}.}
\item{verb}{a bool indicating if the trace information on DGP emulator construction and training will be printed during the function execution.
Defaults to \code{TRUE}.}
-\item{check_rep}{a bool indicating whether to check the repetitions in the dataset, i.e., if one input
+\item{check_rep}{a bool indicating whether to check for repetitions in the dataset, i.e., if one input
position has multiple outputs. Defaults to \code{TRUE}.}
\item{vecchia}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} a bool indicating whether to use Vecchia approximation for large-scale DGP emulator construction and prediction. Defaults to \code{FALSE}.}
@@ -181,7 +170,7 @@ the number of processes is set to \verb{(max physical cores available - 1)} if \
Only use multiple processes when there is a large number of GP components in different layers and optimization of GP components is computationally expensive. Defaults to \code{1}.}
\item{blocked_gibbs}{a bool indicating if the latent variables are imputed layer-wise using ESS-within-Blocked-Gibbs. ESS-within-Blocked-Gibbs would be faster and
-more efficient than ESS-within-Gibbs that imputes latent variables node-wise because it reduces the number of components to be sampled during the Gibbs,
+more efficient than ESS-within-Gibbs that imputes latent variables node-wise because it reduces the number of components to be sampled during Gibbs steps,
especially when there is a large number of GP nodes in layers due to higher input dimensions. Defaults to \code{TRUE}.}
\item{ess_burn}{number of burnin steps for the ESS-within-Gibbs
@@ -191,8 +180,7 @@ at each I-step of the training. Defaults to \code{10}. This argument is only use
point estimates of model parameters. Must be smaller than the training iterations \code{N}. If this is not specified, only the last 25\% of iterations
are used. Defaults to \code{NULL}. This argument is only used when \code{training = TRUE}.}
-\item{B}{the number of imputations to produce the later predictions. Increase the value to account for
-more imputation uncertainties with slower predictions. Decrease the value for lower imputation uncertainties but faster predictions.
+\item{B}{the number of imputations used to produce predictions. Increase the value to refine the representation of imputation uncertainty.
Defaults to \code{10}.}
\item{internal_input_idx}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} The argument will be removed in the next release. To set up connections of emulators for linked emulations,
@@ -200,9 +188,9 @@ please use the updated \code{\link[=lgp]{lgp()}} function instead.
Column indices of \code{X} that are generated by the linked emulators in the preceding layers.
Set \code{internal_input_idx = NULL} if the DGP emulator is in the first layer of a system or all columns in \code{X} are
-generated by the linked emulators in the preceding layers. Defaults to \code{NULL}. This argument is only used when \code{struc = NULL}.}
+generated by the linked emulators in the preceding layers. Defaults to \code{NULL}.}
-\item{linked_idx}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} The argument will be removed in the next release. To set up connections of emulators for linked emulations,
+\item{linked_idx}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} The argument will be removed in the next release. To set up connections of emulators for linked emulation,
please use the updated \code{\link[=lgp]{lgp()}} function instead.
Either a vector or a list of vectors:
@@ -253,14 +241,12 @@ If a sub-list corresponds to a GP node, it contains four elements:
as \code{FALSE} if \code{internal_input_idx = NULL}. \strong{The slot will be removed in the next release}.
\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} \code{linked_idx}: the value passed to argument \code{linked_idx}. It is shown as \code{FALSE} if the argument \code{linked_idx} is \code{NULL}.
\strong{The slot will be removed in the next release}.
-\item \code{seed}: the random seed generated to produce the imputations. This information is stored for the reproducibility when the DGP emulator (that was saved by \code{\link[=write]{write()}}
+\item \code{seed}: the random seed generated to produce imputations. This information is stored for reproducibility when the DGP emulator (that was saved by \code{\link[=write]{write()}}
with the light option \code{light = TRUE}) is loaded back to R by \code{\link[=read]{read()}}.
\item \code{B}: the number of imputations used to generate the emulator.
\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} \code{vecchia}: whether the Vecchia approximation is used for the GP emulator training.
-\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} \code{M}: the size of the conditioning set for the Vecchia approximation in the DGP emulator training.
+\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} \code{M}: the size of the conditioning set for the Vecchia approximation in the DGP emulator training. \code{M} is generated only when \code{vecchia = TRUE}.
}
-
-\code{internal_dims} and \code{external_dims} are generated only when \code{struc = NULL}. \code{M} is generated only when \code{vecchia = TRUE}.
\item \code{constructor_obj}: a 'python' object that stores the information of the constructed DGP emulator.
\item \code{container_obj}: a 'python' object that stores the information for the linked emulation.
\item \code{emulator_obj}: a 'python' object that stores the information for the predictions from the DGP emulator.
@@ -277,7 +263,7 @@ The returned \code{dgp} object can be used by
\item \code{\link[=summary]{summary()}} to summarize the trained DGP emulator.
\item \code{\link[=write]{write()}} to save the DGP emulator to a \code{.pkl} file.
\item \code{\link[=set_imp]{set_imp()}} to change the number of imputations.
-\item \code{\link[=design]{design()}} for sequential designs.
+\item \code{\link[=design]{design()}} for sequential design.
\item \code{\link[=update]{update()}} to update the DGP emulator with new inputs and outputs.
\item \code{\link[=alm]{alm()}}, \code{\link[=mice]{mice()}}, and \code{\link[=vigf]{vigf()}} to locate next design points.
}
@@ -286,7 +272,7 @@ The returned \code{dgp} object can be used by
This function builds and trains a DGP emulator.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/} and learn how to customize a DGP structure.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
Any R vector detected in \code{X} and \code{Y} will be treated as a column vector and automatically converted into a single-column
diff --git a/man/draw.Rd b/man/draw.Rd
index 9ea499e..a01f67e 100644
--- a/man/draw.Rd
+++ b/man/draw.Rd
@@ -5,7 +5,7 @@
\alias{draw.gp}
\alias{draw.dgp}
\alias{draw.bundle}
-\title{Validation plots of a sequential design}
+\title{Validation and diagnostic plots for a sequential design}
\usage{
draw(object, ...)
@@ -25,28 +25,28 @@ draw(object, ...)
\item{...}{N/A.}
-\item{type}{either \code{"rmse"}, for the trace plot of RMSEs, or log-losses for DGP emulators with categorical likelihoods, or customized evaluating metrics
-of emulators constructed during the sequential designs, or \code{"design"}, for visualizations of input designs created by the sequential design procedure.
+\item{type}{specifies the type of plot or visualization to generate:
+\itemize{
+\item \code{"rmse"}: generates a trace plot of RMSEs, log-losses for DGP emulators with categorical likelihoods, or custom evaluation metrics specified via the \code{eval} argument of the \code{\link[=design]{design()}} function.
+\item \code{"design"}: shows visualizations of input designs created by the sequential design procedure.
+}
+
Defaults to \code{"rmse"}.}
-\item{log}{a bool that indicates whether to plot RMSEs, or log-losses (in case of DGP emulators with categorical likelihoods), or customized evaluating
-metrics in log-scale if \code{type = "rmse"}. Defaults to \code{FALSE}.}
+\item{log}{a bool indicating whether to plot RMSEs, log-losses (for DGP emulators with categorical likelihoods), or custom evaluation metrics on a log scale when \code{type = "rmse"}.
+Defaults to \code{FALSE}.}
-\item{emulator}{a vector of indices of emulators packed in \code{object} to be drawn, if \code{object} is an instance of the \code{bundle} class. When set to \code{NULL}, all
+\item{emulator}{an index or vector of indices of emulators packed in \code{object}. This argument is only used if \code{object} is an instance of the \code{bundle} class. When set to \code{NULL}, all
emulators in the bundle are drawn. Defaults to \code{NULL}.}
}
\value{
A \code{patchwork} object.
}
\description{
-This function draws validation plots of the sequential design of a (D)GP emulator or a bundle of (D)GP emulators.
+This function draws diagnostic and validation plots for a sequential design of a (D)GP emulator or a bundle of (D)GP emulators.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
-}
-\note{
-If a customized evaluating function is provided to \code{\link[=design]{design()}} and the function returns a single evaluating metric value when \code{object} is
-an instance of the \code{bundle} class, the value of \code{emulator} has no effects on the plot when \code{type = "rmse"}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\examples{
\dontrun{
diff --git a/man/get_thread_num.Rd b/man/get_thread_num.Rd
index 6e786fe..b10a81d 100644
--- a/man/get_thread_num.Rd
+++ b/man/get_thread_num.Rd
@@ -16,5 +16,5 @@ This function gets the number of threads used for parallel computations involved
in the package.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
diff --git a/man/gp.Rd b/man/gp.Rd
index ce0a891..7ccc037 100644
--- a/man/gp.Rd
+++ b/man/gp.Rd
@@ -7,7 +7,6 @@
gp(
X,
Y,
- struc = NULL,
name = "sexp",
lengthscale = rep(0.1, ncol(X)),
bounds = NULL,
@@ -31,29 +30,23 @@ gp(
\item{Y}{a matrix with only one column and each row being an output data point.}
-\item{struc}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} an object produced by \code{\link[=kernel]{kernel()}} that gives a user-defined GP specifications. When \code{struc = NULL},
-the GP specifications are automatically generated using information provided in \code{name}, \code{lengthscale},
-\code{nugget_est}, \code{nugget}, \code{scale_est}, \code{scale},and \code{internal_input_idx}. Defaults to \code{NULL}.
-
-\strong{The argument will be removed in the next release. To customize GP specifications, please adjust the other arguments in the \code{\link[=gp]{gp()}} function.}}
-
\item{name}{kernel function to be used. Either \code{"sexp"} for squared exponential kernel or
-\code{"matern2.5"} for Matérn-2.5 kernel. Defaults to \code{"sexp"}. This argument is only used when \code{struc = NULL}.}
+\code{"matern2.5"} for Matérn-2.5 kernel. Defaults to \code{"sexp"}.}
-\item{lengthscale}{initial values of lengthscales in the kernel function. It can be a single numeric value or a vector:
+\item{lengthscale}{initial values of lengthscales in the kernel function. It can be a single numeric value or a vector of length \code{ncol(X)}:
\itemize{
\item if it is a single numeric value, it is assumed that kernel functions across input dimensions share the same lengthscale;
-\item if it is a vector (which must have a length of \code{ncol(X)}), it is assumed that kernel functions across input dimensions have different lengthscales.
+\item if it is a vector, it is assumed that kernel functions across input dimensions have different lengthscales.
}
-Defaults to a vector of \code{0.1}. This argument is only used when \code{struc = NULL}.}
+Defaults to a vector of \code{0.1}.}
\item{bounds}{the lower and upper bounds of lengthscales in the kernel function. It is a vector of length two where the first element is
the lower bound and the second element is the upper bound. The bounds will be applied to all lengthscales in the kernel function. Defaults
-to \code{NULL} where no bounds are specified for the lengthscales. This argument is only used when \code{struc = NULL}.}
+to \code{NULL} where no bounds are specified for the lengthscales.}
\item{prior}{prior to be used for Maximum a Posterior for lengthscales and nugget of the GP: gamma prior (\code{"ga"}), inverse gamma prior (\code{"inv_ga"}),
-or jointly robust prior (\code{"ref"}). Defaults to \code{"ref"}. This argument is only used when \code{struc = NULL}. See the reference below for the jointly
+or jointly robust prior (\code{"ref"}). Defaults to \code{"ref"}. See the reference below for the jointly
robust prior.}
\item{nugget_est}{a bool indicating if the nugget term is to be estimated:
@@ -62,13 +55,13 @@ robust prior.}
\item \code{TRUE}: the nugget term will be estimated.
}
-Defaults to \code{FALSE}. This argument is only used when \code{struc = NULL}.}
+Defaults to \code{FALSE}.}
\item{nugget}{the initial nugget value. If \code{nugget_est = FALSE}, the assigned value is fixed during the training.
-Set \code{nugget} to a small value (e.g., \code{1e-8}) and the corresponding bool in \code{nugget_est} to \code{FASLE} for deterministic emulations where the emulator
-interpolates the training data points. Set \code{nugget} to a reasonable larger value and the corresponding bool in \code{nugget_est} to \code{TRUE} for stochastic
-emulations where the computer model outputs are assumed to follow a homogeneous Gaussian distribution. Defaults to \code{1e-8} if \code{nugget_est = FALSE} and
-\code{0.01} if \code{nugget_est = TRUE}. This argument is only used when \code{struc = NULL}.}
+Set \code{nugget} to a small value (e.g., \code{1e-8}) and the corresponding bool in \code{nugget_est} to \code{FALSE} for deterministic computer models where the emulator
+should interpolate the training data points. Set \code{nugget} to a larger value and the corresponding bool in \code{nugget_est} to \code{TRUE} for stochastic
+emulation where the computer model outputs are assumed to follow a homogeneous Gaussian distribution. Defaults to \code{1e-8} if \code{nugget_est = FALSE} and
+\code{0.01} if \code{nugget_est = TRUE}.}
\item{scale_est}{a bool indicating if the variance is to be estimated:
\enumerate{
@@ -76,16 +69,15 @@ emulations where the computer model outputs are assumed to follow a homogeneous
\item \code{TRUE}: the variance term will be estimated.
}
-Defaults to \code{TRUE}. This argument is only used when \code{struc = NULL}.}
+Defaults to \code{TRUE}.}
\item{scale}{the initial variance value. If \code{scale_est = FALSE}, the assigned value is fixed during the training.
-Defaults to \code{1}. This argument is only used when \code{struc = NULL}.}
+Defaults to \code{1}.}
\item{training}{a bool indicating if the initialized GP emulator will be trained.
-When set to \code{FALSE}, \code{\link[=gp]{gp()}} returns an untrained GP emulator, to which one can apply \code{\link[=summary]{summary()}} to inspect its specifications
-(especially when a customized \code{struc} is provided) or apply \code{\link[=predict]{predict()}} to check its emulation performance before the training. Defaults to \code{TRUE}.}
+When set to \code{FALSE}, \code{\link[=gp]{gp()}} returns an untrained GP emulator, to which one can apply \code{\link[=summary]{summary()}} to inspect its specification or apply \code{\link[=predict]{predict()}} to check its emulation performance before the training. Defaults to \code{TRUE}.}
-\item{verb}{a bool indicating if the trace information on GP emulator construction and training will be printed during the function execution.
+\item{verb}{a bool indicating if the trace information on GP emulator construction and training will be printed during function execution.
Defaults to \code{TRUE}.}
\item{vecchia}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} a bool indicating whether to use Vecchia approximation for large-scale GP emulator construction and prediction. Defaults to \code{FALSE}.
@@ -103,7 +95,7 @@ If \code{ord = NULL}, the default random ordering is used. Defaults to \code{NUL
\item{internal_input_idx}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} The column indices of \code{X} that are generated by the linked emulators in the preceding layers.
Set \code{internal_input_idx = NULL} if the GP emulator is in the first layer of a system or all columns in \code{X} are
-generated by the linked emulators in the preceding layers. Defaults to \code{NULL}. This argument is only used when \code{struc = NULL}.
+generated by the linked emulators in the preceding layers. Defaults to \code{NULL}.
\strong{The argument will be removed in the next release. To set up connections of emulators for linked emulations, please use the updated \code{\link[=lgp]{lgp()}} function instead.}}
@@ -152,8 +144,6 @@ It is shown as \code{FALSE} if \code{internal_input_idx = NULL}. \strong{The slo
\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} \code{vecchia}: whether the Vecchia approximation is used for the GP emulator training.
\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} \code{M}: the size of the conditioning set for the Vecchia approximation in the GP emulator training.
}
-
-\code{internal_dims} and \code{external_dims} are generated only when \code{struc = NULL}.
\item \code{constructor_obj}: a 'python' object that stores the information of the constructed GP emulator.
\item \code{container_obj}: a 'python' object that stores the information for the linked emulation.
\item \code{emulator_obj}: a 'python' object that stores the information for the predictions from the GP emulator.
@@ -176,7 +166,7 @@ The returned \code{gp} object can be used by
This function builds and trains a GP emulator.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
Any R vector detected in \code{X} and \code{Y} will be treated as a column vector and automatically converted into a single-column
diff --git a/man/init_py.Rd b/man/init_py.Rd
index 6289af3..2366313 100644
--- a/man/init_py.Rd
+++ b/man/init_py.Rd
@@ -31,7 +31,7 @@ of 'dgpsi'. Defaults to \code{FALSE}.}
in \code{dgpsi_ver} if it has already been installed. This argument is useful when the 'python' environment
is corrupted and one wants to completely uninstall and reinstall it. Defaults to \code{FALSE}.}
-\item{verb}{a bool indicating if the trace information will be printed during the function execution.
+\item{verb}{a bool indicating if trace information will be printed during function execution.
Defaults to \code{TRUE}.}
}
\value{
@@ -41,7 +41,7 @@ No return value, called to install required 'python' environment.
This function initializes the 'python' environment for the package.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\examples{
\dontrun{
diff --git a/man/kernel.Rd b/man/kernel.Rd
deleted file mode 100644
index d3fe9dc..0000000
--- a/man/kernel.Rd
+++ /dev/null
@@ -1,107 +0,0 @@
-% Generated by roxygen2: do not edit by hand
-% Please edit documentation in R/kernel.R
-\name{kernel}
-\alias{kernel}
-\title{Initialize a Gaussian process node}
-\usage{
-kernel(
- length,
- scale = 1,
- nugget = 1e-06,
- name = "sexp",
- prior_name = "ga",
- prior_coef = NULL,
- bounds = NULL,
- nugget_est = FALSE,
- scale_est = FALSE,
- input_dim = NULL,
- connect = NULL
-)
-}
-\arguments{
-\item{length}{a vector of lengthscales. The length of the vector equals to:
-\enumerate{
-\item either one if the lengthscales in the kernel function are assumed same across input dimensions; or
-\item the total number of input dimensions, which is the sum of the number of feeding GP nodes
-in the last layer (defined by the argument \code{input_dim}) and the number of connected global
-input dimensions (defined by the argument \code{connect}), if the lengthscales in the kernel function
-are assumed different across input dimensions.
-}}
-
-\item{scale}{the variance of a GP node. Defaults to \code{1}.}
-
-\item{nugget}{the nugget term of a GP node. Defaults to \code{1e-6}.}
-
-\item{name}{kernel function to be used. Either \code{"sexp"} for squared exponential kernel or
-\code{"matern2.5"} for Matérn-2.5 kernel. Defaults to \code{"sexp"}.}
-
-\item{prior_name}{prior options for the lengthscales and nugget term: gamma prior (\code{"ga"}), inverse gamma prior (\code{"inv_ga"}),
-or jointly robust prior (\code{"ref"}) for the lengthscales and nugget term. Set \code{NULL} to disable the prior. Defaults to \code{"ga"}.}
-
-\item{prior_coef}{a vector that contains the coefficients for different priors:
-\itemize{
-\item for the gamma prior, it is a vector of two values specifying the shape and rate parameters of the gamma distribution. Set to \code{NULL} for the
-default value \code{c(1.6,0.3)}.
-\item for the inverse gamma prior, it is a vector of two values specifying the shape and scale parameters of the inverse gamma distribution. Set
-to \code{NULL} for the default value \code{c(1.6,0.3)}.
-\item for the jointly robust prior, it is a vector of a single value specifying the \code{a} parameter in the prior. Set to \code{NULL} for the
-default value \code{c(0.2)}. See the reference below for the jointly robust prior.
-}
-
-Defaults to \code{NULL}.}
-
-\item{bounds}{a vector of length two that gives the lower bound (the first element of the vector) and the upper bound (the second element of the
-vector) of all lengthscales of the GP node. Defaults to \code{NULL} where no bounds are specified for the lengthscales.}
-
-\item{nugget_est}{set to \code{TRUE} to estimate the nugget term or to \code{FALSE} to fix the nugget term as specified
-by the argument \code{nugget}. If set to \code{TRUE}, the value set to the argument \code{nugget} is used as the initial
-value. Defaults to \code{FALSE}.}
-
-\item{scale_est}{set to \code{TRUE} to estimate the variance (i.e., scale) or to \code{FALSE} to fix the variance (i.e., scale) as specified
-by the argument \code{scale}. Defaults to \code{FALSE}.}
-
-\item{input_dim}{a vector that contains either
-\enumerate{
-\item the indices of GP nodes in the feeding layer whose outputs feed into this GP node; or
-\item the indices of global input dimensions that are linked to the outputs of some feeding emulators,
-if this GP node is in the first layer of a GP or DGP, which will be used for the linked emulation.
-}
-
-When set to \code{NULL},
-\enumerate{
-\item all outputs from the GP nodes in the feeding layer feed into this GP node; or
-\item all global input dimensions feed into this GP node.
-}
-
-Defaults to \code{NULL}.}
-
-\item{connect}{a vector that contains the indices of dimensions in the global
-input connecting to this GP node as additional input dimensions. When set to \code{NULL}, no global input
-connection is implemented. Defaults to \code{NULL}. When this GP node is in the first layer of a GP or DGP emulator,
-which will consequently be used for linked emulation, \code{connect} gives the indices of global input dimensions
-that are not connected to some feeding emulators. In such a case, set \code{input_dim} to a vector of indices of
-the remaining input dimensions that are connected to the feeding emulators.}
-}
-\value{
-A 'python' object to represent a GP node.
-}
-\description{
-\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}
-
-This function is deprecated and will be removed in the next release. To customize
-DGP specifications, adjust the other arguments in the \code{dgp()} function instead.
-}
-\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
-}
-\examples{
-\dontrun{
-
-# Check https://mingdeyu.github.io/dgpsi-R/ for examples
-# on how to customize DGP structures using kernel().
-}
-}
-\references{
-Gu, M. (2019). Jointly robust prior for Gaussian stochastic process in emulation, calibration and variable selection. \emph{Bayesian Analysis}, \strong{14(3)}, 857-885.
-}
-\keyword{internal}
diff --git a/man/lgp.Rd b/man/lgp.Rd
index 7b1eb37..fc4d2e4 100644
--- a/man/lgp.Rd
+++ b/man/lgp.Rd
@@ -16,14 +16,16 @@ in the same order of the specified computer model system's hierarchy. \strong{Th
\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} a data frame that defines the connection structure between emulators in the linked system, with the following columns:
\itemize{
\item \code{From_Emulator}: the ID of the emulator providing the output. This ID must match the \code{id} slot
-in the corresponding emulator object (produced by \code{\link[=gp]{gp()}} or \code{\link[=dgp]{dgp()}}) within \code{emulators} (the next argument following \code{struc}). The \code{id} slot
+in the corresponding emulator object (produced by \code{\link[=gp]{gp()}} or \code{\link[=dgp]{dgp()}}) within the \code{emulators} argument of \code{\link[=lgp]{lgp()}}, or it should
+be the special value \code{"Global"}, indicating the global inputs to the model chain or network. The \code{id} slot
is either automatically generated by \code{\link[=gp]{gp()}} or \code{\link[=dgp]{dgp()}}, or can be manually specified via the \code{id} argument in these functions or set with the
\code{\link[=set_id]{set_id()}} function.
\item \code{To_Emulator}: the ID of the emulator receiving the input, also matching the \code{id} slot in the
-corresponding emulator object. The \code{id} slot is generated or set as described above for \code{From_Emulator}.
-\item \code{From_Output}: a single integer specifying the output dimension from the \code{From_Emulator} that is being connected to the
-input dimension of the \code{To_Emulator}.
-\item \code{To_Input}: a single integer specifying the input dimension of the \code{To_Emulator} that is receiving the output dimension
+corresponding emulator object.
+\item \code{From_Output}: a single integer specifying the output dimension of the \code{From_Emulator} that is being connected to the
+input dimension of the \code{To_Emulator} specified by \code{To_Input}. If \code{From_Emulator} is \code{"Global"}, then \code{From_Output}
+indicates the dimension of the global input passed to the \code{To_Emulator}.
+\item \code{To_Input}: a single integer specifying the input dimension of the \code{To_Emulator} that is receiving the \code{From_Output} dimension
from the \code{From_Emulator}.
}
@@ -31,10 +33,7 @@ Each row represents a single one-to-one connection between a specified output di
and a corresponding input dimension of \code{To_Emulator}. If multiple connections are required between
two emulators, each connection should be specified in a separate row.
-Additionally, the special value \code{"Global"} in \code{From_Emulator} can be used to represent global input data, linking a
-specified dimension of the global input directly to an input dimension of an emulator.
-
-\strong{Note:} When using this data frame option, \code{emulators} argument must be provided.
+\strong{Note:} When using the data frame option for \code{struc}, the \code{emulators} argument must be provided.
}}
\item{emulators}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} a list of emulator objects, each containing an \code{id} slot that uniquely identifies it within the
@@ -44,19 +43,22 @@ If the same emulator is used multiple times within the linked system, the list m
of that emulator, each with a unique ID stored in their \code{id} slot. Use the \code{\link[=set_id]{set_id()}} function to produce copies with different IDs
to ensure each instance can be uniquely referenced.}
-\item{B}{the number of imputations to produce the predictions. Increase the value to account for more
-imputation uncertainties. Decrease the value for lower imputation uncertainties but faster predictions.
-If the system consists only GP emulators, \code{B} is set to \code{1} automatically. Defaults to \code{10}.}
+\item{B}{the number of imputations used for prediction. Increase the value to refine the representation of
+imputation uncertainty. If the system consists only of GP emulators, \code{B} is set to \code{1} automatically. Defaults to \code{10}.}
-\item{activate}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} a bool indicating if the initialized linked emulator will be activated for prediction, which can be used with \code{\link[=predict]{predict()}} or \code{\link[=validate]{validate()}}.
-When set to \code{FALSE}, \code{\link[=lgp]{lgp()}} returns an inactive linked emulator, to which one can apply \code{\link[=summary]{summary()}} to inspect its structure. Defaults to \code{TRUE}. This argument is only
-applicable when \code{struc} is specified as a data frame.}
+\item{activate}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} a bool indicating whether the initialized linked emulator should be activated:
+\itemize{
+\item If \code{activate = FALSE}, \code{\link[=lgp]{lgp()}} returns an inactive linked emulator, allowing inspection of its structure using \code{\link[=summary]{summary()}}.
+\item If \code{activate = TRUE}, \code{\link[=lgp]{lgp()}} returns an active linked emulator, ready for prediction and validation using \code{\link[=predict]{predict()}} and \code{\link[=validate]{validate()}}, respectively.
+}
+
+Defaults to \code{TRUE}. This argument is only applicable when \code{struc} is specified as a data frame.}
-\item{verb}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} a bool indicating if the trace information on linked (D)GP emulator construction will be printed during the function call.
+\item{verb}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} a bool indicating if the trace information on linked (D)GP emulator construction should be printed during the function call.
Defaults to \code{TRUE}. This argument is only applicable when \code{struc} is specified as a data frame.}
\item{id}{an ID to be assigned to the linked (D)GP emulator. If an ID is not provided (i.e., \code{id = NULL}), a UUID
-(Universally Unique Identifier) will be automatically generated and assigned to the emulator. Default to \code{NULL}.}
+(Universally Unique Identifier) will be automatically generated and assigned to the emulator. Defaults to \code{NULL}.}
}
\value{
An S3 class named \code{lgp} that contains three slots:
@@ -66,7 +68,7 @@ An S3 class named \code{lgp} that contains three slots:
\item \code{emulator_obj}, a 'python' object that stores the information for predictions from the linked emulator.
\item \code{specs}: a list that contains
\enumerate{
-\item \code{seed}: the random seed generated to produce the imputations. This information is stored for the reproducibility
+\item \code{seed}: the random seed generated to produce the imputations. This information is stored for reproducibility
when the linked (D)GP emulator (that was saved by \code{\link[=write]{write()}} with the light option \code{light = TRUE}) is loaded back
to R by \code{\link[=read]{read()}}.
\item \code{B}: the number of imputations used to generate the linked (D)GP emulator.
@@ -91,17 +93,17 @@ indicates a position higher up in that layer.
The returned \code{lgp} object can be used by
\itemize{
\item \code{\link[=predict]{predict()}} for linked (D)GP predictions.
-\item \code{\link[=validate]{validate()}} for the OOS validation.
-\item \code{\link[=plot]{plot()}} for the validation plots.
+\item \code{\link[=validate]{validate()}} for OOS validation.
+\item \code{\link[=plot]{plot()}} for validation plots.
\item \code{\link[=summary]{summary()}} to summarize the constructed linked (D)GP emulator.
\item \code{\link[=write]{write()}} to save the linked (D)GP emulator to a \code{.pkl} file.
}
}
\description{
-This function constructs a linked (D)GP emulator.
+This function constructs a linked (D)GP emulator for a model chain or network.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\examples{
\dontrun{
diff --git a/man/mice.Rd b/man/mice.Rd
index 9a0eb35..a60669e 100644
--- a/man/mice.Rd
+++ b/man/mice.Rd
@@ -61,13 +61,13 @@ mice(object, ...)
\item{...}{any arguments (with names different from those of arguments used in \code{\link[=mice]{mice()}}) that are used by \code{aggregate}
can be passed here.}
-\item{x_cand}{a matrix (with each row containing a design point and column representing an input dimension) that gives a candidate set
-from which the next design point(s) are determined. If \code{object} is an instance of the \code{bundle} class and \code{aggregate} is not supplied, \code{x_cand} could also
-be a list with length equal to the number of emulators contained in \code{object}. In this case, each slot in \code{x_cand} should be a candidate set matrix
-for each emulator included in the bundle. Defaults to \code{NULL}.}
+\item{x_cand}{a matrix (with each row being a design point and each column being an input dimension) that gives a candidate set
+from which the next design point(s) are determined. If \code{object} is an instance of the \code{bundle} class and \code{aggregate} is not supplied, \code{x_cand} can also be a list.
+The list must have a length equal to the number of emulators in \code{object}, with each element being a matrix representing the candidate set for a corresponding
+emulator in the bundle. Defaults to \code{NULL}.}
-\item{n_cand}{an integer that gives the size of the candidate set to be generated from which the next design point is determined. This argument
-is only used when \code{x_cand} is \code{NULL}. Defaults to \code{200}.}
+\item{n_cand}{an integer specifying the size of the candidate set to be generated for selecting the next design point(s).
+This argument is used only when \code{x_cand} is \code{NULL}. Defaults to \code{200}.}
\item{batch_size}{an integer that gives the number of design points to be chosen.
Defaults to \code{1}.}
@@ -99,32 +99,38 @@ of the matrix equals to:
\item the emulator output dimension if \code{object} is an instance of the \code{dgp} class; or
\item the number of emulators contained in \code{object} if \code{object} is an instance of the \code{bundle} class.
}
-\item the output should be a vector that gives aggregations of scores at different design points.
+\item the output should be a vector that gives aggregate scores at different design points.
}
-Set to \code{NULL} to disable the aggregation. Defaults to \code{NULL}.}
+Set to \code{NULL} to disable aggregation. Defaults to \code{NULL}.}
}
\value{
\enumerate{
-\item If \code{x_cand} is not \code{NULL} and:
+\item If \code{x_cand} is not \code{NULL}:
\itemize{
-\item \code{object} is an instance of the \code{gp} class, a vector is returned with length equal to \code{batch_size}, giving the positions (i.e., row numbers)
-of next design points from \code{x_cand}.
-\item \code{object} is an instance of the \code{dgp} class, a vector is returned with length equal to \code{batch_size * D}, giving positions (i.e., row numbers)
-of next design points from \code{x_cand} to be added to the DGP emulator. \code{D} equals to the number of output dimensions of the DGP
-emulator if there is no likelihood layer in the hierarchy. If \code{object} is a DGP emulator with either \code{Hetero} or \code{NegBin} likelihood layer,
-\code{D = 2}. If \code{object} is a DGP emulator with a \code{Categorical} likelihood layer, \code{D} equals to one (for binary output) or \code{K} (for multi-class output with \code{K} classes).
-\item \code{object} is an instance of the \code{bundle} class, a matrix is returned with row number equal to \code{batch_size} and column number equal to the number of
-emulators in the bundle, giving positions (i.e., row numbers) of next design points from \code{x_cand} to be added to individual emulators.
-}
-\item If \code{x_cand = NULL} and:
+\item When \code{object} is an instance of the \code{gp} class, a vector of length \code{batch_size} is returned, containing the positions
+(row numbers) of the next design points from \code{x_cand}.
+\item When \code{object} is an instance of the \code{dgp} class, a vector of length \code{batch_size * D} is returned, containing the positions
+(row numbers) of the next design points from \code{x_cand} to be added to the DGP emulator, where:
\itemize{
-\item \code{object} is an instance of the \code{gp} class, a matrix is returned with row number equal to \code{batch_size}, giving the next design points to be evaluated.
-\item \code{object} is an instance of the \code{dgp} class, a matrix is returned with row number equal to \code{batch_size * D} where \code{D} is the number of output dimensions of the DGP
-emulator if no likelihood layer is included. If \code{object} is a DGP emulator with either \code{Hetero} or \code{NegBin} likelihood layer, \code{D = 2}. If \code{object} is a DGP emulator
-with a \code{Categorical} likelihood layer, \code{D} equals to one (for binary output) or \code{K} (for multi-class output with \code{K} classes).
-\item \code{object} is an instance of the \code{bundle} class, a list is returned with the length equal to the number of
-emulators in the bundle. Each element in the list is a matrix with row number equal to \code{batch_size}, giving next design points to be added to individual emulators.
+\item \code{D} is the number of output dimensions of the DGP emulator if no likelihood layer is included.
+\item For a DGP emulator with a \code{Hetero} or \code{NegBin} likelihood layer, \code{D = 2}.
+\item For a DGP emulator with a \code{Categorical} likelihood layer, \code{D = 1} for binary output or \code{D = K} for multi-class output with \code{K} classes.
+}
+\item When \code{object} is an instance of the \code{bundle} class, a matrix is returned with \code{batch_size} rows and a column for each emulator in
+the bundle, containing the positions (row numbers) of the next design points from \code{x_cand} for individual emulators.
+}
+\item If \code{x_cand} is \code{NULL}:
+\itemize{
+\item When \code{object} is an instance of the \code{gp} class, a matrix with \code{batch_size} rows is returned, giving the next design points to be evaluated.
+\item When \code{object} is an instance of the \code{dgp} class, a matrix with \code{batch_size * D} rows is returned, where:
+\itemize{
+\item \code{D} is the number of output dimensions of the DGP emulator if no likelihood layer is included.
+\item For a DGP emulator with a \code{Hetero} or \code{NegBin} likelihood layer, \code{D = 2}.
+\item For a DGP emulator with a \code{Categorical} likelihood layer, \code{D = 1} for binary output or \code{D = K} for multi-class output with \code{K} classes.
+}
+\item When \code{object} is an instance of the \code{bundle} class, a list is returned with a length equal to the number of emulators in the bundle. Each
+element of the list is a matrix with \code{batch_size} rows, where each row represents a design point to be added to the corresponding emulator.
}
}
}
@@ -133,11 +139,12 @@ This function searches from a candidate set to locate the next design point(s) t
or a bundle of (D)GP emulators using the Mutual Information for Computer Experiments (MICE), see the reference below.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
-The column order of the first argument of \code{aggregate} must be consistent with the order of emulator output dimensions (if \code{object} is an instance of the
-\code{dgp} class), or the order of emulators placed in \code{object} if \code{object} is an instance of the \code{bundle} class.
+The first column of the matrix supplied to the first argument of \code{aggregate} must correspond to the first output dimension of the DGP emulator
+if \code{object} is an instance of the \code{dgp} class, and so on for subsequent columns and dimensions. If \code{object} is an instance of the \code{bundle} class,
+the first column must correspond to the first emulator in the bundle, and so on for subsequent columns and emulators.
}
\examples{
\dontrun{
diff --git a/man/nllik.Rd b/man/nllik.Rd
index 162c9d5..ea450aa 100644
--- a/man/nllik.Rd
+++ b/man/nllik.Rd
@@ -2,7 +2,7 @@
% Please edit documentation in R/utils.R
\name{nllik}
\alias{nllik}
-\title{Calculate negative predicted log-likelihood}
+\title{Calculate the predictive negative log-likelihood}
\usage{
nllik(object, x, y)
}
@@ -20,17 +20,9 @@ across all testing data points. The second one, named \code{allNLL}, is a vector
log-likelihood for each testing data point.
}
\description{
-This function computes the negative predicted log-likelihood from a
+This function computes the predictive negative log-likelihood from a
DGP emulator with a likelihood layer.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
-}
-\examples{
-\dontrun{
-
-# Check https://mingdeyu.github.io/dgpsi-R/ for examples
-# on how to compute the negative predicted log-likelihood
-# using nllik().
-}
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
diff --git a/man/pack.Rd b/man/pack.Rd
index f24f253..928fa09 100644
--- a/man/pack.Rd
+++ b/man/pack.Rd
@@ -28,7 +28,7 @@ This function packs GP emulators and DGP emulators into a \code{bundle} class fo
sequential designs if each emulator emulates one output dimension of the underlying simulator.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\examples{
\dontrun{
diff --git a/man/plot.Rd b/man/plot.Rd
index 4d5fba7..8c67eca 100644
--- a/man/plot.Rd
+++ b/man/plot.Rd
@@ -74,11 +74,11 @@
\item{y_test}{same as that of \code{\link[=validate]{validate()}}.}
-\item{dim}{if \code{dim = NULL}, the index of an emulator's input will be shown on the x-axis in validation plots. Otherwise, \code{dim} indicates
+\item{dim}{if \code{dim = NULL}, the index of an emulator's input within the design will be shown on the x-axis in validation plots. Otherwise, \code{dim} indicates
which dimension of an emulator's input will be shown on the x-axis in validation plots:
\itemize{
\item If \code{x} is an instance of the \code{gp} or \code{dgp} class, \code{dim} is an integer.
-\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} If \code{x} is an instance of the \code{lgp} class created by \code{\link[=lgp]{lgp()}} without specifying argument \code{struc} in data frame form, \code{dim} can be
+\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} If \code{x} is an instance of the \code{lgp} class created by \code{\link[=lgp]{lgp()}} without specifying the \code{struc} argument in data frame form, \code{dim} can be:
\enumerate{
\item an integer referring to the dimension of the global input to emulators in the first layer of a linked emulator system; or
\item a vector of three integers referring to the dimension (specified by the third integer) of the global input to an emulator
@@ -97,7 +97,7 @@ This argument is only used when \code{style = 1}. Defaults to \code{NULL}.}
\item{sample_size}{same as that of \code{\link[=validate]{validate()}}.}
-\item{style}{either \code{1} or \code{2}, indicating two different types of validation plots.}
+\item{style}{either \code{1} or \code{2}, indicating two different plotting styles for validation.}
\item{min_max}{a bool indicating if min-max normalization will be used to scale the testing output, RMSE, predictive mean and std from the
emulator. Defaults to \code{TRUE}. This argument is not applicable to DGP emulators with categorical likelihoods.}
@@ -123,7 +123,7 @@ Defaults to \code{'turbo'} (or \code{'H'}).}
individual points when the input of the emulator is one-dimensional and \code{style = 1}. This argument is not applicable to DGP emulators with
categorical likelihoods. Defaults to \code{'points'}}
-\item{verb}{a bool indicating if the trace information on plotting will be printed during the function execution.
+\item{verb}{a bool indicating if trace information on plotting will be printed during execution.
Defaults to \code{TRUE}.}
\item{M}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} same as that of \code{\link[=validate]{validate()}}.}
@@ -141,20 +141,17 @@ A \code{patchwork} object.
This function draws validation plots of a GP, DGP, or linked (D)GP emulator.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
\itemize{
\item \code{\link[=plot]{plot()}} calls \code{\link[=validate]{validate()}} internally to obtain validation results for plotting. However, \code{\link[=plot]{plot()}} will not export the
emulator object with validation results. Instead, it only returns the plotting object. For small-scale validations (i.e., small
-training or testing data points), direct execution of \code{\link[=plot]{plot()}} is fine. However, for moderate- to large-scale validations,
+training or testing data points), direct execution of \code{\link[=plot]{plot()}} works well. However, for moderate- to large-scale validation,
it is recommended to first run \code{\link[=validate]{validate()}} to obtain and store validation results in the emulator object, and then supply the
-object to \code{\link[=plot]{plot()}}. This is because if an emulator object has the validation results stored, each time when \code{\link[=plot]{plot()}}
-is invoked, unnecessary evaluations of repetitive LOO or OOS validation will not be implemented.
-\item \code{\link[=plot]{plot()}} uses information provided in \code{x_test} and \code{y_test} to produce the OOS validation plots. Therefore, if validation results
-are already stored in \code{x}, unless \code{x_test} and \code{y_test} are identical to those used by \code{\link[=validate]{validate()}}, \code{\link[=plot]{plot()}} will re-evaluate OOS
-validations before plotting.
-\item The returned \code{patchwork} object contains the \code{ggplot2} objects. One can modify the included individual ggplots
+object to \code{\link[=plot]{plot()}}. \code{\link[=plot]{plot()}} checks the object's \code{loo} and \code{oos} slots prior to calling \code{\link[=validate]{validate()}} and will not perform further calculation if the required information is already stored.
+\item \code{\link[=plot]{plot()}} will only use stored OOS validation if \code{x_test} and \code{y_test} are identical to those used by \code{\link[=validate]{validate()}} to produce the data contained in the object's \code{oos} slot; otherwise, \code{\link[=plot]{plot()}} will re-evaluate OOS validation before plotting.
+\item The returned \link{patchwork} object contains the \link{ggplot2} objects. One can modify the included individual ggplots
by accessing them with double-bracket indexing. See \url{https://patchwork.data-imaginist.com/} for further information.
}
}
diff --git a/man/predict.Rd b/man/predict.Rd
index 36ced11..debcc12 100644
--- a/man/predict.Rd
+++ b/man/predict.Rd
@@ -5,7 +5,7 @@
\alias{predict.dgp}
\alias{predict.lgp}
\alias{predict.gp}
-\title{Predictions from GP, DGP, or linked (D)GP emulators}
+\title{Prediction from GP, DGP, or linked (D)GP emulators}
\usage{
\method{predict}{dgp}(
object,
@@ -51,9 +51,7 @@
\item if \code{object} is an instance of the \code{gp} or \code{dgp} class, \code{x} is a matrix where each row is an input testing data point and each column is an input dimension.
\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} if \code{object} is an instance of the \code{lgp} class created by \code{\link[=lgp]{lgp()}} without specifying argument \code{struc} in data frame form, \code{x} can be either a matrix or a list:
\itemize{
-\item if \code{x} is a matrix, it is the global testing input data that feed into the emulators in the first layer of a system.
-The rows of \code{x} represent different input data points and the columns represent input dimensions across all emulators in
-the first layer of the system. In this case, it is assumed that the only global input to the system is the input to the
+\item if \code{x} is a matrix, its rows are treated as instances of the \code{Global} inputs. In this case, it is assumed that the only global input to the system is the input to the
emulators in the first layer and there is no global input to emulators in other layers.
\item if \code{x} is a list, it should have \emph{L} (the number of layers in an emulator system) elements. The first element
is a matrix that represents the global testing input data that feed into the emulators in the first layer of the system. The
@@ -72,20 +70,23 @@ corresponding to rows where the \code{From_Emulator} column is \code{"Global"}.
}}
\item{method}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#updated}{\figure{lifecycle-updated.svg}{options: alt='[Updated]'}}}{\strong{[Updated]}} the prediction approach to use: either the mean-variance approach (\code{"mean_var"}) or the sampling approach (\code{"sampling"}).
-For DGP emulators with a categorical likelihood (\code{likelihood = "Categorical"} in \code{\link[=dgp]{dgp()}}), the argument is only used when \code{full_layer = TRUE}.
-By default, the method is set to \code{"sampling"} for DGP emulators with Poisson, Negative Binomial, and Categorical likelihoods and \code{"mean_var"} otherwise.}
+The mean-variance approach returns the means and variances of the predictive distributions, while the sampling approach generates samples from predictive distributions
+using the derived means and variances. For DGP emulators with a categorical likelihood (\code{likelihood = "Categorical"} in \code{\link[=dgp]{dgp()}}), \code{method} is only applicable
+when \code{full_layer = TRUE}. In this case, the sampling approach generates samples from the GP nodes in all hidden layers using the derived means and variances,
+and subsequently propagates these samples through the categorical likelihood. By default, the method is set to \code{"sampling"} for DGP emulators with Poisson, Negative Binomial, and
+Categorical likelihoods, and to \code{"mean_var"} otherwise.}
\item{mode}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} whether to predict the classes (\code{"label"}) or probabilities (\code{"proba"}) of different classes when \code{object} is a DGP emulator with a categorical likelihood.
Defaults to \code{"label"}.}
-\item{full_layer}{a bool indicating whether to output the predictions of all layers. Defaults to \code{FALSE}. Only used when \code{object} is a DGP and linked (D)GP emulator.}
+\item{full_layer}{a bool indicating whether to output the predictions of all layers. Defaults to \code{FALSE}. Only used when \code{object} is a DGP or a linked (D)GP emulator.}
\item{sample_size}{the number of samples to draw for each given imputation if \code{method = "sampling"}. Defaults to \code{50}.}
\item{M}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} the size of the conditioning set for the Vecchia approximation in the emulator prediction. Defaults to \code{50}. This argument is only used if the emulator \code{object}
was constructed under the Vecchia approximation.}
-\item{cores}{the number of processes to be used for predictions. If set to \code{NULL}, the number of processes is set to \verb{max physical cores available \%/\% 2}. Defaults to \code{1}.}
+\item{cores}{the number of processes to be used for prediction. If set to \code{NULL}, the number of processes is set to \verb{max physical cores available \%/\% 2}. Defaults to \code{1}.}
\item{chunks}{the number of chunks that the testing input matrix \code{x} will be divided into for multi-cores to work on.
Only used when \code{cores} is not \code{1}. If not specified (i.e., \code{chunks = NULL}), the number of chunks is set to the value of \code{cores}.
@@ -124,7 +125,7 @@ of size: \code{B * sample_size}, where \code{B} is the number of imputations spe
\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} If \code{object} is an instance of the \code{dgp} class with a categorical likelihood:
\enumerate{
\item if \code{full_layer = FALSE} and \code{mode = "label"}: an updated \code{object} is returned with an additional slot called \code{results} that contains one matrix named \code{label}.
-The matrix has its rows corresponding to testing positions and columns corresponding to label samples of size: \code{B * sample_size}, where \code{B} is the number
+The matrix has rows corresponding to testing positions and columns corresponding to sample labels of size: \code{B * sample_size}, where \code{B} is the number
of imputations specified in \code{\link[=dgp]{dgp()}}.
\item if \code{full_layer = FALSE} and \code{mode = "proba"}, an updated \code{object} is returned with an additional slot called \code{results}. This slot contains \emph{D} matrices (where
\emph{D} is the number of classes in the training output), where each matrix gives probability samples for the corresponding class with its rows corresponding to testing
@@ -159,23 +160,23 @@ is \code{B * sample_size}.
\enumerate{
\item if \code{method = "mean_var"} and \code{full_layer = FALSE}: an updated \code{object} is returned with an additional slot called \code{results} that
contains two sub-lists named \code{mean} for the predictive means and \code{var} for the predictive variances respectively. Each sub-list
-contains \emph{K} number (same number of emulators in the final layer of the system) of matrices named by the \code{ID}s of the corresponding emulators in the final layer.
-Each matrix has its rows corresponding to global testing positions and columns corresponding to output dimensions of the associated emulator
+contains \emph{K} (the number of emulators in the final layer of the system) matrices named using the \code{ID}s of the corresponding emulators in the final layer.
+Each matrix has rows corresponding to global testing positions and columns corresponding to output dimensions of the associated emulator
in the final layer.
\item if \code{method = "mean_var"} and \code{full_layer = TRUE}: an updated \code{object} is returned with an additional slot called \code{results} that contains
two sub-lists named \code{mean} for the predictive means and \code{var} for the predictive variances respectively. Each sub-list contains \emph{L}
(i.e., the number of layers in the emulated system) components named \verb{layer1, layer2,..., layerL}. Each component represents a layer
-and contains \emph{K} number (same number of emulators in the corresponding layer of the system) of matrices named by the \code{ID}s of the corresponding emulators in that layer.
+and contains \emph{K} (the number of emulators in the corresponding layer of the system) matrices named using the \code{ID}s of the corresponding emulators in that layer.
Each matrix has its rows corresponding to global testing positions and columns corresponding to output dimensions of the associated
GP/DGP emulator in the corresponding layer.
\item if \code{method = "sampling"} and \code{full_layer = FALSE}: an updated \code{object} is returned with an additional slot called \code{results} that contains
-\emph{K} number (same number of emulators in the final layer of the system) of sub-lists named by the \code{ID}s of the corresponding emulators in the final layer. Each
+\emph{K} (the number of emulators in the final layer of the system) sub-lists named using the \code{ID}s of the corresponding emulators in the final layer. Each
sub-list contains \emph{D} matrices, named \verb{output1, output2,..., outputD}, that correspond to the output
-dimensions of the GP/DGP emulator. Each matrix has its rows corresponding to testing positions and columns corresponding to samples
+dimensions of the GP/DGP emulator. Each matrix has rows corresponding to testing positions and columns corresponding to samples
of size: \code{B * sample_size}, where \code{B} is the number of imputations specified in \code{\link[=lgp]{lgp()}}.
\item if \code{method = "sampling"} and \code{full_layer = TRUE}: an updated \code{object} is returned with an additional slot called \code{results} that contains
\emph{L} (i.e., the number of layers of the emulated system) sub-lists named \verb{layer1, layer2,..., layerL}. Each sub-list represents a layer
-and contains \emph{K} number (same number of emulators in the corresponding layer of the system) of components named by the \code{ID}s of the corresponding emulators in that layer.
+and contains \emph{K} (the number of emulators in the corresponding layer of the system) components named using the \code{ID}s of the corresponding emulators in that layer.
Each component contains \emph{D} matrices, named \verb{output1, output2,..., outputD}, that correspond to
the output dimensions of the GP/DGP emulator. Each matrix has its rows corresponding to testing positions and columns corresponding to
samples of size: \code{B * sample_size}, where \code{B} is the number of imputations specified in \code{\link[=lgp]{lgp()}}.
@@ -192,11 +193,11 @@ The \code{results} slot will also include:
}
}
\description{
-This function implements single-core or multi-core predictions (with or without multi-threading)
+This function implements single-core or multi-core prediction (with or without multi-threading)
from GP, DGP, or linked (D)GP emulators.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\examples{
\dontrun{
diff --git a/man/prune.Rd b/man/prune.Rd
index e7343fa..f9bfd98 100644
--- a/man/prune.Rd
+++ b/man/prune.Rd
@@ -7,31 +7,31 @@
prune(object, control = list(), verb = TRUE)
}
\arguments{
-\item{object}{an instance of the \code{dgp} class that is generated by \code{dgp()} with \code{struc = NULL}.}
+\item{object}{an instance of the \code{dgp} class that is generated by \code{dgp()}.}
-\item{control}{a list that can supply the following two components to control the static pruning of the DGP emulator:
+\item{control}{a list that can supply the following two components to control static pruning of the DGP emulator:
\itemize{
-\item \code{min_size}, the minimum number of design points required to trigger the pruning. Defaults to 10 times of the input dimensions.
-\item \code{threshold}, the R2 value above which a GP node is considered redundant and removable. Defaults to \code{0.97}.
+\item \code{min_size}, the minimum number of design points required to trigger pruning. Defaults to 10 times the number of input dimensions.
+\item \code{threshold}, the \eqn{R^2} value above which a GP node is considered redundant and removable. Defaults to \code{0.97}.
}}
-\item{verb}{a bool indicating if the trace information will be printed during the function execution. Defaults to \code{TRUE}.}
+\item{verb}{a bool indicating if trace information will be printed during the function execution. Defaults to \code{TRUE}.}
}
\value{
An updated \code{object} that could be an instance of \code{gp}, \code{dgp}, or \code{bundle} (of GP emulators) class.
}
\description{
-This function implements the static pruning of a DGP emulator.
+This function implements static pruning for a DGP emulator.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
\itemize{
\item The function requires a DGP emulator that has been trained with a dataset comprising a minimum size equal to \code{min_size} in \code{control}.
-If the training dataset size is smaller than this, it is suggested to enrich the design of the DGP emulator and prune its
-structure dynamically using the \code{design()} function. Depending on the design of the DGP emulator, the static pruning may not be accurate.
-It is thus suggested to implement dynamic pruning as a part of the sequential design via \code{design()}.
+If the training dataset size is smaller than this, it is recommended that the design of the DGP emulator is enriched and its
+structure pruned dynamically using the \code{design()} function. Depending on the design of the DGP emulator, static pruning may not be accurate.
+It is thus recommended that dynamic pruning is implemented as a part of a sequential design via \code{design()}.
\item The following slots:
\itemize{
\item \code{loo} and \code{oos} created by \code{\link[=validate]{validate()}}; and
diff --git a/man/read.Rd b/man/read.Rd
index 103dd48..d7678a6 100644
--- a/man/read.Rd
+++ b/man/read.Rd
@@ -16,7 +16,7 @@ The S3 class of a GP emulator, a DGP emulator, a linked (D)GP emulator, or a bun
This function loads the \code{.pkl} file that stores the emulator.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\examples{
\dontrun{
diff --git a/man/serialize.Rd b/man/serialize.Rd
index c81ce9a..b27b995 100644
--- a/man/serialize.Rd
+++ b/man/serialize.Rd
@@ -19,7 +19,7 @@ A serialized version of \code{object}.
This function serializes the constructed emulator.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
Since the constructed emulators are 'python' objects, they cannot be directly exported to other R processes for parallel
diff --git a/man/set_id.Rd b/man/set_id.Rd
index 5d270d1..0c4a0f6 100644
--- a/man/set_id.Rd
+++ b/man/set_id.Rd
@@ -22,7 +22,7 @@ The updated \code{object}, with the assigned ID stored in its \code{id} slot.
This function assigns a unique identifier to an emulator.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\examples{
\dontrun{
diff --git a/man/set_imp.Rd b/man/set_imp.Rd
index 0c20128..555aca2 100644
--- a/man/set_imp.Rd
+++ b/man/set_imp.Rd
@@ -9,18 +9,16 @@ set_imp(object, B = 5)
\arguments{
\item{object}{an instance of the S3 class \code{dgp}.}
-\item{B}{the number of imputations to produce predictions from \code{object}. Increase the value to account for
-more imputation uncertainties with slower predictions. Decrease the value for lower imputation uncertainties
-but faster predictions. Defaults to \code{5}.}
+\item{B}{the number of imputations to produce predictions from \code{object}. Increase the value to improve imputation uncertainty quantification. Decrease the value to improve speed of prediction. Defaults to \code{5}.}
}
\value{
An updated \code{object} with the information of \code{B} incorporated.
}
\description{
-This function resets the number of imputations for predictions from a DGP emulator.
+This function resets the number of imputations for prediction from a DGP emulator.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
\itemize{
diff --git a/man/set_linked_idx.Rd b/man/set_linked_idx.Rd
index acb5461..996657f 100644
--- a/man/set_linked_idx.Rd
+++ b/man/set_linked_idx.Rd
@@ -22,7 +22,7 @@ This function is deprecated and will be removed in the next release. The updated
for (D)GP emulators.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
This function is useful when different models are emulated by different teams. Each team can create their (D)GP emulator
diff --git a/man/set_seed.Rd b/man/set_seed.Rd
index 2c0330d..05d92b3 100644
--- a/man/set_seed.Rd
+++ b/man/set_seed.Rd
@@ -17,7 +17,7 @@ This function initializes a random number generator that sets the random seed in
to ensure reproducible results from the package.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\examples{
\dontrun{
diff --git a/man/set_thread_num.Rd b/man/set_thread_num.Rd
index 005cd5a..526a55c 100644
--- a/man/set_thread_num.Rd
+++ b/man/set_thread_num.Rd
@@ -20,5 +20,5 @@ This function sets the number of threads for parallel computations involved
in the package.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
diff --git a/man/set_vecchia.Rd b/man/set_vecchia.Rd
index 6e519a6..8e26f78 100644
--- a/man/set_vecchia.Rd
+++ b/man/set_vecchia.Rd
@@ -9,15 +9,15 @@ set_vecchia(object, vecchia = TRUE, M = 25, ord = NULL)
\arguments{
\item{object}{an instance of the S3 class \code{gp}, \code{dgp}, or \code{lgp}.}
-\item{vecchia}{a boolean or a list of booleans to indicate the addition or removal of the Vecchia approximation:
+\item{vecchia}{a bool or a list of bools to indicate the addition or removal of the Vecchia approximation:
\itemize{
-\item if \code{object} is an instance of the \code{gp} or \code{dgp} class, \code{vecchia} is a boolean that indicates
+\item if \code{object} is an instance of the \code{gp} or \code{dgp} class, \code{vecchia} is a bool that indicates
either addition (\code{vecchia = TRUE}) or removal (\code{vecchia = FALSE}) of the Vecchia approximation from \code{object}.
-\item if \code{object} is an instance of the \code{lgp} class, \code{x} can be a boolean or a list of booleans:
+\item if \code{object} is an instance of the \code{lgp} class, \code{vecchia} can be a bool or a list of bools:
\itemize{
-\item if \code{vecchia} is a boolean, it indicates either addition (\code{vecchia = TRUE}) or removal (\code{vecchia = FALSE}) of
+\item if \code{vecchia} is a bool, it indicates either addition (\code{vecchia = TRUE}) or removal (\code{vecchia = FALSE}) of
the Vecchia approximation from all individual (D)GP emulators contained in \code{object}.
-\item if \code{vecchia} is a list of booleans, it should have same shape as \code{struc} that was supplied to \code{\link[=lgp]{lgp()}}. Each boolean
+\item if \code{vecchia} is a list of bools, it should have the same shape as \code{struc} that was supplied to \code{\link[=lgp]{lgp()}}. Each bool
in the list indicates if the corresponding (D)GP emulator contained in \code{object} shall have the Vecchia approximation
added or removed.
}
@@ -46,7 +46,7 @@ This function adds or removes the Vecchia approximation from a GP, DGP or linked
constructed by \code{\link[=gp]{gp()}}, \code{\link[=dgp]{dgp()}} or \code{\link[=lgp]{lgp()}}.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
This function is useful for quickly switching between Vecchia and non-Vecchia approximations for an existing emulator
diff --git a/man/summary.Rd b/man/summary.Rd
index b112e3c..daa365e 100644
--- a/man/summary.Rd
+++ b/man/summary.Rd
@@ -46,7 +46,7 @@ This function provides a summary of key information for a GP, DGP, or linked (D)
by generating either a table or an interactive plot of the emulator’s structure.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\examples{
\dontrun{
diff --git a/man/trace_plot.Rd b/man/trace_plot.Rd
index f7bd678..d3b6f0b 100644
--- a/man/trace_plot.Rd
+++ b/man/trace_plot.Rd
@@ -2,7 +2,7 @@
% Please edit documentation in R/utils.R
\name{trace_plot}
\alias{trace_plot}
-\title{Plot of DGP model parameter traces}
+\title{Trace plot for DGP hyperparameters}
\usage{
trace_plot(object, layer = NULL, node = 1)
}
@@ -18,11 +18,11 @@ corresponding layer.}
A \code{ggplot} object.
}
\description{
-This function plots the traces of model parameters of a chosen GP node
+This function draws trace plots for the hyperparameters of a chosen GP node
in a DGP emulator.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\examples{
\dontrun{
diff --git a/man/unpack.Rd b/man/unpack.Rd
index 678f1e8..400a058 100644
--- a/man/unpack.Rd
+++ b/man/unpack.Rd
@@ -14,11 +14,11 @@ A named list that contains individual emulators (named \verb{emulator1,...,emula
where \code{S} is the number of emulators in \code{object}.
}
\description{
-This function unpacks a bundle of (D)GP emulators safely so any further manipulations of unpacked individual emulators
-will not impact the ones in the bundle.
+This function unpacks a bundle of (D)GP emulators safely so that any further manipulations of unpacked individual emulators
+will not impact those in the bundle.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\examples{
\dontrun{
diff --git a/man/update.Rd b/man/update.Rd
index c9908be..38b4451 100644
--- a/man/update.Rd
+++ b/man/update.Rd
@@ -31,7 +31,7 @@ update(object, X, Y, refit, reset, verb, ...)
\item the S3 class \code{dgp}.
}}
-\item{X}{the new input data which is a matrix where each row is an input training data point and each column is an input dimension.}
+\item{X}{the new input data which is a matrix where each row is an input training data point and each column represents an input dimension.}
\item{Y}{the new output data:
\itemize{
@@ -42,10 +42,10 @@ output dimensions. When \code{likelihood} (see below) is not \code{NULL}, \code{
\item{refit}{a bool indicating whether to re-fit the emulator \code{object} after the training input and output are updated. Defaults to \code{TRUE}.}
-\item{reset}{a bool indicating whether to reset hyperparameters of the emulator \code{object} to their initial values when the emulator was
-constructed, after the training input and output are updated. Defaults to \code{FALSE}.}
+\item{reset}{a bool indicating whether to reset hyperparameters of the emulator \code{object} to the initial values first obtained when the emulator was
+constructed. Use if it is suspected that a local mode for the hyperparameters has been reached through successive updates. Defaults to \code{FALSE}.}
-\item{verb}{a bool indicating if the trace information will be printed during the function execution.
+\item{verb}{a bool indicating if trace information will be printed during the function execution.
Defaults to \code{TRUE}.}
\item{...}{N/A.}
@@ -59,7 +59,7 @@ at each M-step during the re-fitting. If set to \code{NULL}, the number of proce
and \verb{max physical cores available \%/\% 2} if \code{vecchia = TRUE}. Only use multiple processes when there is a large number of GP components in different
layers and optimization of GP components is computationally expensive. Defaults to \code{1}.}
-\item{ess_burn}{number of burnin steps for the ESS-within-Gibbs at each I-step in training the emulator \code{object} if it is an
+\item{ess_burn}{number of burn-in steps for the ESS-within-Gibbs sampler at each I-step of the training of the emulator \code{object} if it is an
instance of the \code{dgp} class. Defaults to \code{10}.}
\item{B}{the number of imputations for predictions from the updated emulator \code{object} if it is an instance of the \code{dgp} class.
@@ -73,7 +73,7 @@ An updated \code{object}.
This function updates the training input and output of a GP or DGP emulator with an option to refit the emulator.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
\itemize{
diff --git a/man/validate.Rd b/man/validate.Rd
index a712b77..4d08e13 100644
--- a/man/validate.Rd
+++ b/man/validate.Rd
@@ -67,9 +67,9 @@ validate(
\item the S3 class \code{lgp}.
}}
-\item{x_test}{the OOS testing input data:
+\item{x_test}{OOS testing input data:
\itemize{
-\item if \code{object} is an instance of the \code{gp} or \code{dgp} class, \code{x_test} is a matrix where each row is an input testing data point and each column is an input dimension.
+\item if \code{object} is an instance of the \code{gp} or \code{dgp} class, \code{x_test} is a matrix where each row is a new input location to be used for validating the emulator and each column is an input dimension.
\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}} if \code{object} is an instance of the \code{lgp} class, \code{x_test} can be a matrix or a list:
\itemize{
\item if \code{x_test} is a matrix, it is the global testing input data that feed into the emulators in the first layer of a system.
@@ -92,42 +92,41 @@ The column indices in \code{x_test} must align with the indices specified in the
corresponding to rows where the \code{From_Emulator} column is \code{"Global"}.
}
-\code{x_test} must be provided for the validation if \code{object} is an instance of the \code{lgp}. Defaults to \code{NULL}.}
+\code{x_test} must be provided if \code{object} is an instance of the \code{lgp} class. \code{x_test} must also be provided if \code{y_test} is provided. Defaults to \code{NULL}, in which case LOO validation is performed.}
-\item{y_test}{the OOS testing output data that correspond to \code{x_test}:
+\item{y_test}{the OOS output data corresponding to \code{x_test}:
\itemize{
-\item if \code{object} is an instance of the \code{gp} class, \code{y_test} is a matrix with only one column and each row being an testing output data point.
-\item if \code{object} is an instance of the \code{dgp} class, \code{y_test} is a matrix with its rows being testing output data points and columns being
-output dimensions.
+\item if \code{object} is an instance of the \code{gp} class, \code{y_test} is a matrix with only one column where each row represents the output corresponding to the matching row of \code{x_test}.
+\item if \code{object} is an instance of the \code{dgp} class, \code{y_test} is a matrix where each row represents the output corresponding to the matching row of \code{x_test} and with columns representing output dimensions.
\item if \code{object} is an instance of the \code{lgp} class, \code{y_test} can be a single matrix or a list of matrices:
\itemize{
-\item if \code{y_test} is a single matrix, then there is only one emulator in the final layer of the linked emulator system and \code{y_test}
+\item if \code{y_test} is a single matrix, then there should be only one emulator in the final layer of the linked emulator system and \code{y_test}
represents the emulator's output with rows being testing positions and columns being output dimensions.
-\item if \code{y_test} is a list, then \code{y_test} should have \emph{M} number (the same number of emulators in the final layer of the system) of matrices.
+\item if \code{y_test} is a list, then \code{y_test} should have \emph{L} matrices, where \emph{L} is the number of emulators in the final layer of the system.
Each matrix has its rows corresponding to testing positions and columns corresponding to output dimensions of the associated emulator
in the final layer.
}
}
-\code{y_test} must be provided for the validation if \code{object} is an instance of the \code{lgp}. Defaults to \code{NULL}.}
+\code{y_test} must be provided if \code{object} is an instance of the \code{lgp} class. \code{y_test} must also be provided if \code{x_test} is provided. Defaults to \code{NULL}, in which case LOO validation is performed.}
-\item{method}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#updated}{\figure{lifecycle-updated.svg}{options: alt='[Updated]'}}}{\strong{[Updated]}} the prediction approach to use in validations: either the mean-variance approach (\code{"mean_var"}) or the sampling approach (\code{"sampling"}).
+\item{method}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#updated}{\figure{lifecycle-updated.svg}{options: alt='[Updated]'}}}{\strong{[Updated]}} the prediction approach to use for validation: either the mean-variance approach (\code{"mean_var"}) or the sampling approach (\code{"sampling"}). For details see \code{\link[=predict]{predict()}}.
For DGP emulators with a categorical likelihood (\code{likelihood = "Categorical"} in \code{\link[=dgp]{dgp()}}), only the sampling approach is supported.
By default, the method is set to \code{"sampling"} for DGP emulators with Poisson, Negative Binomial, and Categorical likelihoods and \code{"mean_var"} otherwise.}
\item{sample_size}{the number of samples to draw for each given imputation if \code{method = "sampling"}. Defaults to \code{50}.}
-\item{verb}{a bool indicating if the trace information on validations will be printed during the function execution.
+\item{verb}{a bool indicating if trace information for validation should be printed during function execution.
Defaults to \code{TRUE}.}
-\item{M}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} the size of the conditioning set for the Vecchia approximation in the emulator validation. This argument is only used if the emulator \code{object}
+\item{M}{\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} the size of the conditioning set for the Vecchia approximation in emulator validation. This argument is only used if the emulator \code{object}
was constructed under the Vecchia approximation. Defaults to \code{50}.}
-\item{force}{a bool indicating whether to force the LOO or OOS re-evaluation when \code{loo} or \code{oos} slot already exists in \code{object}. When \code{force = FALSE},
-\code{\link[=validate]{validate()}} will try to determine automatically if the LOO or OOS re-evaluation is needed. Set \code{force} to \code{TRUE} when LOO or OOS re-evaluation
+\item{force}{a bool indicating whether to force LOO or OOS re-evaluation when the \code{loo} or \code{oos} slot already exists in \code{object}. When \code{force = FALSE},
+\code{\link[=validate]{validate()}} will only re-evaluate the emulators if the \code{x_test} and \code{y_test} are not identical to the values in the \code{oos} slot. If the existing \code{loo} or \code{oos} validation used a different \code{M} in a Vecchia approximation or a different \code{method} to the one prescribed in this call, the emulator will be re-evaluated. Set \code{force} to \code{TRUE} when LOO or OOS re-evaluation
is required. Defaults to \code{FALSE}.}
-\item{cores}{the number of processes to be used for validations. If set to \code{NULL}, the number of processes is set to \verb{max physical cores available \%/\% 2}.
+\item{cores}{the number of processes to be used for validation. If set to \code{NULL}, the number of processes is set to \verb{max physical cores available \%/\% 2}.
Defaults to \code{1}.}
\item{...}{N/A.}
@@ -144,10 +143,10 @@ GP emulator at validation positions.
GP emulator at validation positions. If \code{method = "mean_var"}, the upper and lower bounds of a credible interval are two standard deviations above
and below the predictive mean. If \code{method = "sampling"}, the upper and lower bounds of a credible interval are 2.5th and 97.5th percentiles.
\item a numeric value called \code{rmse} that contains the root mean/median squared error of the GP emulator.
-\item a numeric value called \code{nrmse} that contains the (min-max) normalized root mean/median squared error of the GP emulator. The min-max normalization
-is based on the maximum and minimum values of the validation outputs contained in \code{y_train} (or \code{y_test}).
-\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} an integer called \code{M} that contains size of the conditioning set used for the Vecchia approximation, if used, in the emulator validation.
-\item an integer called \code{sample_size} that contains the number of samples used for the validation if \code{method = "sampling"}.
+\item a numeric value called \code{nrmse} that contains the (max-min) normalized root mean/median squared error of the GP emulator. The max-min normalization
+uses the maximum and minimum values of the validation outputs contained in \code{y_train} (or \code{y_test}).
+\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} an integer called \code{M} that contains the size of the conditioning set used for the Vecchia approximation, if used, for emulator validation.
+\item an integer called \code{sample_size} that contains the number of samples used for validation if \code{method = "sampling"}.
}
The rows of matrices (\code{mean}, \code{median}, \code{std}, \code{lower}, and \code{upper}) correspond to the validation positions.
@@ -162,10 +161,10 @@ DGP emulator at validation positions. If \code{method = "mean_var"}, the upper a
and below the predictive mean. If \code{method = "sampling"}, the upper and lower bounds of a credible interval are 2.5th and 97.5th percentiles.
\item a vector called \code{rmse} that contains the root mean/median squared errors of the DGP emulator across different output
dimensions.
-\item a vector called \code{nrmse} that contains the (min-max) normalized root mean/median squared errors of the DGP emulator across different output
-dimensions. The min-max normalization is based on the maximum and minimum values of the validation outputs contained in \code{y_train} (or \code{y_test}).
-\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} an integer called \code{M} that contains size of the conditioning set used for the Vecchia approximation, if used, in the emulator validation.
-\item an integer called \code{sample_size} that contains the number of samples used for the validation if \code{method = "sampling"}.
+\item a vector called \code{nrmse} that contains the (max-min) normalized root mean/median squared errors of the DGP emulator across different output
+dimensions. The max-min normalization uses the maximum and minimum values of the validation outputs contained in \code{y_train} (or \code{y_test}).
+\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} an integer called \code{M} that contains the size of the conditioning set used for the Vecchia approximation, if used, for emulator validation.
+\item an integer called \code{sample_size} that contains the number of samples used for validation if \code{method = "sampling"}.
}
The rows and columns of matrices (\code{mean}, \code{median}, \code{std}, \code{lower}, and \code{upper}) correspond to the validation positions and DGP emulator output
@@ -181,8 +180,8 @@ is a matrix that has its rows corresponding to validation positions and columns
\item a scalar called \code{log_loss} that represents the average log loss of the predicted labels in the DGP emulator across all validation positions. Log loss measures the
accuracy of probabilistic predictions, with lower values indicating better classification performance. \code{log_loss} ranges from \code{0} to positive infinity, where a
value closer to \code{0} suggests more confident and accurate predictions.
-\item an integer called \code{M} that contains size of the conditioning set used for the Vecchia approximation, if used, in the emulator validation.
-\item an integer called \code{sample_size} that contains the number of samples used for the validation.
+\item an integer called \code{M} that contains the size of the conditioning set used for the Vecchia approximation, if used, in emulator validation.
+\item an integer called \code{sample_size} that contains the number of samples used for validation.
}
\item If \code{object} is an instance of the \code{lgp} class, an updated \code{object} is returned with an additional slot called \code{oos} (for OOS validation) that contains:
\itemize{
@@ -193,10 +192,10 @@ the linked (D)GP emulator at validation positions.
the linked (D)GP emulator at validation positions. If \code{method = "mean_var"}, the upper and lower bounds of a credible interval are two standard
deviations above and below the predictive mean. If \code{method = "sampling"}, the upper and lower bounds of a credible interval are 2.5th and 97.5th percentiles.
\item a list called \code{rmse} that contains the root mean/median squared errors of the linked (D)GP emulator.
-\item a list called \code{nrmse} that contains the (min-max) normalized root mean/median squared errors of the linked (D)GP emulator. The min-max normalization
-is based on the maximum and minimum values of the validation outputs contained in \code{y_test}.
-\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} an integer called \code{M} that contains size of the conditioning set used for the Vecchia approximation, if used, in the emulator validation.
-\item an integer called \code{sample_size} that contains the number of samples used for the validation if \code{method = "sampling"}.
+\item a list called \code{nrmse} that contains the (max-min) normalized root mean/median squared errors of the linked (D)GP emulator. The max-min normalization
+uses the maximum and minimum values of the validation outputs contained in \code{y_test}.
+\item \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#new}{\figure{lifecycle-new.svg}{options: alt='[New]'}}}{\strong{[New]}} an integer called \code{M} that contains the size of the conditioning set used for the Vecchia approximation, if used, in emulator validation.
+\item an integer called \code{sample_size} that contains the number of samples used for validation if \code{method = "sampling"}.
}
Each element in \code{mean}, \code{median}, \code{std}, \code{lower}, \code{upper}, \code{rmse}, and \code{nrmse} corresponds to a (D)GP emulator in the final layer of the linked (D)GP
@@ -204,16 +203,15 @@ emulator.
}
}
\description{
-This function validate a constructed GP, DGP, or linked (D)GP emulator via the Leave-One-Out (LOO)
-cross validation or Out-Of-Sample (OOS) validation.
+This function calculates Leave-One-Out (LOO) cross validation or Out-Of-Sample (OOS) validation statistics for a constructed GP, DGP, or linked (D)GP emulator.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
\itemize{
-\item When both \code{x_test} and \code{y_test} are \code{NULL}, the LOO cross validation will be implemented. Otherwise, OOS validation will
-be implemented. The LOO validation is only applicable to a GP or DGP emulator (i.e., \code{object} is an instance of the \code{gp} or \code{dgp}
+\item When both \code{x_test} and \code{y_test} are \code{NULL}, LOO cross validation will be implemented. Otherwise, OOS validation will
+be implemented. LOO validation is only applicable to a GP or DGP emulator (i.e., \code{object} is an instance of the \code{gp} or \code{dgp}
class). If a linked (D)GP emulator (i.e., \code{object} is an instance of the \code{lgp} class) is provided, \code{x_test} and \code{y_test} must
also be provided for OOS validation.
}
diff --git a/man/vigf.Rd b/man/vigf.Rd
index 591759e..991274f 100644
--- a/man/vigf.Rd
+++ b/man/vigf.Rd
@@ -12,7 +12,7 @@ vigf(object, ...)
\method{vigf}{gp}(
object,
x_cand = NULL,
- n_start = 20,
+ n_start = 10,
batch_size = 1,
M = 50,
workers = 1,
@@ -24,7 +24,7 @@ vigf(object, ...)
\method{vigf}{dgp}(
object,
x_cand = NULL,
- n_start = 20,
+ n_start = 10,
batch_size = 1,
M = 50,
workers = 1,
@@ -37,7 +37,7 @@ vigf(object, ...)
\method{vigf}{bundle}(
object,
x_cand = NULL,
- n_start = 20,
+ n_start = 10,
batch_size = 1,
M = 50,
workers = 1,
@@ -58,13 +58,13 @@ vigf(object, ...)
\item{...}{any arguments (with names different from those of arguments used in \code{\link[=vigf]{vigf()}}) that are used by \code{aggregate}
can be passed here.}
-\item{x_cand}{a matrix (with each row containing a design point and column representing an input dimension) that gives a candidate set
-from which the next design point(s) are determined. If \code{object} is an instance of the \code{bundle} class and \code{aggregate} is not supplied, \code{x_cand} could also
-be a list with length equal to the number of emulators contained in \code{object}. In this case, each slot in \code{x_cand} should be a candidate set matrix
-for each emulator included in the bundle. Defaults to \code{NULL}.}
+\item{x_cand}{a matrix (with each row being a design point and each column being an input dimension) that gives a candidate set
+from which the next design point(s) are determined. If \code{object} is an instance of the \code{bundle} class and \code{aggregate} is not supplied, \code{x_cand} can also be a list.
+The list must have a length equal to the number of emulators in \code{object}, with each element being a matrix representing the candidate set for a corresponding
+emulator in the bundle. Defaults to \code{NULL}.}
\item{n_start}{an integer that gives the number of initial design points to be used to determine next design point(s). This argument
-is only used when \code{x_cand} is \code{NULL}. Defaults to \code{20}.}
+is only used when \code{x_cand} is \code{NULL}. Defaults to \code{10}.}
\item{batch_size}{an integer that gives the number of design points to be chosen.
Defaults to \code{1}.}
@@ -95,32 +95,38 @@ of the matrix equals to:
\item the emulator output dimension if \code{object} is an instance of the \code{dgp} class; or
\item the number of emulators contained in \code{object} if \code{object} is an instance of the \code{bundle} class.
}
-\item the output should be a vector that gives aggregations of scores at different design points.
+\item the output should be a vector that gives aggregate scores at different design points.
}
-Set to \code{NULL} to disable the aggregation. Defaults to \code{NULL}.}
+Set to \code{NULL} to disable aggregation. Defaults to \code{NULL}.}
}
\value{
\enumerate{
-\item If \code{x_cand} is not \code{NULL} and:
+\item If \code{x_cand} is not \code{NULL}:
\itemize{
-\item \code{object} is an instance of the \code{gp} class, a vector is returned with length equal to \code{batch_size}, giving the positions (i.e., row numbers)
-of next design points from \code{x_cand}.
-\item \code{object} is an instance of the \code{dgp} class, a vector is returned with length equal to \code{batch_size * D}, giving positions (i.e., row numbers)
-of next design points from \code{x_cand} to be added to the DGP emulator. \code{D} equals to the number of output dimensions of the DGP
-emulator if there is no likelihood layer in the hierarchy. If \code{object} is a DGP emulator with either \code{Hetero} or \code{NegBin} likelihood layer,
-\code{D = 2}. If \code{object} is a DGP emulator with a \code{Categorical} likelihood layer, \code{D} equals to one (for binary output) or \code{K} (for multi-class output with \code{K} classes).
-\item \code{object} is an instance of the \code{bundle} class, a matrix is returned with row number equal to \code{batch_size} and column number equal to the number of
-emulators in the bundle, giving positions (i.e., row numbers) of next design points from \code{x_cand} to be added to individual emulators.
-}
-\item If \code{x_cand = NULL} and:
+\item When \code{object} is an instance of the \code{gp} class, a vector of length \code{batch_size} is returned, containing the positions
+(row numbers) of the next design points from \code{x_cand}.
+\item When \code{object} is an instance of the \code{dgp} class, a vector of length \code{batch_size * D} is returned, containing the positions
+(row numbers) of the next design points from \code{x_cand} to be added to the DGP emulator.
\itemize{
-\item \code{object} is an instance of the \code{gp} class, a matrix is returned with row number equal to \code{batch_size}, giving the next design points to be evaluated.
-\item \code{object} is an instance of the \code{dgp} class, a matrix is returned with row number equal to \code{batch_size * D} where \code{D} is the number of output dimensions of the DGP
-emulator if no likelihood layer is included. If \code{object} is a DGP emulator with either \code{Hetero} or \code{NegBin} likelihood layer, \code{D = 2}. If \code{object} is a DGP emulator
-with a \code{Categorical} likelihood layer, \code{D} equals to one (for binary output) or \code{K} (for multi-class output with \code{K} classes).
-\item \code{object} is an instance of the \code{bundle} class, a list is returned with the length equal to the number of
-emulators in the bundle. Each element in the list is a matrix with row number equal to \code{batch_size}, giving next design points to be added to individual emulators.
+\item \code{D} is the number of output dimensions of the DGP emulator if no likelihood layer is included.
+\item For a DGP emulator with a \code{Hetero} or \code{NegBin} likelihood layer, \code{D = 2}.
+\item For a DGP emulator with a \code{Categorical} likelihood layer, \code{D = 1} for binary output or \code{D = K} for multi-class output with \code{K} classes.
+}
+\item When \code{object} is an instance of the \code{bundle} class, a matrix is returned with \code{batch_size} rows and a column for each emulator in
+the bundle, containing the positions (row numbers) of the next design points from \code{x_cand} for individual emulators.
+}
+\item If \code{x_cand} is \code{NULL}:
+\itemize{
+\item When \code{object} is an instance of the \code{gp} class, a matrix with \code{batch_size} rows is returned, giving the next design points to be evaluated.
+\item When \code{object} is an instance of the \code{dgp} class, a matrix with \code{batch_size * D} rows is returned, where:
+\itemize{
+\item \code{D} is the number of output dimensions of the DGP emulator if no likelihood layer is included.
+\item For a DGP emulator with a \code{Hetero} or \code{NegBin} likelihood layer, \code{D = 2}.
+\item For a DGP emulator with a \code{Categorical} likelihood layer, \code{D = 1} for binary output or \code{D = K} for multi-class output with \code{K} classes.
+}
+\item When \code{object} is an instance of the \code{bundle} class, a list is returned with a length equal to the number of emulators in the bundle. Each
+element of the list is a matrix with \code{batch_size} rows, where each row represents a design point to be added to the corresponding emulator.
}
}
}
@@ -129,11 +135,12 @@ This function searches from a candidate set to locate the next design point(s) t
or a bundle of (D)GP emulators using the Variance of Improvement for Global Fit (VIGF). For VIGF on GP emulators, see the reference below.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
-The column order of the first argument of \code{aggregate} must be consistent with the order of emulator output dimensions (if \code{object} is an instance of the
-\code{dgp} class), or the order of emulators placed in \code{object} if \code{object} is an instance of the \code{bundle} class.
+The first column of the matrix supplied to the first argument of \code{aggregate} must correspond to the first output dimension of the DGP emulator
+if \code{object} is an instance of the \code{dgp} class, and so on for subsequent columns and dimensions. If \code{object} is an instance of the \code{bundle} class,
+the first column must correspond to the first emulator in the bundle, and so on for subsequent columns and emulators.
}
\examples{
\dontrun{
diff --git a/man/window.Rd b/man/window.Rd
index 71f5942..ac189a0 100644
--- a/man/window.Rd
+++ b/man/window.Rd
@@ -2,36 +2,36 @@
% Please edit documentation in R/utils.R
\name{window}
\alias{window}
-\title{Trim the sequences of model parameters of a DGP emulator}
+\title{Trim the sequence of hyperparameter estimates within a DGP emulator}
\usage{
window(object, start, end = NULL, thin = 1)
}
\arguments{
\item{object}{an instance of the S3 class \code{dgp}.}
-\item{start}{the first iteration before which all iterations are trimmed from the sequences.}
+\item{start}{the first iteration before which all iterations are trimmed from the sequence.}
-\item{end}{the last iteration after which all iterations are trimmed from the sequences.
+\item{end}{the last iteration after which all iterations are trimmed from the sequence.
Set to \code{NULL} to keep all iterations after (including) \code{start}. Defaults to \code{NULL}.}
-\item{thin}{the interval between the \code{start} and \code{end} iterations to thin out the sequences.
+\item{thin}{the interval between the \code{start} and \code{end} iterations to thin out the sequence.
Defaults to 1.}
}
\value{
-An updated \code{object} with trimmed sequences of model parameters.
+An updated \code{object} with a trimmed sequence of hyperparameters.
}
\description{
-This function trim the sequences of model parameters of a DGP emulator
-that are generated during the training.
+This function trims the sequence of hyperparameter estimates within a DGP emulator
+generated during training.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
\itemize{
\item This function is useful when a DGP emulator has been trained and one wants to trim
-the sequences of model parameters and use the trimmed sequences to generate the point estimates
-of DGP model parameters for predictions.
+the sequence of hyperparameter estimates and use the trimmed sequence to generate point estimates
+of the DGP model parameters for prediction.
\item The following slots:
\itemize{
\item \code{loo} and \code{oos} created by \code{\link[=validate]{validate()}}; and
diff --git a/man/write.Rd b/man/write.Rd
index 8309892..016c974 100644
--- a/man/write.Rd
+++ b/man/write.Rd
@@ -12,23 +12,22 @@ write(object, pkl_file, light = TRUE)
\item{pkl_file}{the path to and the name of the \code{.pkl} file to which
the emulator \code{object} is saved.}
-\item{light}{a bool indicating if a light version of the constructed emulator (that requires a small storage) will be saved.
-Defaults to \code{TRUE}.}
+\item{light}{a bool indicating if a light version of the constructed emulator
+(that requires less disk space to store) will be saved. Defaults to \code{TRUE}.}
}
\value{
-No return value. \code{object} will be save to a local \code{.pkl} file specified by \code{pkl_file}.
+No return value. \code{object} will be saved to a local \code{.pkl} file specified by \code{pkl_file}.
}
\description{
This function saves the constructed emulator to a \code{.pkl} file.
}
\details{
-See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
+See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/dev/}.
}
\note{
-Since the constructed emulators are 'python' objects, \code{\link[=save]{save()}} from R will not work as it is only for R objects. If \code{object}
-was processed by \code{\link[=set_vecchia]{set_vecchia()}} to add or remove the Vecchia approximation, \code{light} needs to be set to \code{FALSE} to ensure
-reproducibility after the saved emulator is loaded by \code{\link[=read]{read()}}, since when \code{light = TRUE}, the imputations generated during
-emulator loading will be different.
+Since emulators built from the package are 'python' objects, \code{\link[=save]{save()}} from R will not work as it would for R objects. If \code{object}
+was processed by \code{\link[=set_vecchia]{set_vecchia()}} to add or remove the Vecchia approximation, \code{light} should be set to \code{FALSE} to ensure
+reproducibility after the saved emulator is reloaded by \code{\link[=read]{read()}}.
}
\examples{
\dontrun{
diff --git a/vignettes/classification.Rmd b/vignettes/classification.Rmd
index 62a260c..847ad00 100644
--- a/vignettes/classification.Rmd
+++ b/vignettes/classification.Rmd
@@ -6,7 +6,7 @@ bibliography: references.bib
description: >
DGP classification of the iris data set.
vignette: >
- %\VignetteIndexEntry{DGP Classification using Stochastic Imputation}
+ %\VignetteIndexEntry{DGP Classification using dgpsi}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
@@ -35,7 +35,7 @@ We now load the iris data set,
data(iris)
```
-and rescale its four input variables to $[0,1]$.
+and re-scale its four input variables to $[0,1]$.
```{r}
iris <- iris %>%
@@ -79,13 +79,15 @@ m_dgp <- dgp(X_train, Y_train, depth = 3, name = c('matern2.5', 'sexp'), likelih
## Imputing ... done
```
-Visualising the DGP object helps to clarify the layered structure for non-Gaussian (in this case categorical) likelihoods.
+Visualizing the DGP object helps to clarify the layered structure for non-Gaussian (in this case categorical) likelihoods.
+
```{r}
summary(m_dgp)
```
-![](https://raw.githubusercontent.com/mingdeyu/dgpsi-R/master/vignettes/images/categorical_summary.png){width=100%}
-After the global inputs, the 3-layered DGP is comprised of 2 hidden layers containing GPs, and a "likelihood layer" that transforms each of the preceding GP nodes to one of the parameters required by the likelihood function. In this example, we have 3 possible categories and we use a softmax link function, so there are 3 parameters to set and the final layer of GPs has 3 nodes, one for each of them.
+
+
+After the global inputs, the 3-layered DGP is comprised of 2 hidden layers containing GPs, and a "likelihood layer" that transforms each of the preceding GP nodes to one of the parameters required by the likelihood function. In this example, we have 3 possible categories and we use a softmax link function, so there are 3 parameters to set and the second layer has 3 GP nodes, one for each of them.
## Validation
diff --git a/vignettes/dgpsi.Rmd b/vignettes/dgpsi.Rmd
index f78f3df..f55b6b5 100644
--- a/vignettes/dgpsi.Rmd
+++ b/vignettes/dgpsi.Rmd
@@ -14,9 +14,23 @@ knitr::opts_chunk$set(
echo = TRUE,
eval = FALSE
)
+
+get_article_url <- function(article) {
+ pkg_version <- as.character(utils::packageVersion("dgpsi"))
+
+ is_dev <- grepl("\\.9000$", pkg_version)
+
+ base_url <- if (is_dev) {
+ "https://mingdeyu.github.io/dgpsi-R/dev"
+ } else {
+ "https://mingdeyu.github.io/dgpsi-R"
+ }
+
+ paste0(base_url, article)
+}
```
-`dgpsi` provides a flexible toolbox for Gaussian process (GP), deep Gaussian process (DGP) and linked (D)GP emulation. In this guide, we show how to use the package to emulate a step function with a three-layered DGP structure. There are other examples showing the functionality of the package in [`Articles`](https://mingdeyu.github.io/dgpsi-R/dev/articles/index.html) on the package website, including DGP customization, scalable DGPs, DGPs for classification and other non-Gaussian problems and sequential design/reinforcement learning for DGPs. A comprehensive reference of available functions is documented in [`Reference`](https://mingdeyu.github.io/dgpsi-R/dev/reference/index.html) section of the package website.
+The `dgpsi` package offers a flexible toolbox for Gaussian process (GP), deep Gaussian process (DGP), and linked (D)GP emulation. In this guide, we show how to use the package to emulate a step function using a three-layered DGP structure. Additional examples showcasing the package's functionality are available in the [Articles](`r get_article_url("/articles/index.html")`) section of the package website. Topics include [linked DGPs](`r get_article_url("/articles/linked_DGP.html")`), [scalable DGPs](`r get_article_url("/articles/large_scale_emulation.html")`), [DGPs for classification](`r get_article_url("/articles/classification.html")`), [non-Gaussian problems](`r get_article_url("/articles/motorcycle.html")`), and [sequential design and active learning for DGPs](`r get_article_url("/articles/seq_design.html")`). A detailed reference for all available functions is provided in the [Reference](`r get_article_url("/reference/index.html")`) section of the package website.
## Load the package
@@ -114,7 +128,8 @@ plot(m,oos_x,oos_y)
```
![](https://raw.githubusercontent.com/mingdeyu/dgpsi-R/master/vignettes/images/step_fct_oos.png){width=100%}
-Note that the `style` argument to the ploting function can be used to draw different types of plot (see the documentation).
+
+Note that the `style` argument to the `plot()` function can be used to draw different types of plot (see `?plot`).
## Prediction
diff --git a/vignettes/images/categorical_summary.png b/vignettes/images/categorical_summary.png
deleted file mode 100644
index 581c244..0000000
Binary files a/vignettes/images/categorical_summary.png and /dev/null differ
diff --git a/vignettes/images/motorcycle_data.png b/vignettes/images/motorcycle_data.png
index 5b65ef0..770c3a5 100644
Binary files a/vignettes/images/motorcycle_data.png and b/vignettes/images/motorcycle_data.png differ
diff --git a/vignettes/images/seq2_design.png b/vignettes/images/seq2_design.png
index 195e8b0..214f645 100644
Binary files a/vignettes/images/seq2_design.png and b/vignettes/images/seq2_design.png differ
diff --git a/vignettes/images/seq2_rmse.png b/vignettes/images/seq2_rmse.png
index 17d7813..db32f76 100644
Binary files a/vignettes/images/seq2_rmse.png and b/vignettes/images/seq2_rmse.png differ
diff --git a/vignettes/images/seq_comparison.png b/vignettes/images/seq_comparison.png
index 965ac0c..fb0aa32 100644
Binary files a/vignettes/images/seq_comparison.png and b/vignettes/images/seq_comparison.png differ
diff --git a/vignettes/images/seq_design.png b/vignettes/images/seq_design.png
index 0885a5a..28d504b 100644
Binary files a/vignettes/images/seq_design.png and b/vignettes/images/seq_design.png differ
diff --git a/vignettes/images/seq_rmse.png b/vignettes/images/seq_rmse.png
index f567c24..8085020 100644
Binary files a/vignettes/images/seq_rmse.png and b/vignettes/images/seq_rmse.png differ
diff --git a/vignettes/large_scale_emulation.Rmd b/vignettes/large_scale_emulation.Rmd
index 1079c6d..ad13f8a 100644
--- a/vignettes/large_scale_emulation.Rmd
+++ b/vignettes/large_scale_emulation.Rmd
@@ -1,6 +1,6 @@
---
title: >
- Large-scale Emulation with the Vecchia approximation
+ Large-scale Emulation with the Vecchia Approximation
output: rmarkdown::html_vignette
bibliography: references.bib
description: >
@@ -101,13 +101,7 @@ plot(m, oos_x, oos_y)
![](https://raw.githubusercontent.com/mingdeyu/dgpsi-R/master/vignettes/images/vecchia_oos.png){width=100%}
-Note that `gp` can also be used in Vecchia mode with
-
-```{r}
-m1 <- gp(X, Y, vecchia = TRUE)
-```
-
-For this problem, we found that the Vecchia GP had half the NRMSE of the DGP but a few more points outside of the confidence intervals.
+Note that `gp()` can also operate in Vecchia mode. For this problem, using `m_gp <- gp(X, Y, vecchia = TRUE)`, we found that the GP emulator `m_gp` achieved half the NRMSE of the DGP emulator `m`, but it had a few more points outside the credible intervals.
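The comparison above is stated in terms of NRMSE. As a rough sketch of the metric, assuming it is the RMSE divided by the range of the observed test outputs (an assumption; the package may normalize differently), and using the hypothetical helper names `rmse()` and `nrmse()` with toy data:

```r
# Root mean squared error between observed and predicted values.
rmse <- function(y_true, y_pred) {
  sqrt(mean((y_true - y_pred)^2))
}

# RMSE normalized by the range of the observed values
# (assumed normalization, for illustration only).
nrmse <- function(y_true, y_pred) {
  rmse(y_true, y_pred) / diff(range(y_true))
}

# Toy data: only the last prediction is off, by 2 units.
y_true <- c(0, 1, 2, 3, 4)
y_pred <- c(0, 1, 2, 3, 6)

err  <- rmse(y_true, y_pred)
nerr <- nrmse(y_true, y_pred)
```

Because the normalization divides by the output range, NRMSE values are comparable across outputs with different scales, which is why it is a convenient figure for comparing the two emulators.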
### Performance tip
diff --git a/vignettes/linked_DGP.Rmd b/vignettes/linked_DGP.Rmd
index 84817e4..fe38d1b 100644
--- a/vignettes/linked_DGP.Rmd
+++ b/vignettes/linked_DGP.Rmd
@@ -200,7 +200,6 @@ m_link <- lgp(struc, emulators, activate = FALSE)
```
Processing emulators ... done
Linking and synchronizing emulators ... done
-Validating the linked emulator ... done
```
and visually check the relationships between emulators by applying `summary()` to `m_link`:
diff --git a/vignettes/motorcycle.Rmd b/vignettes/motorcycle.Rmd
index 8cefcf1..8d9dbd6 100644
--- a/vignettes/motorcycle.Rmd
+++ b/vignettes/motorcycle.Rmd
@@ -26,6 +26,7 @@ We start by loading packages:
```{r}
library(dgpsi)
library(MASS)
+library(ggplot2)
library(patchwork)
```
@@ -46,7 +47,11 @@ Y <- scale(Y, center = TRUE, scale = TRUE)
and plot them:
```{r}
-plot(X, Y, pch = 16, cex = 1, xlab = 'Time', ylab = 'Acceleration', cex.axis = 1.3, cex.lab = 1.3)
+ggplot(data = data.frame(X = X, Y = Y), aes(x = X, y = Y)) +
+ geom_point(shape = 16, size = 3) +
+ labs(x = "Time", y = "Acceleration") +
+ theme(axis.title = element_text(size = 13),
+ axis.text = element_text(size = 13))
```
![](https://raw.githubusercontent.com/mingdeyu/dgpsi-R/master/vignettes/images/motorcycle_data.png){width=100%}
@@ -135,7 +140,7 @@ m_gp <- validate(m_gp, test_x, test_y)
## Saving results to the slot 'oos' in the gp object ... done
```
-Note that using `validate()` before plotting can save subsequent computations compared to simply invoking `plot()`, as `validate()` stores validation results in the emulator objects and plot will use these if it can to avoid calculating them on the fly. Finally, we plot the OOS validation for the GP emulator:
+Note that using `validate()` before plotting can save subsequent computations compared to simply invoking `plot()`, as `validate()` stores validation results in the emulator objects and `plot()` will use these, if it can, to avoid calculating them on the fly. Finally, we plot the OOS validation for the GP emulator:
```{r}
plot(m_gp, test_x, test_y)
diff --git a/vignettes/seq_design.Rmd b/vignettes/seq_design.Rmd
index 3cdacd6..eca4a65 100644
--- a/vignettes/seq_design.Rmd
+++ b/vignettes/seq_design.Rmd
@@ -15,6 +15,20 @@ knitr::opts_chunk$set(
echo = TRUE,
eval = FALSE
)
+
+get_article_url <- function(article) {
+ pkg_version <- as.character(utils::packageVersion("dgpsi"))
+
+ is_dev <- grepl("\\.9000$", pkg_version)
+
+ base_url <- if (is_dev) {
+ "https://mingdeyu.github.io/dgpsi-R/dev"
+ } else {
+ "https://mingdeyu.github.io/dgpsi-R"
+ }
+
+ paste0(base_url, article)
+}
```
This vignette shows how to use `dgpsi` to sequentially enrich a design for adaptive improvement and pruning of an emulator. We choose a DGP for this example, but the methods work equally well on all types of emulator.
@@ -51,7 +65,7 @@ ggplot(dat, aes(x1, x2, fill = f)) + geom_tile() +
![](https://raw.githubusercontent.com/mingdeyu/dgpsi-R/master/vignettes/images/seq_fct.png){width=100%}
-We can see from the figure above that the synthetic simulator exhibits more fluctuations on the bottom left of its input space whil,e in the top-right part, the simulator shows little variation.
+We can see from the figure above that the synthetic simulator exhibits more fluctuations in the bottom-left of its input space, while in the top-right part it shows little variation.
We now specify a seed with `set_seed()` for reproducibility
@@ -89,7 +103,7 @@ m <- dgp(X, Y)
## Imputing ... done
```
-We then specify the boundaries of the input parameter space for `f` so that the sequential design can restrict it's search for new points.
+We then specify the boundaries of the input parameter space for `f` so that the sequential design can restrict its search for new points.
```{r}
lim_1 <- c(0, 1)
@@ -111,30 +125,30 @@ m <- design(m, N = 25, limits = lim, f = f, x_test = validate_x, y_test = valida
## * RMSE: 0.527337
## Iteration 1:
## - Locating ... done
-## * Next design point: 0.868213 0.037359
+## * Next design point: 0.930278 0.033542
## - Updating and re-fitting ... done
## - Validating ... done
-## * RMSE: 0.541914
+## * RMSE: 0.532428
##
## ...
##
## Iteration 18:
## - Locating ... done
-## * Next design point: 0.341178 0.003477
+## * Next design point: 0.706449 0.048681
## - Updating and re-fitting ... done
## - Pruning 1 node(s) in layer 1 ... done
## - Re-fitting ... done
## - Validating ... done
-## * RMSE: 0.107640
+## * RMSE: 0.139819
##
## ...
##
## Iteration 25:
## - Locating ... done
-## * Next design point: 0.168364 0.984091
+## * Next design point: 0.000010 0.483511
## - Updating and re-fitting ... done
## - Validating ... done
-## * RMSE: 0.032189
+## * RMSE: 0.071370
```
After the first wave we see that 1 GP node is removed from the first layer by automatic pruning of the DGP which leaves only one node in both the first and second layer of the DGP hierarchy respectively. This helps accelerate the inference of the DGP emulator in subsequent waves of the sequential design while maintaining accuracy. We now start the second wave of the sequential design:
@@ -145,22 +159,22 @@ m <- design(m, N = 10, limits = lim, f = f, x_test = validate_x, y_test = valida
```
```
## Initializing ... done
-## * RMSE: 0.032189
+## * RMSE: 0.071370
## Iteration 1:
## - Locating ... done
-## * Next design point: 0.215369 0.384567
+## * Next design point: 0.461239 0.415336
## - Updating and re-fitting ... done
## - Validating ... done
-## * RMSE: 0.017259
+## * RMSE: 0.056827
##
## ...
##
## Iteration 10:
## - Locating ... done
-## * Next design point: 0.198905 0.873454
+## * Next design point: 0.177935 0.459387
## - Updating and re-fitting ... done
## - Validating ... done
-## * RMSE: 0.008300
+## * RMSE: 0.011508
```
Finally, we resume the second wave with 10 additional iterations:
@@ -172,22 +186,22 @@ m <- design(m, N = 10, limits = lim, f = f, x_test = validate_x, y_test = valida
```
## Iteration 11:
## - Locating ... done
-## * Next design point: 0.977752 0.011568
+## * Next design point: 0.029874 0.763962
## - Updating and re-fitting ... done
## - Validating ... done
-## * RMSE: 0.007264
+## * RMSE: 0.014444
##
## ...
##
## Iteration 20:
## - Locating ... done
-## * Next design point: 0.258375 0.028753
+## * Next design point: 0.829788 0.452129
## - Updating and re-fitting ... done
## - Validating ... done
-## * RMSE: 0.003784
+## * RMSE: 0.004783
```
-Resuming rather than adding an additional wave can be a useful feature, particularly for plotting using `draw`. After the sequential design is done, we can inspect the enriched design by applying `draw()` to `m`:
+Resuming rather than adding an additional wave can be a useful feature, particularly for plotting using `draw()`. After the sequential design is done, we can inspect the enriched design by applying `draw()` to `m`:
```{r}
draw(m, 'design')
@@ -277,7 +291,7 @@ It can be seen from the plot above that with static space-filling designs, the q
### See also
-See [`Sequential Design II`](https://mingdeyu.github.io/dgpsi-R/dev/articles/seq_design_2.html) for the sequential design of a bundle of DGP emulators with automatic terminations.
+See [Sequential Design II](`r get_article_url("/articles/seq_design_2.html")`) for the sequential design of a bundle of DGP emulators with automatic termination.
### References
diff --git a/vignettes/seq_design_2.Rmd b/vignettes/seq_design_2.Rmd
index df26bf9..c7c2217 100644
--- a/vignettes/seq_design_2.Rmd
+++ b/vignettes/seq_design_2.Rmd
@@ -14,6 +14,20 @@ knitr::opts_chunk$set(
echo = TRUE,
eval = FALSE
)
+
+get_article_url <- function(article) {
+ pkg_version <- as.character(utils::packageVersion("dgpsi"))
+
+ is_dev <- grepl("\\.9000$", pkg_version)
+
+ base_url <- if (is_dev) {
+ "https://mingdeyu.github.io/dgpsi-R/dev"
+ } else {
+ "https://mingdeyu.github.io/dgpsi-R"
+ }
+
+ paste0(base_url, article)
+}
```
This vignette shows how to use the package to sequentially refine a bundle of DGP emulators, each of which emulates an output of a simulator.
@@ -40,7 +54,7 @@ f <- function(x) {
}
```
-Note that the function is defined in such a way that both its input, `x`, and output are matrices. The following figure shows the true functional forms of the three outputs of the simulator over `[0, 1]`:
+Note that the function is defined in such a way that both its input, `x`, and output are matrices. The following figure shows the true functional forms of the three outputs of the simulator over $[0, 1]$:
```{r}
dense_x <- seq(0, 1, length = 200)
@@ -59,7 +73,7 @@ wrap_plots(list(p1, p2, p3)) + plot_annotation(title = 'Synthetic Simulator')
We now specify a seed with `set_seed()` from the package for reproducibility
```{r}
-set_seed(99)
+set_seed(9999)
```
and generate an initial design with 5 design points using a maximin Latin hypercube sampler:
@@ -147,38 +161,32 @@ m <- design(m, N = 10, limits = lim, f = f, x_test = validate_x, y_test = valida
```
## Initializing ... done
-## * RMSE: 0.340553 0.158352 0.019125
+## * RMSE: 0.383722, RMSE: 0.154689, RMSE: 0.008984
## Iteration 1:
## - Locating ... done
-## * Next design point (Emulator1): 0.467905
-## * Next design point (Emulator2): 0.003782
-## * Next design point (Emulator3): 0.997096
-## - Updating and re-fitting ... done
-## - Validating ... done
-## * RMSE: 0.309701 0.158634 0.006056
-## Iteration 2:
-## - Locating ... done
-## * Next design point (Emulator1): 0.997773
-## * Next design point (Emulator2): 0.997773
+## * Next design point (Emulator1): 0.207895
+## * Next design point (Emulator2): 0.235873
## * Next design point (Emulator3): None (target reached)
## - Updating and re-fitting ... done
## - Validating ... done
-## * RMSE: 0.291350 0.159032 0.006056
+## * RMSE: 0.337517, RMSE: 0.155087, RMSE: 0.008984
##
## ...
##
## Iteration 10:
## - Locating ... done
-## * Next design point (Emulator1): 0.198829
-## * Next design point (Emulator2): 0.311879
+## * Next design point (Emulator1): 0.430889
+## * Next design point (Emulator2): 0.479682
## * Next design point (Emulator3): None (target reached)
## - Updating and re-fitting ... done
## - Validating ... done
-## * RMSE: 0.190767 0.014075 0.006056
-## Targets are not reached for all emulators at the end of the sequential design.
+## * RMSE: 0.120320, RMSE: 0.062365, RMSE: 0.008984
+## Targets not reached for all emulators at the end of the sequential design.
```
-It can be seen that at the second step, the DGP emulator for the third output has already reached the target, so for the rest of the steps no further refinements (i.e., additions of design points to the third DGP emulator) are performed. At the end of the first wave, the DGP emulators for both the first and second outputs have not reached the target yet. At this point, we can proceed to a second wave by repeating the command above, but we show below an alternative way, in which we define an aggregation function that aggregates criterion scores across the three outputs such that the same design points are added to the three emulators at each step (instead of different design points for each emulator). We define the aggregation function `g` that aggregate scores by calculating their weighted average:
+It can be seen that at the first step, the DGP emulator for the third output has already reached the target, so no further refinements (i.e., additions of design points to the third DGP emulator) are performed for the remaining steps. By the end of the first wave, the DGP emulators for the first and second outputs have not yet reached the target. At this point, we can proceed to a second wave by repeating the command above. However, we demonstrate an alternative approach below, where we define an aggregation function (applicable to all built-in `method` functions of `design()`). This function aggregates criterion scores across the three outputs, ensuring that the same design points are added to all three emulators at each step, instead of selecting different design points for each emulator. Using the aggregation approach can be advantageous if the different outputs exhibit similar behavior with respect to the input, as it reduces the number of simulations required at each iteration. However, if the outputs behave differently, it may be more effective to add distinct design points to each emulator to achieve lower errors more quickly.
+
+We define the aggregation function `g` to compute a weighted average of the scores:
```{r}
g <- function(x, weight){
@@ -189,10 +197,10 @@ g <- function(x, weight){
}
```
-Since the third emulator has already reached the target, we assign zero weights to it and weights of 0.8 and 0.2 to the first and second emulators respectively:
+Since the third emulator has already reached the target, we assign it a weight of zero, and assign weights of 0.6 and 0.4 to the first and second emulators, respectively:
```{r}
-weight <- c(0.8, 0.2, 0)
+weight <- c(0.6, 0.4, 0)
```
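To see what these weights imply, consider a minimal sketch of weighted-average aggregation. It assumes the criterion scores arrive as a matrix with one row per candidate design point and one column per emulator; that layout, the helper name `weighted_score()`, and the toy scores are assumptions for illustration and are not the body of `g` above.

```r
# Hypothetical helper: weighted average of per-emulator scores for each
# candidate point (rows = candidates, columns = emulators).
weighted_score <- function(scores, weight) {
  stopifnot(ncol(scores) == length(weight))
  # Normalize the weights so they sum to one, then average across columns.
  drop(scores %*% (weight / sum(weight)))
}

# Toy scores for three candidate points and three emulators.
scores <- matrix(c(0.9, 0.1, 0.5,
                   0.2, 0.8, 0.5,
                   0.4, 0.4, 0.9),
                 nrow = 3, byrow = TRUE)
weight <- c(0.6, 0.4, 0)

agg  <- weighted_score(scores, weight)
best <- which.max(agg)
```

In this toy case the third column is ignored entirely because of its zero weight, and the first candidate is selected because it scores highly for the emulator with the largest weight, even though the second emulator would prefer the second candidate.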
We now pass both the aggregate function, `g()`, and its `weight` argument to `design()` for a second wave of the sequential design with a further 15 steps:
@@ -203,38 +211,41 @@ m <- design(m, N = 15, limits = lim, f = f, x_test = validate_x, y_test = valida
```
```
## Initializing ... done
-## * RMSE: 0.190767 0.014075 0.006056
+## * RMSE: 0.120320, RMSE: 0.062365, RMSE: 0.008984
## Iteration 1:
## - Locating ... done
-## * Next design point (Emulator1): 0.263892
-## * Next design point (Emulator2): 0.263892
+## * Next design point (Emulator1): 0.062821
+## * Next design point (Emulator2): 0.062821
## * Next design point (Emulator3): None (target reached)
## - Updating and re-fitting ... done
## - Validating ... done
-## * RMSE: 0.176961 0.009467 0.006056
-## Iteration 2:
+## * RMSE: 0.151946, RMSE: 0.061865, RMSE: 0.008984
+##
+## ...
+##
+## Iteration 6:
## - Locating ... done
-## * Next design point (Emulator1): 0.030840
+## * Next design point (Emulator1): 0.233155
## * Next design point (Emulator2): None (target reached)
## * Next design point (Emulator3): None (target reached)
## - Updating and re-fitting ... done
## - Validating ... done
-## * RMSE: 0.203489 0.009467 0.006056
+## * RMSE: 0.014147, RMSE: 0.004410, RMSE: 0.008984
##
## ...
##
-## Iteration 10:
+## Iteration 8:
## - Locating ... done
-## * Next design point (Emulator1): 0.788265
+## * Next design point (Emulator1): 0.009688
## * Next design point (Emulator2): None (target reached)
## * Next design point (Emulator3): None (target reached)
## - Updating and re-fitting ... done
## - Validating ... done
-## * RMSE: 0.009116 0.009467 0.006056
-## Target reached! The sequential design stops at step 10.
+## * RMSE: 0.005913, RMSE: 0.004410, RMSE: 0.008984
+## Target reached! Sequential design stopped at step 8.
```
-The first and the second emulators reached the target after iteration 9 and 1 of the second wave, respectively. The sequential design points of the three emulators can be plotted with `draw()`:
+The first and second emulators reached the target after iterations 8 and 5 of the second wave, respectively. The sequential design points of the three emulators can be plotted with `draw()`:
```{r}
draw(m, 'design')
@@ -242,7 +253,7 @@ draw(m, 'design')
![](https://raw.githubusercontent.com/mingdeyu/dgpsi-R/master/vignettes/images/seq2_design.png){width=100%}
-The figure above shows that for the first emulator most of the design points are added below 0.5 whilst for the second emulator most of the design points concentrate around 0.5. For the third emulator, the resulting design is space-filling. It can be seen that these design point distributions are consistent with the functional complexities of the three outputs.
+The figure above shows that, for the first emulator, most of the design points are added below 0.5, while for the second emulator, the design points are concentrated around 0.5. For the third emulator, the resulting design is space-filling. These design point distributions align with the functional complexities of the three outputs. However, in the second wave, which uses the aggregation function, additional points are added below 0.5 for the second emulator due to the higher weight assigned to the first emulator. These points may not be necessary for the second output, as its functional behavior does not require further refinement in that region. This observation aligns with the earlier argument that using the aggregation function to add the same design points to outputs with differing behaviors may not always be effective.
## Comparison to DGP emulators with space-filling designs
@@ -313,8 +324,8 @@ p
![](https://raw.githubusercontent.com/mingdeyu/dgpsi-R/master/vignettes/images/seq2_rmse.png){width=100%}
-It can be seen from the plot above that sequential design is more efficient than batch space filling design, achieving similar RMSE with far fewer design points.
+It can be seen from the plot above that sequential design is more efficient than batch space-filling design, achieving similar RMSE with far fewer design points, particularly for the first emulator in the bundle.
### See also
-See [`Sequential Design I`](https://mingdeyu.github.io/dgpsi-R/dev/articles/seq_design.html) for the sequential design and automatic structure simplification of a DGP emulator on a 2D simulator.
+See [Sequential Design I](`r get_article_url("/articles/seq_design.html")`) for the sequential design and automatic structure simplification of a DGP emulator on a 2D simulator.