This function builds and trains a GP emulator.
Usage
gp(
X,
Y,
name = "sexp",
lengthscale = rep(0.1, ncol(X)),
bounds = NULL,
prior = "ref",
nugget_est = FALSE,
nugget = ifelse(nugget_est, 0.01, 1e-08),
scale_est = TRUE,
scale = 1,
training = TRUE,
verb = TRUE,
vecchia = FALSE,
M = 25,
ord = NULL,
internal_input_idx = NULL,
linked_idx = NULL,
id = NULL
)
Arguments
- X
a matrix where each row is an input data point and each column is an input dimension.
- Y
a matrix with only one column and each row being an output data point.
- name
kernel function to be used. Either
"sexp"
for squared exponential kernel or"matern2.5"
for Matérn-2.5 kernel. Defaults to"sexp"
.- lengthscale
initial values of lengthscales in the kernel function. It can be a single numeric value or a vector of length
ncol(X)
:if it is a single numeric value, it is assumed that kernel functions across input dimensions share the same lengthscale;
if it is a vector, it is assumed that kernel functions across input dimensions have different lengthscales.
Defaults to a vector of
0.1
.- bounds
the lower and upper bounds of lengthscales in the kernel function. It is a vector of length two where the first element is the lower bound and the second element is the upper bound. The bounds will be applied to all lengthscales in the kernel function. Defaults to
NULL
where no bounds are specified for the lengthscales.- prior
prior to be used for Maximum a Posterior for lengthscales and nugget of the GP: gamma prior (
"ga"
), inverse gamma prior ("inv_ga"
), or jointly robust prior ("ref"
). Defaults to"ref"
. See the reference below for the jointly robust prior.- nugget_est
a bool indicating if the nugget term is to be estimated:
FALSE
: the nugget term is fixed tonugget
.TRUE
: the nugget term will be estimated.
Defaults to
FALSE
.- nugget
the initial nugget value. If
nugget_est = FALSE
, the assigned value is fixed during the training. Setnugget
to a small value (e.g.,1e-8
) and the corresponding bool innugget_est
toFALSE
for deterministic computer models where the emulator should interpolate the training data points. Setnugget
to a larger value and the corresponding bool innugget_est
toTRUE
for stochastic emulation where the computer model outputs are assumed to follow a homogeneous Gaussian distribution. Defaults to1e-8
ifnugget_est = FALSE
and0.01
ifnugget_est = TRUE
.- scale_est
a bool indicating if the variance is to be estimated:
FALSE
: the variance is fixed toscale
.TRUE
: the variance term will be estimated.
Defaults to
TRUE
.- scale
the initial variance value. If
scale_est = FALSE
, the assigned value is fixed during the training. Defaults to1
.- training
a bool indicating if the initialized GP emulator will be trained. When set to
FALSE
,gp()
returns an untrained GP emulator, to which one can applysummary()
to inspect its specification or applypredict()
to check its emulation performance before the training. Defaults toTRUE
.- verb
a bool indicating if the trace information on GP emulator construction and training will be printed during function execution. Defaults to
TRUE
.- vecchia
a bool indicating whether to use Vecchia approximation for large-scale GP emulator construction and prediction. Defaults to
FALSE
. The Vecchia approximation implemented for the GP emulation largely follows Katzfuss et al. (2022). See reference below.- M
the size of the conditioning set for the Vecchia approximation in the GP emulator training. Defaults to
25
.- ord
an R function that returns the ordering of the input to the GP emulator for the Vecchia approximation. The function must satisfy the following basic rules:
the first argument represents the input scaled by the lengthscales.
the output of the function is a vector of indices that gives the ordering of the input to the GP emulator.
If
ord = NULL
, the default random ordering is used. Defaults toNULL
.- internal_input_idx
The column indices of
X
that are generated by the linked emulators in the preceding layers. Setinternal_input_idx = NULL
if the GP emulator is in the first layer of a system or all columns inX
are generated by the linked emulators in the preceding layers. Defaults toNULL
.The argument will be removed in the next release. To set up connections of emulators for linked emulations, please use the updated
lgp()
function instead.- linked_idx
Either a vector or a list of vectors:
If
linked_idx
is a vector, it gives indices of columns in the pooled output matrix (formed by column-combined outputs of all emulators in the feeding layer) that feed into the GP emulator. The length of the vector shall equal to the length ofinternal_input_idx
wheninternal_input_idx
is notNULL
. If the GP emulator is in the first layer of a linked emulator system, the vector gives the column indices of the global input (formed by column-combining all input matrices of emulators in the first layer) that the GP emulator will use. If the GP emulator is to be used in both the first and subsequent layers, one should initially setlinked_idx
to the appropriate values for the situation where the emulator is not in the first layer. Then, use the functionset_linked_idx()
to reset the linking information when the emulator is in the first layer.When the GP emulator is not in the first layer of a linked emulator system,
linked_idx
can be a list that gives the information on connections between the GP emulator and emulators in all preceding layers. The length of the list should equal to the number of layers before the GP emulator. Each element of the list is a vector that gives indices of columns in the pooled output matrix (formed by column-combined outputs of all emulators) in the corresponding layer that feed into the GP emulator. If the GP emulator has no connections to any emulator in a certain layer, setNULL
in the corresponding position of the list. The order of input dimensions inX[,internal_input_idx]
should be consistent withlinked_idx
. For example, a GP emulator in the second layer that is fed by the output dimension 1 and 3 of emulators in layer 1 should havelinked_idx = list( c(1,3) )
. In addition, the first and second columns ofX[,internal_input_idx]
should correspond to the output dimensions 1 and 3 from layer 1.
Set
linked_idx = NULL
if the GP emulator will not be used for linked emulations. However, if this is no longer the case, one can useset_linked_idx()
to add linking information to the GP emulator. Defaults toNULL
.The argument will be removed in the next release. To set up connections of emulators for linked emulations, please use the updated
lgp()
function instead.- id
an ID to be assigned to the GP emulator. If an ID is not provided (i.e.,
id = NULL
), a UUID (Universally Unique Identifier) will be automatically generated and assigned to the emulator. Default toNULL
.
Value
An S3 class named gp
that contains five slots:
id
: A number or character string assigned through theid
argument.data
: a list that contains two elements:X
andY
which are the training input and output data respectively.specs
: a list that contains seven elements:kernel
: the type of the kernel function used. Either"sexp"
for squared exponential kernel or"matern2.5"
for Matérn-2.5 kernel.lengthscales
: a vector of lengthscales in the kernel function.scale
: the variance value in the kernel function.nugget
: the nugget value in the kernel function.internal_dims
: the column indices ofX
that correspond to the linked emulators in the preceding layers of a linked system. The slot will be removed in the next release.external_dims
: the column indices ofX
that correspond to global inputs to the linked system of emulators. It is shown asFALSE
ifinternal_input_idx = NULL
. The slot will be removed in the next release.linked_idx
: the value passed to argumentlinked_idx
. It is shown asFALSE
if the argumentlinked_idx
isNULL
. The slot will be removed in the next release.vecchia
: whether the Vecchia approximation is used for the GP emulator training.M
: the size of the conditioning set for the Vecchia approximation in the GP emulator training.
constructor_obj
: a 'python' object that stores the information of the constructed GP emulator.container_obj
: a 'python' object that stores the information for the linked emulation.emulator_obj
: a 'python' object that stores the information for the predictions from the GP emulator.
The returned gp
object can be used by
predict()
for GP predictions.validate()
for LOO and OOS validations.plot()
for validation plots.lgp()
for linked (D)GP emulator constructions.summary()
to summarize the trained GP emulator.write()
to save the GP emulator to a.pkl
file.design()
for sequential designs.update()
to update the GP emulator with new inputs and outputs.
Details
See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/dev/.
Note
Any R vector detected in X
and Y
will be treated as a column vector and automatically converted into a single-column
R matrix. Thus, if X
is a single data point with multiple dimensions, it must be given as a matrix.
References
Gu, M. (2019). Jointly robust prior for Gaussian stochastic process in emulation, calibration and variable selection. Bayesian Analysis, 14(3), 857-885.
Katzfuss, M., Guinness, J., & Lawrence, E. (2022). Scaled Vecchia approximation for fast computer-model emulation. SIAM/ASA Journal on Uncertainty Quantification, 10(2), 537-554.
Examples
if (FALSE) { # \dontrun{
# load the package and the Python env
library(dgpsi)
# construct a step function
f <- function(x) {
if (x < 0.5) return(-1)
if (x >= 0.5) return(1)
}
# generate training data
X <- seq(0, 1, length = 10)
Y <- sapply(X, f)
# training
m <- gp(X, Y)
# summarizing
summary(m)
# LOO cross validation
m <- validate(m)
plot(m)
# prediction
test_x <- seq(0, 1, length = 200)
m <- predict(m, x = test_x)
# OOS validation
validate_x <- sample(test_x, 10)
validate_y <- sapply(validate_x, f)
plot(m, validate_x, validate_y)
# write and read the constructed emulator
write(m, 'step_gp')
m <- read('step_gp')
} # }