This function implements the sequential design of a (D)GP emulator or a bundle of (D)GP emulators.
Usage
design(
object,
N,
x_cand,
y_cand,
n_cand,
limits,
int,
f,
reps,
freq,
x_test,
y_test,
reset,
target,
method,
eval,
verb,
autosave,
new_wave,
M_val,
cores,
...
)
# S3 method for class 'gp'
design(
object,
N,
x_cand = NULL,
y_cand = NULL,
n_cand = 200,
limits = NULL,
int = FALSE,
f = NULL,
reps = 1,
freq = c(1, 1),
x_test = NULL,
y_test = NULL,
reset = FALSE,
target = NULL,
method = vigf,
eval = NULL,
verb = TRUE,
autosave = list(),
new_wave = TRUE,
M_val = 50,
cores = 1,
...
)
# S3 method for class 'dgp'
design(
object,
N,
x_cand = NULL,
y_cand = NULL,
n_cand = 200,
limits = NULL,
int = FALSE,
f = NULL,
reps = 1,
freq = c(1, 1),
x_test = NULL,
y_test = NULL,
reset = FALSE,
target = NULL,
method = vigf,
eval = NULL,
verb = TRUE,
autosave = list(),
new_wave = TRUE,
M_val = 50,
cores = 1,
train_N = NULL,
refit_cores = 1,
pruning = TRUE,
control = list(),
...
)
# S3 method for class 'bundle'
design(
object,
N,
x_cand = NULL,
y_cand = NULL,
n_cand = 200,
limits = NULL,
int = FALSE,
f = NULL,
reps = 1,
freq = c(1, 1),
x_test = NULL,
y_test = NULL,
reset = FALSE,
target = NULL,
method = vigf,
eval = NULL,
verb = TRUE,
autosave = list(),
new_wave = TRUE,
M_val = 50,
cores = 1,
train_N = NULL,
refit_cores = 1,
...
)
Arguments
- object
can be one of the following:
  - the S3 class gp.
  - the S3 class dgp.
  - the S3 class bundle.
- N
the number of steps for the sequential design.
- x_cand
a matrix (with each row being a design point and column being an input dimension) that gives a candidate set in which the next design point is determined. If x_cand = NULL, the candidate set will be generated using n_cand, limits, and int. Defaults to NULL.
- y_cand
a matrix (with each row being a simulator evaluation and column being an output dimension) that gives the realizations from the simulator at input positions in x_cand. Defaults to NULL.
- n_cand
an integer that gives
  - the size of the candidate set in which the next design point is determined, if x_cand = NULL;
  - the size of a sub-set to be sampled from the candidate set x_cand at each step of the sequential design to determine the next design point, if x_cand is not NULL.
Defaults to 200.
- limits
a two-column matrix that gives the ranges of each input dimension, or a vector of length two if there is only one input dimension. If a vector is provided, it will be converted to a two-column row matrix. The rows of the matrix correspond to input dimensions, and its first and second columns correspond to the minimum and maximum values of the input dimensions. Set limits = NULL if x_cand is supplied. This argument is only used when x_cand is not supplied, i.e., x_cand = NULL. Defaults to NULL.
- int
a bool or a vector of bools that indicates whether each input dimension is of integer type. If a single bool is given, it will be applied to all input dimensions. If a vector is provided, its length should equal the number of input dimensions, and its elements will be applied to the individual input dimensions. Defaults to FALSE.
- f
an R function that represents the simulator. f needs to be specified with the following basic rules:
  - the first argument of the function should be a matrix with rows being different design points and columns being input dimensions.
  - the output of the function can be either
    - a matrix with rows being different outputs (corresponding to the input design points) and columns being output dimensions. If there is only one output dimension, the matrix still needs to be returned with a single column.
    - a list with the first element being the output matrix described above and, optionally, additional named elements which will update the values of any arguments with the same names passed via the ... argument. The list output can be useful if some additional arguments of f and aggregate need to be updated after each step of the sequential design.
See the Note section below for further information. This argument is used when y_cand = NULL. Defaults to NULL.
- reps
an integer that gives the number of repetitions of the located design points to be created and used for evaluations of f. Set the argument to an integer greater than 1 if f is a stochastic function that can generate different responses given the same input and the supplied emulator object can deal with stochastic responses, e.g., a (D)GP emulator with nugget_est = TRUE or a DGP emulator with a likelihood layer. The argument is only used when f is supplied. Defaults to 1.
- freq
a vector of two integers with the first element giving the frequency (in number of steps) to re-fit the emulator, and the second element giving the frequency to implement the emulator validation (for RMSE). Defaults to c(1, 1).
- x_test
a matrix (with each row being an input testing data point and each column being an input dimension) that gives the testing input data to evaluate the emulator after each step of the sequential design. Set to NULL for the LOO-based emulator validation. Defaults to NULL. This argument is only used if eval = NULL.
- y_test
the testing output data that correspond to x_test for the emulator validation after each step of the sequential design:
  - if object is an instance of the gp class, y_test is a matrix with only one column and each row being a testing output data point.
  - if object is an instance of the dgp class, y_test is a matrix with its rows being testing output data points and columns being output dimensions.
Set to NULL for the LOO-based emulator validation. Defaults to NULL. This argument is only used if eval = NULL.
- reset
a bool or a vector of bools indicating whether to reset the hyperparameters of the emulator to the initial values it had when first constructed. The reset takes place after the input-output update and before the re-fit. If a single bool is given, it will be applied to every step of the sequential design. If a vector is provided, its length should equal N, and its elements will be applied to the individual steps of the sequential design. Defaults to FALSE.
- target
a numeric or a vector that gives the target RMSEs at which the sequential design is terminated. Defaults to NULL, in which case the sequential design stops after N steps. See Note section below for further information about target.
- method
an R function that gives the indices of design points in a candidate set. The function must satisfy the following basic rules:
  - the first argument is an emulator object that can be an instance of the gp, dgp, or bundle class.
  - the second argument is a matrix with rows representing a set of different design points.
  - the output of the function
    - is a vector of indices if the first argument is an instance of the gp class;
    - is a matrix of indices if the first argument is an instance of the dgp class. If there are different design points to be added with respect to different outputs of the DGP emulator, the number of columns of the matrix should equal the number of outputs. If design points are common to all outputs of the DGP emulator, the matrix should be single-columned. If more than one design point is determined for a given output or for all outputs, the indices of these design points are placed in the matrix with extra rows.
    - is a matrix of indices if the first argument is an instance of the bundle class. Each row of the matrix gives the indices of the design points to be added to individual emulators in the bundle.
See alm(), mice(), pei(), and vigf() for examples on customizing method. Defaults to vigf().
- eval
an R function that calculates the customized evaluation metric of the emulator. The function must satisfy the following basic rules:
  - the first argument is an emulator object that can be an instance of the gp, dgp, or bundle class.
  - the output of the function can be
    - a single metric value, if the first argument is an instance of the gp class;
    - a single metric value or a vector of metric values with the length equal to the number of output dimensions, if the first argument is an instance of the dgp class;
    - a single metric value or a vector of metric values with the length equal to the number of emulators in the bundle, if the first argument is an instance of the bundle class.
If no customized function is provided, the built-in evaluation metric, RMSE, will be calculated. Defaults to NULL. See Note section below for further information.
- verb
a bool indicating if the trace information will be printed during the sequential design. Defaults to TRUE.
- autosave
a list that contains configuration settings for the automatic saving of the emulator:
  - switch: a bool indicating whether to enable the automatic saving of the emulator during the sequential design. When set to TRUE, the emulator in the final iteration is always saved. Defaults to FALSE.
  - directory: a string specifying the directory path where the emulators will be stored. Emulators will be stored in a sub-directory of directory named 'emulator-id'. Defaults to './check_points'.
  - fname: a string representing the base name for the saved emulator files. Defaults to 'check_point'.
  - freq: an integer indicating the frequency of automatic savings, measured in the number of iterations. Defaults to 5.
  - overwrite: a bool value controlling the file saving behavior. When set to TRUE, each new automatic saving overwrites the previous one, keeping only the latest version. If FALSE, each automatic saving creates a new file, preserving all previous versions. Defaults to FALSE.
- new_wave
a bool indicating if the current execution of design() will create a new wave of sequential designs or add the sequential designs to the last existing wave. This argument is only used if there are waves existing in the emulator. By creating new waves, one can better visualize the performance of the sequential designs in different executions of design() in draw() and can specify a different evaluation frequency in freq. However, it can be beneficial to turn this option off to limit the number of waves visualized in draw(), which could otherwise run out of colors. Defaults to TRUE.
- M_val
an integer that gives the size of the conditioning set for the Vecchia approximation in emulator validations. This argument is only used if the emulator object was constructed under the Vecchia approximation. Defaults to 50.
- cores
an integer that gives the number of processes to be used for emulator validations. If set to NULL, the number of processes is set to max physical cores available %/% 2. Defaults to 1. This argument is only used if eval = NULL.
- ...
Any arguments with names that differ from those used in design() but are required by f, method, or eval can be passed here. design() will forward relevant arguments to f, method, and eval based on the names of the additional arguments provided.
If you are using package-provided methods such as vigf(), alm(), mice(), or pei() for method, you can pass batch_size to design() to select multiple design points at each iteration. For other arguments that control the behavior of these four methods, please refer to their documentation.
- train_N
the number of training iterations to be used to re-fit the DGP emulator at each step of the sequential design:
  - If train_N is an integer, then at each step the DGP emulator will be re-fitted (based on the frequency of re-fit specified in freq) with train_N iterations.
  - If train_N is a vector, then its size must be N even if the re-fit frequency specified in freq is not one.
  - If train_N is NULL, then at each step the DGP emulator will be re-fitted (based on the frequency of re-fit specified in freq) with 100 iterations if the DGP emulator was constructed without the Vecchia approximation, and with 50 iterations if the Vecchia approximation was used.
Defaults to NULL.
- refit_cores
the number of processes to be used to re-fit GP components (in the same layer of a DGP emulator) at each M-step during the re-fitting. If set to NULL, the number of processes is set to (max physical cores available - 1) if the DGP emulator was constructed without the Vecchia approximation. Otherwise, the number of processes is set to max physical cores available %/% 2. Only use multiple processes when there is a large number of GP components in different layers and optimization of GP components is computationally expensive. Defaults to 1.
- pruning
a bool indicating if dynamic pruning of DGP structures will be implemented during the sequential design after the total number of design points exceeds min_size in control. The argument is only applicable to DGP emulators (i.e., object is an instance of dgp class) produced by dgp() with struc = NULL. Defaults to TRUE.
- control
a list that can supply any of the following components to control the dynamic pruning of the DGP emulator:
  - min_size, the minimum number of design points required to trigger the dynamic pruning. Defaults to 10 times the number of input dimensions.
  - threshold, the R2 value above which a GP node is considered redundant. Defaults to 0.97.
  - nexceed, the minimum number of consecutive iterations that the R2 value of a GP node must exceed threshold to trigger the removal of that node from the DGP structure. Defaults to 3.
The argument is only used when pruning = TRUE.
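The rules for f and limits described in the Arguments above can be sketched in base R as follows. This is only an illustration under stated assumptions: the names f_matrix, f_list, and the extra argument step are hypothetical and not part of dgpsi, and the way design() consumes the list output is as described for the f argument rather than demonstrated here.

```r
# Ranges of two input dimensions: one row per dimension, columns = (min, max).
lim <- rbind(c(0, 1), c(0, 1))

# Form 1: matrix in, single-column matrix out (one row per design point).
f_matrix <- function(x) {
  y <- sin(1 / ((0.7 * x[, 1] + 0.3) * (0.7 * x[, 2] + 0.3)))
  matrix(y, ncol = 1)
}

# Form 2: list out. The first element is the output matrix; the named element
# `step` (hypothetical) is intended to replace an argument of the same name
# that was passed via `...`, so it can change from one step to the next.
f_list <- function(x, step = 0) {
  list(f_matrix(x), step = step + 1)
}

# Two design points in two input dimensions.
x_new <- matrix(c(0.2, 0.8,
                  0.5, 0.1), ncol = 2, byrow = TRUE)

out1 <- f_matrix(x_new)
out2 <- f_list(x_new, step = 3)

dim(out1)   # rows = design points, columns = output dimensions
out2$step   # updated value of the hypothetical extra argument
```

Either function has the shape expected of f; the list form is only needed when an extra argument has to be updated between steps.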
Value
An updated object is returned with a slot called design that contains:
  - S slots, named wave1, wave2,..., waveS, that contain information of S waves of sequential designs that have been applied to the emulator. Each slot contains the following elements:
    - N, an integer that gives the number of steps implemented in the corresponding wave;
    - rmse, a matrix that gives the RMSEs of emulators constructed during the corresponding wave, if eval = NULL;
    - metric, a matrix that gives the customized evaluation metric values of emulators constructed during the corresponding wave, if a customized function is supplied to eval;
    - freq, an integer that gives the frequency at which the emulator validations are implemented during the corresponding wave;
    - enrichment, a vector of size N that gives the number of new design points added after each step of the sequential design (if object is an instance of the gp or dgp class), or a matrix that gives the number of new design points added to emulators in a bundle after each step of the sequential design (if object is an instance of the bundle class).
    If target is not NULL, the following additional elements are also included:
    - target, the target RMSE(s) to stop the sequential design.
    - reached, a bool (if object is an instance of the gp or dgp class) or a vector of bools (if object is an instance of the bundle class) that indicate if the target RMSEs are reached at the end of the sequential design.
  - a slot called type that gives the type of validations:
    - either LOO ('loo') or OOS ('oos') if eval = NULL. See validate() for more information about LOO and OOS.
    - 'customized' if a customized R function is provided to eval.
  - two slots called x_test and y_test that contain the data points for the OOS validation if the type slot is 'oos'.
  - If y_cand = NULL and there are NAs returned from the supplied f during the sequential design, a slot called exclusion is included that records the located design positions that produced NAs via f. The sequential design will use this information to avoid re-visiting the same locations (if x_cand is supplied) or their neighborhoods (if x_cand is NULL) in later runs of design().
See Note section below for further information.
Details
See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.
Note
- The validation of an emulator is forced after the final step of a sequential design even if N is not a multiple of the second element in freq.
- Any loo or oos slot that already exists in object will be cleaned, and a new slot called loo or oos will be created in the returned object depending on whether x_test and y_test are provided. The new slot gives the validation information of the emulator constructed in the final step of the sequential design. See validate() for more information about the slots loo and oos.
- If object has previously been used by design() for sequential designs, the information of the current wave of the sequential design will replace that of old waves and be contained in the returned object, unless
  - the validation type (LOO or OOS depending on whether x_test and y_test are supplied or not) of the current wave of the sequential design is the same as the validation types (shown in the type of the design slot of object) in previous waves, and if the validation type is OOS, x_test and y_test in the current wave must also be identical to those in the previous waves;
  - both the current and previous waves of the sequential design supply customized evaluation functions to eval. Users need to ensure the customized evaluation functions are consistent among different waves. Otherwise, the trace plot of RMSEs produced by draw() will show values of different evaluation metrics in different waves.
  In the above two cases, the information of the current wave of the sequential design will be added to the design slot of the returned object under the name waveS.
- If object is an instance of the gp class and eval = NULL, the matrix in the rmse slot is single-columned. If object is an instance of the dgp or bundle class and eval = NULL, the matrix in the rmse slot can have multiple columns that correspond to different output dimensions or different emulators in the bundle.
- If object is an instance of the gp class and eval = NULL, target needs to be a single value giving the RMSE threshold. If object is an instance of the dgp or bundle class and eval = NULL, target can be a vector of values that gives the RMSE thresholds for different output dimensions or different emulators. If a single value is provided, it will be used as the RMSE threshold for all output dimensions (if object is an instance of the dgp class) or all emulators (if object is an instance of the bundle class). If a customized function is supplied to eval, the user needs to ensure that the length of target is equal to that of the output from eval.
- When defining f, it is important to ensure that:
  - the column order of the first argument of f is consistent with the training input used for the emulator;
  - the column order of the output matrix of f is consistent with the order of emulator output dimensions (if object is an instance of the dgp class), or the order of emulators placed in object (if object is an instance of the bundle class).
  The output matrix produced by f may include NAs. This is especially beneficial as it allows the sequential design process to continue without interruption, even if errors or NA outputs are encountered from f at certain input locations identified by the sequential designs. Users should handle any errors within f by returning NAs appropriately.
- When defining eval, the output metric needs to be positive if draw() is used with log = TRUE. One also needs to ensure that a lower metric value indicates a better emulation performance if target is set.
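One way to follow the advice above on returning NAs from f is to wrap the simulator so that a failure at one design point produces an NA row instead of stopping the run. The following is a minimal base-R sketch; raw_sim and safe_f are hypothetical names, and the failure condition is made up purely for illustration.

```r
# Hypothetical simulator for a single design point. It fails whenever the
# first input exceeds 0.9, standing in for a solver that does not converge.
raw_sim <- function(xi) {
  if (xi[1] > 0.9) stop("solver did not converge")
  sum(xi^2)
}

# Wrapper with the interface expected of f: matrix in, single-column matrix
# out. Errors are caught per design point and recorded as NA.
safe_f <- function(x) {
  y <- vapply(
    seq_len(nrow(x)),
    function(i) tryCatch(raw_sim(x[i, ]), error = function(e) NA_real_),
    numeric(1)
  )
  matrix(y, ncol = 1)
}

# The second design point triggers the error and yields NA.
x <- rbind(c(0.1, 0.2), c(0.95, 0.5))
y <- safe_f(x)
```

A wrapper of this kind could then be supplied as f, leaving it to design() to record the failed locations as described in the Value section.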
Examples
if (FALSE) { # \dontrun{
# load packages and the Python env
library(lhs)
library(dgpsi)
# construct a 2D non-stationary function that takes a matrix as the input
f <- function(x) {
  sin(1 / ((0.7 * x[, 1, drop = FALSE] + 0.3) * (0.7 * x[, 2, drop = FALSE] + 0.3)))
}
# generate the initial design
X <- maximinLHS(5,2)
Y <- f(X)
# generate the validation data
validate_x <- maximinLHS(30,2)
validate_y <- f(validate_x)
# training a 2-layered DGP emulator with the initial design
m <- dgp(X, Y)
# specify the ranges of the input dimensions
lim_1 <- c(0, 1)
lim_2 <- c(0, 1)
lim <- rbind(lim_1, lim_2)
# 1st wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# 2nd wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# 3rd wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# draw the design created by the sequential design
draw(m,'design')
# inspect the trace of RMSEs during the sequential design
draw(m,'rmse')
# reduce the number of imputations for faster OOS
m_faster <- set_imp(m, 5)
# plot the OOS validation with the faster DGP emulator
plot(m_faster, x_test = validate_x, y_test = validate_y)
} # }
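The interface required of a customized method (an emulator object first, a candidate matrix second, and a vector of indices returned for a gp emulator) can be sketched with a deliberately naive selector. The function random_pick is hypothetical and ignores the emulator entirely; it only shows the expected shape of the inputs and output. It also assumes that batch_size reaches a customized method through the name-based forwarding of ..., which the documentation confirms only for the package-provided methods.

```r
# Hypothetical customized method: pick rows of the candidate set uniformly at
# random, without replacement. A real criterion would use `object` to score
# the candidates.
random_pick <- function(object, x_cand, batch_size = 1, ...) {
  sample.int(nrow(x_cand), size = batch_size)
}

# Stand-alone check with a dummy emulator (NULL) and 10 candidate points.
set.seed(1)
x_cand <- matrix(runif(20), ncol = 2)
idx <- random_pick(NULL, x_cand, batch_size = 3)
```

Under these assumptions it might be supplied as design(m, N = 10, limits = lim, f = f, method = random_pick) for a gp emulator m; a dgp or bundle emulator would need a matrix of indices instead.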