Locate the next design point for a (D)GP emulator or a bundle of (D)GP emulators using ALM
Source: R/alm.R
alm.Rd
This function searches a candidate set to locate the next design point(s) to be added to a (D)GP emulator or a bundle of (D)GP emulators using the Active Learning MacKay (ALM) criterion; see the reference below.
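As a rough illustration of the idea behind ALM (MacKay, 1992), the criterion favours the candidate at which the emulator's predictive variance is largest. The sketch below uses made-up variances rather than a fitted emulator. The names pred_var, next_idx, and x_next are hypothetical, and the code is not the package's internal implementation.

```r
# Candidate set: five 1D input locations (one row per candidate).
x_cand <- matrix(c(0.1, 0.3, 0.5, 0.7, 0.9), ncol = 1)

# Hypothetical predictive variances at the candidates (made-up numbers
# standing in for what a fitted emulator would report).
pred_var <- c(0.02, 0.15, 0.40, 0.08, 0.11)

# ALM idea: pick the candidate with the largest predictive variance.
next_idx <- which.max(pred_var)

# Extract the chosen row, keeping the matrix shape.
x_next <- x_cand[next_idx, , drop = FALSE]
```

Here next_idx plays the role of a row number in x_cand, which is the kind of value the function returns.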
Usage
alm(object, x_cand, ...)
# S3 method for gp
alm(object, x_cand, batch_size = 1, M = 50, workers = 1, ...)
# S3 method for dgp
alm(object, x_cand, batch_size = 1, M = 50, workers = 1, aggregate = NULL, ...)
# S3 method for bundle
alm(object, x_cand, batch_size = 1, M = 50, workers = 1, aggregate = NULL, ...)
Arguments
- object
can be one of the following:
the S3 class gp.
the S3 class dgp.
the S3 class bundle.
- x_cand
a matrix (with each row being a design point and each column being an input dimension) that gives a candidate set from which the next design point(s) are determined. If object is an instance of the bundle class, x_cand could also be a list with length equal to the number of emulators contained in object. Each slot in x_cand is a matrix that gives a candidate set for the corresponding emulator in the bundle. See the Note section below for further information.
- ...
any arguments (with names different from those of the arguments used in alm()) that are used by aggregate can be passed here.
- batch_size
an integer that gives the number of design points to be chosen. Defaults to 1.
- M
the size of the conditioning set for the Vecchia approximation in the criterion calculation. This argument is only used if the emulator object was constructed under the Vecchia approximation. Defaults to 50.
- workers
the number of processes to be used for the criterion calculation. If set to NULL, the number of processes is set to max physical cores available %/% 2. Defaults to 1.
- aggregate
an R function that aggregates scores of the ALM across different output dimensions (if object is an instance of the dgp class) or across different emulators (if object is an instance of the bundle class). The function should be specified in the following basic form:
the first argument is a matrix representing scores. The rows of the matrix correspond to different design points. The number of columns of the matrix equals:
the emulator output dimension if object is an instance of the dgp class; or
the number of emulators contained in object if object is an instance of the bundle class.
the output should be a vector that gives aggregations of scores at different design points.
Set to NULL to disable the aggregation. Defaults to NULL.
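To make the expected form of aggregate concrete, here is a minimal sketch of two functions that take a score matrix (rows are candidate design points, columns are outputs or emulators) and return one aggregated score per row. The names agg_mean, agg_weighted, and weights, and the numbers in scores, are hypothetical and chosen only for illustration.

```r
# Hypothetical score matrix: 4 candidate points (rows) and
# 2 output dimensions or emulators (columns). matrix() fills column-wise.
scores <- matrix(c(0.10, 0.40, 0.25, 0.05,
                   0.30, 0.20, 0.35, 0.15), ncol = 2)

# Aggregate by averaging the scores across columns.
agg_mean <- function(scores) rowMeans(scores)

# Aggregate by a weighted sum across columns; `weights` is a hypothetical
# extra argument with one weight per column.
agg_weighted <- function(scores, weights) as.vector(scores %*% weights)

mean_scores <- agg_mean(scores)
weighted_scores <- agg_weighted(scores, weights = c(0.8, 0.2))
```

Both functions return a vector with one entry per candidate point. Since arguments used by aggregate can be passed through ..., a call along the lines of alm(object, x_cand, aggregate = agg_weighted, weights = c(0.8, 0.2)) would presumably forward weights to the aggregation function.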
Value
If object is an instance of the gp class, a vector is returned with length equal to batch_size, giving the positions (i.e., row numbers) of the next design points from x_cand.
If object is an instance of the dgp class, a matrix is returned with the number of rows equal to batch_size and the number of columns equal to one (if aggregate is not NULL) or the output dimension (if aggregate is NULL), giving the positions (i.e., row numbers) of the next design points from x_cand to be added to the DGP emulator across different outputs. If object is a DGP emulator with either a Hetero or NegBin likelihood layer, the returned matrix has two columns, with the first column giving the positions of the next design points from x_cand that correspond to the mean parameter of the normal or negative Binomial distribution, and the second column giving the positions of the next design points from x_cand that correspond to the variance parameter of the normal distribution or the dispersion parameter of the negative Binomial distribution.
If object is an instance of the bundle class, a matrix is returned with the number of rows equal to batch_size and the number of columns equal to the number of emulators in the bundle, giving the positions (i.e., row numbers) of the next design points from x_cand to be added to the individual emulators.
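Because the returned values are row numbers rather than input locations, they need to be mapped back onto x_cand. The sketch below shows one way to do this for a vector-shaped and a matrix-shaped return. The index values idx_vec and idx_mat are hypothetical stand-ins for what the function would return, not actual output.

```r
# Candidate set with 10 rows and a single input dimension.
x_cand <- matrix(seq(0, 0.9, length.out = 10), ncol = 1)

# Vector-shaped return (gp case), assuming batch_size = 2.
idx_vec <- c(3, 8)
X_new <- x_cand[idx_vec, , drop = FALSE]

# Matrix-shaped return (dgp or bundle case), assuming batch_size = 2 and
# two outputs or emulators: one column of row numbers per output/emulator.
idx_mat <- matrix(c(3, 8,
                    5, 1), nrow = 2)

# One matrix of new design points per column of idx_mat.
X_new_list <- lapply(seq_len(ncol(idx_mat)),
                     function(k) x_cand[idx_mat[, k], , drop = FALSE])
```

Using drop = FALSE keeps each result a matrix even when only a single row or a single input dimension is selected.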
Details
See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.
Note
The column order of the first argument of aggregate must be consistent with the order of emulator output dimensions (if object is an instance of the dgp class), or the order of emulators placed in object (if object is an instance of the bundle class).
If x_cand is supplied as a list when object is an instance of the bundle class and an aggregate function is provided, the matrices in x_cand must have common rows (i.e., the candidate sets of emulators in the bundle have common input locations) so that the aggregate function can be applied.
Any R vector detected in x_cand will be treated as a column vector and automatically converted into a single-column R matrix.
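The following sketch illustrates two of the points above under stated assumptions: the explicit equivalent of turning a vector into a single-column matrix, and one conservative way to satisfy the common-rows requirement, namely giving every emulator in a bundle the same candidate matrix. The bundle size n_emulators and the check same_rows are hypothetical and not part of the package.

```r
# A plain vector of candidate locations and its single-column matrix form.
x_vec <- seq(0, 1, length.out = 5)
x_mat <- matrix(x_vec, ncol = 1)

# Assumed bundle of three emulators: reuse the same candidate matrix for
# each, so that row i refers to the same input location in every slot.
n_emulators <- 3
x_cand_list <- rep(list(x_mat), n_emulators)

# Simple check that all slots hold identical candidate sets.
same_rows <- all(vapply(x_cand_list[-1],
                        function(m) identical(m, x_cand_list[[1]]),
                        logical(1)))
```

Sharing one matrix across all slots is stricter than the Note may require, but it guarantees that scores in a given row refer to the same input location when they are aggregated.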
References
MacKay, D. J. (1992). Information-based objective functions for active data selection. Neural Computation, 4(4), 590-604.
Examples
if (FALSE) {
  # load packages and the Python env
  library(lhs)
  library(dgpsi)
  # construct a 1D non-stationary function
  f <- function(x) {
    sin(30*((2*x-1)/2-0.4)^5)*cos(20*((2*x-1)/2-0.4))
  }
  # generate the initial design
  X <- maximinLHS(10,1)
  Y <- f(X)
  # train a 2-layered DGP emulator with the global connection off
  m <- dgp(X, Y, connect = FALSE)
  # generate a candidate set
  x_cand <- maximinLHS(200,1)
  # locate the next design point using ALM
  next_point <- alm(m, x_cand = x_cand)
  X_new <- x_cand[next_point, , drop = FALSE]
  # obtain the corresponding output at the located design point
  Y_new <- f(X_new)
  # combine the new input-output pair with the existing data
  X <- rbind(X, X_new)
  Y <- rbind(Y, Y_new)
  # update the DGP emulator with the new input and output data and refit
  m <- update(m, X, Y, refit = TRUE)
  # plot the LOO validation
  plot(m)
}