This function calculates Leave-One-Out (LOO) cross validation or Out-Of-Sample (OOS) validation statistics for a constructed GP, DGP, or linked (D)GP emulator.
Usage
validate(
object,
x_test,
y_test,
method,
sample_size,
verb,
M,
force,
cores,
...
)
# S3 method for class 'gp'
validate(
object,
x_test = NULL,
y_test = NULL,
method = NULL,
sample_size = 50,
verb = TRUE,
M = 50,
force = FALSE,
cores = 1,
...
)
# S3 method for class 'dgp'
validate(
object,
x_test = NULL,
y_test = NULL,
method = NULL,
sample_size = 50,
verb = TRUE,
M = 50,
force = FALSE,
cores = 1,
...
)
# S3 method for class 'lgp'
validate(
object,
x_test = NULL,
y_test = NULL,
method = NULL,
sample_size = 50,
verb = TRUE,
M = 50,
force = FALSE,
cores = 1,
...
)
Arguments
- object
can be one of the following:
  - the S3 class gp.
  - the S3 class dgp.
  - the S3 class lgp.
- x_test
OOS testing input data:
  - if object is an instance of the gp or dgp class, x_test is a matrix where each row is a new input location to be used for validating the emulator and each column is an input dimension.
  - if object is an instance of the lgp class, x_test can be a matrix or a list:
    - if x_test is a matrix, it is the global testing input data that feed into the emulators in the first layer of a system. The rows of x_test represent different input data points and the columns represent input dimensions across all emulators in the first layer of the system. In this case, it is assumed that the only global input to the system is the input to the emulators in the first layer and there is no global input to emulators in other layers.
    - if x_test is a list, it should have L (the number of layers in an emulator system) elements. The first element is a matrix that represents the global testing input data that feed into the emulators in the first layer of the system. The remaining L-1 elements are L-1 sub-lists, each of which contains a number of matrices (equal to the number of emulators in the corresponding layer, with rows being testing input data points and columns being input dimensions) that represent the global testing input data to the emulators in the corresponding layer. The matrices must be placed in the sub-lists based on how their corresponding emulators are placed in the struc argument of lgp(). If there is no global input data to a certain emulator, set NULL in the corresponding sub-list of x_test.
    This option for linked (D)GP emulators is deprecated and will be removed in the next release.
  - if object is an instance of the lgp class created by lgp() with argument struc in data frame form, x_test must be a matrix representing the global input, where each row corresponds to a test data point and each column represents a global input dimension. The column indices in x_test must align with the indices specified in the From_Output column of the struc data frame (used in lgp()), corresponding to rows where the From_Emulator column is "Global".
x_test must be provided if object is an instance of the lgp class. x_test must also be provided if y_test is provided. Defaults to NULL, in which case LOO validation is performed.
- y_test
the OOS output data corresponding to x_test:
  - if object is an instance of the gp class, y_test is a matrix with only one column where each row represents the output corresponding to the matching row of x_test.
  - if object is an instance of the dgp class, y_test is a matrix where each row represents the output corresponding to the matching row of x_test and with columns representing output dimensions.
  - if object is an instance of the lgp class, y_test can be a single matrix or a list of matrices:
    - if y_test is a single matrix, then there should be only one emulator in the final layer of the linked emulator system and y_test represents the emulator's output with rows being testing positions and columns being output dimensions.
    - if y_test is a list, then y_test should have L matrices, where L is the number of emulators in the final layer of the system. Each matrix has its rows corresponding to testing positions and columns corresponding to output dimensions of the associated emulator in the final layer.
y_test must be provided if object is an instance of the lgp class. y_test must also be provided if x_test is provided. Defaults to NULL, in which case LOO validation is performed.
- method
the prediction approach to use for validation: either the mean-variance approach ("mean_var") or the sampling approach ("sampling"). For details see predict(). For DGP emulators with a categorical likelihood (likelihood = "Categorical" in dgp()), only the sampling approach is supported. By default, the method is set to "sampling" for DGP emulators with Poisson, Negative Binomial, and Categorical likelihoods and "mean_var" otherwise.
- sample_size
the number of samples to draw for each given imputation if method = "sampling". Defaults to 50.
- verb
a bool indicating if trace information for validation should be printed during function execution. Defaults to TRUE.
- M
the size of the conditioning set for the Vecchia approximation in emulator validation. This argument is only used if the emulator object was constructed under the Vecchia approximation. Defaults to 50.
- force
a bool indicating whether to force LOO or OOS re-evaluation when the loo or oos slot already exists in object. When force = FALSE, validate() will only re-evaluate the emulators if x_test and y_test are not identical to the values in the oos slot. If the existing loo or oos validation used a different M in a Vecchia approximation or a different method from the one prescribed in this call, the emulator will be re-evaluated. Set force to TRUE when LOO or OOS re-evaluation is required. Defaults to FALSE.
- cores
the number of processes to be used for validation. If set to NULL, the number of processes is set to max physical cores available %/% 2. Defaults to 1.
- ...
N/A.
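The re-evaluation rule described for force can be read as a small decision function. The base R sketch below is only an illustration of that rule, not the package's internal code: needs_revalidation() is a hypothetical helper, the stored list stands in for an existing oos slot, and the method field on it is an assumption made for this example. It also ignores the detail that M only matters for emulators built under the Vecchia approximation.

```r
# Hypothetical helper: decides whether an OOS validation should be re-run.
# `existing` mimics a stored oos slot; its `method` field is an assumption
# made for this sketch.
needs_revalidation <- function(existing, x_test, y_test, method, M, force = FALSE) {
  if (is.null(existing)) return(TRUE)  # nothing stored yet
  if (force) return(TRUE)              # the caller explicitly asked for a re-run
  # Re-run if the test data differ from what was validated before.
  if (!identical(existing$x_test, x_test)) return(TRUE)
  if (!identical(existing$y_test, y_test)) return(TRUE)
  # Re-run if the prediction method or the conditioning-set size changed.
  if (!identical(existing$method, method)) return(TRUE)
  if (!identical(existing$M, M)) return(TRUE)
  FALSE
}

x <- matrix(c(0.1, 0.5, 0.9), ncol = 1)
y <- matrix(c(1.0, 2.0, 3.0), ncol = 1)
stored <- list(x_test = x, y_test = y, method = "mean_var", M = 50)

same_call   <- needs_revalidation(stored, x, y, "mean_var", 50)
new_method  <- needs_revalidation(stored, x, y, "sampling", 50)
forced_call <- needs_revalidation(stored, x, y, "mean_var", 50, force = TRUE)
```

Here only the first call would reuse the stored results; changing the method or setting force = TRUE triggers a re-run.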
Value
If object is an instance of the gp class, an updated object is returned with an additional slot called loo (for LOO cross validation) or oos (for OOS validation) that contains:
- two slots called x_train (or x_test) and y_train (or y_test) that contain the validation data points for LOO (or OOS).
- a column matrix called mean, if method = "mean_var", or median, if method = "sampling", that contains the predictive means or medians of the GP emulator at validation positions.
- three column matrices called std, lower, and upper that contain the predictive standard deviations and credible intervals of the GP emulator at validation positions. If method = "mean_var", the upper and lower bounds of a credible interval are two standard deviations above and below the predictive mean. If method = "sampling", the upper and lower bounds of a credible interval are the 97.5th and 2.5th percentiles.
- a numeric value called rmse that contains the root mean/median squared error of the GP emulator.
- a numeric value called nrmse that contains the (max-min) normalized root mean/median squared error of the GP emulator. The max-min normalization uses the maximum and minimum values of the validation outputs contained in y_train (or y_test).
- an integer called M that contains the size of the conditioning set used for the Vecchia approximation, if used, for emulator validation.
- an integer called sample_size that contains the number of samples used for validation if method = "sampling".
The rows of matrices (mean, median, std, lower, and upper) correspond to the validation positions.
If object is an instance of the dgp class, an updated object is returned with an additional slot called loo (for LOO cross validation) or oos (for OOS validation) that contains:
- two slots called x_train (or x_test) and y_train (or y_test) that contain the validation data points for LOO (or OOS).
- a matrix called mean, if method = "mean_var", or median, if method = "sampling", that contains the predictive means or medians of the DGP emulator at validation positions.
- three matrices called std, lower, and upper that contain the predictive standard deviations and credible intervals of the DGP emulator at validation positions. If method = "mean_var", the upper and lower bounds of a credible interval are two standard deviations above and below the predictive mean. If method = "sampling", the upper and lower bounds of a credible interval are the 97.5th and 2.5th percentiles.
- a vector called rmse that contains the root mean/median squared errors of the DGP emulator across different output dimensions.
- a vector called nrmse that contains the (max-min) normalized root mean/median squared errors of the DGP emulator across different output dimensions. The max-min normalization uses the maximum and minimum values of the validation outputs contained in y_train (or y_test).
- an integer called M that contains the size of the conditioning set used for the Vecchia approximation, if used, for emulator validation.
- an integer called sample_size that contains the number of samples used for validation if method = "sampling".
The rows and columns of matrices (mean, median, std, lower, and upper) correspond to the validation positions and DGP emulator output dimensions, respectively.
If object is an instance of the dgp class with a categorical likelihood, an updated object is returned with an additional slot called loo (for LOO cross validation) or oos (for OOS validation) that contains:
- two slots called x_train (or x_test) and y_train (or y_test) that contain the validation data points for LOO (or OOS).
- a matrix called label that contains predictive samples of labels from the DGP emulator at validation positions. The matrix has its rows corresponding to validation positions and columns corresponding to samples of labels.
- a list called probability that contains predictive samples of probabilities for each class from the DGP emulator at validation positions. Each element in the list is a matrix that has its rows corresponding to validation positions and columns corresponding to samples of probabilities.
- a scalar called log_loss that represents the average log loss of the predicted labels in the DGP emulator across all validation positions. Log loss measures the accuracy of probabilistic predictions, with lower values indicating better classification performance. log_loss ranges from 0 to positive infinity, where a value closer to 0 suggests more confident and accurate predictions.
- an integer called M that contains the size of the conditioning set used for the Vecchia approximation, if used, in emulator validation.
- an integer called sample_size that contains the number of samples used for validation.
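To make the log_loss entry concrete, the base R sketch below computes an average log loss from a probability list shaped as described above (one matrix per class, rows for validation positions, columns for samples). It assumes the sampled probabilities are averaged per class before the loss is taken; that averaging step, the helper name average_log_loss(), and the toy numbers are assumptions for illustration, and the package may compute the quantity differently.

```r
# Hypothetical helper: average log loss from per-class probability samples.
# probability: list of K matrices (n validation positions x S samples).
# truth: true class index (1..K) at each validation position.
average_log_loss <- function(probability, truth, eps = 1e-15) {
  # Average over samples to get an n x K matrix of class probabilities.
  prob <- do.call(cbind, lapply(probability, rowMeans))
  # Pick the probability assigned to the true class at each position.
  p_true <- prob[cbind(seq_along(truth), truth)]
  # Clip away from 0 and 1 so that log() stays finite.
  p_true <- pmin(pmax(p_true, eps), 1 - eps)
  -mean(log(p_true))
}

# Two validation positions, two classes, two samples per position.
probability <- list(
  matrix(c(0.9, 0.2, 0.7, 0.4), nrow = 2),  # class 1
  matrix(c(0.1, 0.8, 0.3, 0.6), nrow = 2)   # class 2
)
truth <- c(1, 2)

ll <- average_log_loss(probability, truth)
```

A perfect classifier that assigns probability 1 to every true class would give a value near 0, while confident wrong predictions push the value up without bound.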
If object is an instance of the lgp class, an updated object is returned with an additional slot called oos (for OOS validation) that contains:
- two slots called x_test and y_test that contain the validation data points for OOS.
- a list called mean, if method = "mean_var", or median, if method = "sampling", that contains the predictive means or medians of the linked (D)GP emulator at validation positions.
- three lists called std, lower, and upper that contain the predictive standard deviations and credible intervals of the linked (D)GP emulator at validation positions. If method = "mean_var", the upper and lower bounds of a credible interval are two standard deviations above and below the predictive mean. If method = "sampling", the upper and lower bounds of a credible interval are the 97.5th and 2.5th percentiles.
- a list called rmse that contains the root mean/median squared errors of the linked (D)GP emulator.
- a list called nrmse that contains the (max-min) normalized root mean/median squared errors of the linked (D)GP emulator. The max-min normalization uses the maximum and minimum values of the validation outputs contained in y_test.
- an integer called M that contains the size of the conditioning set used for the Vecchia approximation, if used, in emulator validation.
- an integer called sample_size that contains the number of samples used for validation if method = "sampling".
Each element in mean, median, std, lower, upper, rmse, and nrmse corresponds to a (D)GP emulator in the final layer of the linked (D)GP emulator.
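The summary statistics listed above follow standard definitions, which the base R sketch below reproduces on made-up numbers. It is an illustration of the formulas as described in this section (two-standard-deviation bounds, percentile bounds, and max-min normalization), not an extract of the package code; all variable names and values are hypothetical.

```r
# Toy validation outputs and predictive summaries (made-up numbers).
y_test    <- matrix(c(1.0, 2.0, 4.0, 5.0), ncol = 1)
pred_mean <- matrix(c(1.5, 2.0, 3.5, 5.0), ncol = 1)
pred_std  <- matrix(c(0.4, 0.3, 0.5, 0.2), ncol = 1)

# Mean-variance approach: bounds are two standard deviations around the mean.
lower <- pred_mean - 2 * pred_std
upper <- pred_mean + 2 * pred_std

# Root mean squared error, one value per output dimension (column).
rmse <- sqrt(colMeans((pred_mean - y_test)^2))
# Max-min normalization divides by the range of the validation outputs.
nrmse <- rmse / (apply(y_test, 2, max) - apply(y_test, 2, min))

# Sampling approach: summarize predictive samples at each validation position.
set.seed(1)
samples <- matrix(
  rnorm(4 * 200, mean = rep(pred_mean, 200), sd = rep(pred_std, 200)),
  nrow = 4  # rows are validation positions, columns are samples
)
pred_median <- apply(samples, 1, median)
# 2.5th and 97.5th percentiles: a 2 x 4 matrix (lower row, upper row).
bounds <- apply(samples, 1, quantile, probs = c(0.025, 0.975))
# With sampling, the error is computed from the medians instead of the means.
rmse_median <- sqrt(mean((pred_median - y_test[, 1])^2))
```

For a linked emulator, the same calculation would be repeated for each emulator in the final layer, giving one list element per emulator.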
Details
See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/dev/.
Note
When both x_test and y_test are NULL, LOO cross validation will be implemented. Otherwise, OOS validation will be implemented. LOO validation is only applicable to a GP or DGP emulator (i.e., object is an instance of the gp or dgp class). If a linked (D)GP emulator (i.e., object is an instance of the lgp class) is provided, x_test and y_test must also be provided for OOS validation.
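The choice between LOO and OOS validation described in this note can be summarized as a small dispatch rule. The base R sketch below encodes it with a hypothetical helper, validation_mode(); the function name and error messages are invented for illustration and do not come from the package.

```r
# Hypothetical helper mirroring the rule in the note above.
# object_class is one of "gp", "dgp", or "lgp".
validation_mode <- function(object_class, x_test = NULL, y_test = NULL) {
  if (is.null(x_test) && is.null(y_test)) {
    # No test data: LOO, which is unavailable for linked emulators.
    if (object_class == "lgp") {
      stop("x_test and y_test must be provided for a linked (D)GP emulator")
    }
    return("loo")
  }
  # Test inputs and outputs must be supplied together.
  if (is.null(x_test) || is.null(y_test)) {
    stop("x_test and y_test must be provided together")
  }
  "oos"
}

x <- matrix(c(0.2, 0.8), ncol = 1)
y <- matrix(c(1.0, 3.0), ncol = 1)

mode_gp_loo  <- validation_mode("gp")
mode_dgp_oos <- validation_mode("dgp", x, y)
mode_lgp_oos <- validation_mode("lgp", x, y)
```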