This function calculates Leave-One-Out (LOO) cross validation or Out-Of-Sample (OOS) validation statistics for a constructed GP, DGP, or linked (D)GP emulator.
Usage
validate(
  object,
  x_test,
  y_test,
  method,
  sample_size,
  verb,
  M,
  force,
  cores,
  ...
)
# S3 method for class 'gp'
validate(
  object,
  x_test = NULL,
  y_test = NULL,
  method = "mean_var",
  sample_size = 50,
  verb = TRUE,
  M = 50,
  force = FALSE,
  cores = 1,
  ...
)
# S3 method for class 'dgp'
validate(
  object,
  x_test = NULL,
  y_test = NULL,
  method = "mean_var",
  sample_size = 50,
  verb = TRUE,
  M = 50,
  force = FALSE,
  cores = 1,
  ...
)
# S3 method for class 'lgp'
validate(
  object,
  x_test = NULL,
  y_test = NULL,
  method = "mean_var",
  sample_size = 50,
  verb = TRUE,
  M = 50,
  force = FALSE,
  cores = 1,
  ...
)
Arguments
- object
- can be one of the following:
  - the S3 class `gp`.
  - the S3 class `dgp`.
  - the S3 class `lgp`.
 
- x_test
- OOS testing input data:
  - if `object` is an instance of the `gp` or `dgp` class, `x_test` is a matrix where each row is a new input location to be used for validating the emulator and each column is an input dimension.
  - if `object` is an instance of the `lgp` class, `x_test` must be a matrix representing the global input, where each row corresponds to a test data point and each column represents a global input dimension. The column indices in `x_test` must align with the indices specified in the `From_Output` column of the `struc` data frame (used in `lgp()`), corresponding to rows where the `From_Emulator` column is `"Global"`.
  `x_test` must be provided if `object` is an instance of the `lgp` class. `x_test` must also be provided if `y_test` is provided. Defaults to `NULL`, in which case LOO validation is performed.
- y_test
- the OOS output data corresponding to `x_test`:
  - if `object` is an instance of the `gp` class, `y_test` is a matrix with only one column where each row represents the output corresponding to the matching row of `x_test`.
  - if `object` is an instance of the `dgp` class, `y_test` is a matrix where each row represents the output corresponding to the matching row of `x_test` and each column represents an output dimension.
  - if `object` is an instance of the `lgp` class, `y_test` can be a single matrix or a list of matrices:
    - if `y_test` is a single matrix, then there should be only one emulator in the final layer of the linked emulator system, and `y_test` represents that emulator's output with rows being testing positions and columns being output dimensions.
    - if `y_test` is a list, then it should have L matrices, where L is the number of emulators in the final layer of the system. Each matrix has its rows corresponding to testing positions and columns corresponding to output dimensions of the associated emulator in the final layer.
  `y_test` must be provided if `object` is an instance of the `lgp` class. `y_test` must also be provided if `x_test` is provided. Defaults to `NULL`, in which case LOO validation is performed.
- method
- the prediction approach to use for validation: either the mean-variance approach (`"mean_var"`) or the sampling approach (`"sampling"`). For details see `predict()`. Defaults to `"mean_var"`.
- sample_size
- the number of samples to draw for each given imputation if `method = "sampling"`. Defaults to `50`.
- verb
- a bool indicating if trace information for validation should be printed during function execution. Defaults to `TRUE`.
- M
- the size of the conditioning set for the Vecchia approximation in emulator validation. This argument is only used if the emulator `object` was constructed under the Vecchia approximation. Defaults to `50`.
- force
- a bool indicating whether to force LOO or OOS re-evaluation when the `loo` or `oos` slot already exists in `object`. When `force = FALSE`, `validate()` re-evaluates the emulator only if `x_test` and `y_test` are not identical to the values in the `oos` slot, or if the existing `loo` or `oos` validation used a different `M` in a Vecchia approximation or a different `method` from the one specified in this call. Set `force` to `TRUE` when LOO or OOS re-evaluation is required. Defaults to `FALSE`.
- cores
- the number of processes to be used for validation. If set to `NULL`, the number of processes is set to `max physical cores available %/% 2`. Defaults to `1`.
- ...
- N/A. 
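As a rough illustration of the argument shapes described above, the sketch below prepares OOS test data for a one-dimensional GP emulator and passes it to `validate()`. The toy function `f`, the design sizes, and all variable names are assumptions made for this example. The `dgpsi` calls are guarded so that the data-preparation part still runs where the package (or its Python backend) is not available, and the `half_cores` computation is only an approximation of the documented `cores = NULL` behaviour, not the package's own code.

```r
# Toy simulator (hypothetical, for illustration only)
f <- function(x) (sin(7.5 * x) + 1) / 2

# Training design: 10 points, one input dimension
X <- matrix(seq(0, 1, length.out = 10), ncol = 1)
Y <- matrix(f(X[, 1]), ncol = 1)

# OOS test data: rows are test positions, columns are dimensions.
# For a gp object, y_test has exactly one column.
x_test <- matrix(seq(0.05, 0.95, length.out = 8), ncol = 1)
y_test <- matrix(f(x_test[, 1]), ncol = 1)

# Approximation of what cores = NULL is documented to do
# (half the physical cores), floored at 1 as an assumption.
n_phys <- parallel::detectCores(logical = FALSE)
half_cores <- if (is.na(n_phys)) 1L else max(1L, n_phys %/% 2L)

if (requireNamespace("dgpsi", quietly = TRUE)) {
  m <- dgpsi::gp(X, Y)

  # OOS validation with the sampling approach
  m <- dgpsi::validate(m, x_test = x_test, y_test = y_test,
                       method = "sampling", sample_size = 50,
                       verb = FALSE)
  print(m$oos$rmse)
}
```

Omitting both `x_test` and `y_test` in the same call would request LOO validation instead, with results stored in the `loo` slot.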
Value
- If `object` is an instance of the `gp` class, an updated `object` is returned with an additional slot called `loo` (for LOO cross validation) or `oos` (for OOS validation) that contains:
  - two slots called `x_train` (or `x_test`) and `y_train` (or `y_test`) that contain the validation data points for LOO (or OOS).
  - a column matrix called `mean`, if `method = "mean_var"`, or `median`, if `method = "sampling"`, that contains the predictive means or medians of the GP emulator at validation positions.
  - three column matrices called `std`, `lower`, and `upper` that contain the predictive standard deviations and credible intervals of the GP emulator at validation positions. If `method = "mean_var"`, the upper and lower bounds of a credible interval are two standard deviations above and below the predictive mean. If `method = "sampling"`, the lower and upper bounds of a credible interval are the 2.5th and 97.5th percentiles.
  - a numeric value called `rmse` that contains the root mean/median squared error of the GP emulator.
  - a numeric value called `nrmse` that contains the (max-min) normalized root mean/median squared error of the GP emulator. The max-min normalization uses the maximum and minimum values of the validation outputs contained in `y_train` (or `y_test`).
  - an integer called `M` that contains the size of the conditioning set used for the Vecchia approximation, if used, for emulator validation.
  - an integer called `sample_size` that contains the number of samples used for validation if `method = "sampling"`.
  The rows of matrices (`mean`, `median`, `std`, `lower`, and `upper`) correspond to the validation positions.
- If `object` is an instance of the `dgp` class, an updated `object` is returned with an additional slot called `loo` (for LOO cross validation) or `oos` (for OOS validation) that contains:
  - two slots called `x_train` (or `x_test`) and `y_train` (or `y_test`) that contain the validation data points for LOO (or OOS).
  - a matrix called `mean`, if `method = "mean_var"`, or `median`, if `method = "sampling"`, that contains the predictive means or medians of the DGP emulator at validation positions.
  - three matrices called `std`, `lower`, and `upper` that contain the predictive standard deviations and credible intervals of the DGP emulator at validation positions. If `method = "mean_var"`, the upper and lower bounds of a credible interval are two standard deviations above and below the predictive mean. If `method = "sampling"`, the lower and upper bounds of a credible interval are the 2.5th and 97.5th percentiles.
  - a vector called `rmse` that contains the root mean/median squared errors of the DGP emulator across different output dimensions.
  - a vector called `nrmse` that contains the (max-min) normalized root mean/median squared errors of the DGP emulator across different output dimensions. The max-min normalization uses the maximum and minimum values of the validation outputs contained in `y_train` (or `y_test`).
  - an integer called `M` that contains the size of the conditioning set used for the Vecchia approximation, if used, for emulator validation.
  - an integer called `sample_size` that contains the number of samples used for validation if `method = "sampling"`.
  The rows and columns of matrices (`mean`, `median`, `std`, `lower`, and `upper`) correspond to the validation positions and DGP emulator output dimensions, respectively.
- If `object` is an instance of the `dgp` class with a categorical likelihood, an updated `object` is returned with an additional slot called `loo` (for LOO cross validation) or `oos` (for OOS validation) that contains:
  - two slots called `x_train` (or `x_test`) and `y_train` (or `y_test`) that contain the validation data points for LOO (or OOS).
  - a vector called `label` that contains predictive labels from the DGP emulator at validation positions.
  - a matrix called `probability` that contains mean predictive probabilities for each class from the DGP emulator at validation positions. The matrix has its rows corresponding to validation positions and columns corresponding to different classes.
  - a scalar called `log_loss` that represents the log loss of the trained DGP classifier. Log loss measures the accuracy of probabilistic predictions, with lower values indicating better classification performance. `log_loss` ranges from `0` to positive infinity, where a value closer to `0` suggests more confident and accurate predictions.
  - a scalar called `accuracy` that represents the accuracy of the trained DGP classifier. Accuracy measures the proportion of correctly classified instances among all predictions, with higher values indicating better classification performance. `accuracy` ranges from `0` to `1`, where a value closer to `1` suggests more reliable and precise predictions.
  - a slot named `method` indicating whether the matrix in the `probability` slot was obtained using the `"mean_var"` method or the `"sampling"` method.
  - an integer called `M` that contains the size of the conditioning set used for the Vecchia approximation, if used, in emulator validation.
  - an integer called `sample_size` that contains the number of samples used for validation.
 
- If `object` is an instance of the `lgp` class, an updated `object` is returned with an additional slot called `oos` (for OOS validation) that contains:
  - two slots called `x_test` and `y_test` that contain the validation data points for OOS.
  - a list called `mean`, if `method = "mean_var"`, or `median`, if `method = "sampling"`, that contains the predictive means or medians of the linked (D)GP emulator at validation positions.
  - three lists called `std`, `lower`, and `upper` that contain the predictive standard deviations and credible intervals of the linked (D)GP emulator at validation positions. If `method = "mean_var"`, the upper and lower bounds of a credible interval are two standard deviations above and below the predictive mean. If `method = "sampling"`, the lower and upper bounds of a credible interval are the 2.5th and 97.5th percentiles.
  - a list called `rmse` that contains the root mean/median squared errors of the linked (D)GP emulator.
  - a list called `nrmse` that contains the (max-min) normalized root mean/median squared errors of the linked (D)GP emulator. The max-min normalization uses the maximum and minimum values of the validation outputs contained in `y_test`.
  - an integer called `M` that contains the size of the conditioning set used for the Vecchia approximation, if used, in emulator validation.
  - an integer called `sample_size` that contains the number of samples used for validation if `method = "sampling"`.
  Each element in `mean`, `median`, `std`, `lower`, `upper`, `rmse`, and `nrmse` corresponds to a (D)GP emulator in the final layer of the linked (D)GP emulator.
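To make the returned statistics more concrete, the following base-R sketch computes them by hand on made-up numbers. It is not the package's implementation: the data, the variable names, and the use of the standard mean log loss formula (without any probability clipping) are assumptions for illustration. With `method = "sampling"`, the same RMSE formula would be applied to the predictive medians rather than the means.

```r
# Hypothetical validation outputs and predictive summaries (5 positions)
y_true    <- c(0.20, 0.50, 0.90, 0.40, 0.70)
pred_mean <- c(0.25, 0.45, 0.85, 0.50, 0.70)
pred_std  <- c(0.05, 0.04, 0.06, 0.08, 0.03)

# method = "mean_var": bounds are two standard deviations around the mean
lower_mv <- pred_mean - 2 * pred_std
upper_mv <- pred_mean + 2 * pred_std

# method = "sampling": bounds are the 2.5th and 97.5th percentiles of draws
set.seed(1)
n_draws <- 200
draws <- matrix(
  rnorm(length(pred_mean) * n_draws, mean = pred_mean, sd = pred_std),
  nrow = length(pred_mean)  # rows = validation positions, columns = draws
)
pred_median <- apply(draws, 1, median)
lower_s <- apply(draws, 1, quantile, probs = 0.025)
upper_s <- apply(draws, 1, quantile, probs = 0.975)

# RMSE and (max-min) normalized RMSE against the validation outputs
rmse  <- sqrt(mean((y_true - pred_mean)^2))
nrmse <- rmse / (max(y_true) - min(y_true))

# Classification case: mean predictive probabilities for 4 positions, 3 classes
prob <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.8, 0.1,
                 0.3, 0.3, 0.4,
                 0.5, 0.4, 0.1),
               ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("a", "b", "c")))
true_label <- c("a", "b", "c", "b")

# Predicted label = class with the highest probability in each row
pred_label <- colnames(prob)[max.col(prob, ties.method = "first")]

# Probability assigned to the true class at each position
p_true <- prob[cbind(seq_along(true_label), match(true_label, colnames(prob)))]

log_loss <- -mean(log(p_true))              # lower is better, minimum 0
accuracy <- mean(pred_label == true_label)  # proportion correct, in [0, 1]
```

For a `dgp` object with several output dimensions, the RMSE and NRMSE computations would be repeated per column, which is why those slots are vectors rather than single numbers.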
Details
See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.
Note
- When both `x_test` and `y_test` are `NULL`, LOO cross validation will be implemented. Otherwise, OOS validation will be implemented. LOO validation is only applicable to a GP or DGP emulator (i.e., `object` is an instance of the `gp` or `dgp` class). If a linked (D)GP emulator (i.e., `object` is an instance of the `lgp` class) is provided, `x_test` and `y_test` must also be provided for OOS validation.
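The selection rule in this note can be summarised as a small decision function. The helper below, `validation_mode()`, is hypothetical and not part of the package; it only mirrors the documented rule (both test sets absent means LOO, both present means OOS, and linked emulators require OOS), and its error messages are invented for the example.

```r
# Hypothetical helper mirroring the documented LOO/OOS selection rule
validation_mode <- function(object_class, x_test = NULL, y_test = NULL) {
  has_x <- !is.null(x_test)
  has_y <- !is.null(y_test)

  # x_test and y_test must be supplied together
  if (xor(has_x, has_y)) {
    stop("x_test and y_test must be provided together")
  }

  if (!has_x) {
    # LOO is only applicable to gp and dgp objects
    if (identical(object_class, "lgp")) {
      stop("an lgp object requires x_test and y_test for OOS validation")
    }
    return("loo")
  }

  "oos"
}

validation_mode("gp")                            # LOO: no test data given
validation_mode("dgp", matrix(0.5), matrix(1))   # OOS: both test sets given
```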
