This vignette gives a demonstration of the package on DGP emulation of large-scale datasets.
Construct a synthetic simulator
We consider a 2-dimensional synthetic simulator with the following
functional form defined over [0, 1]
and
[0, 1]
:
f <- function(x) {
x1 <- x[,1,drop=F] * 4 -2
x2 <- x[,2,drop=F] * 4 -2
y <- 0.5 + ((sin(x1^2-x2^2))^2 - 0.5)/(1 + 0.001*(x1^2+x2^2))^2
return(y)
}
We now specify a seed with set_seed()
from the package
for reproducibility
set_seed(9999)
and generate 3000
training data points:
X <- randomLHS(3000, 2)
Y <- f(X)
Training
We now build and train a large-scale DGP emulator using a Vecchia implementation under our Stochastic Imputation (SI) framework:
m <- dgp(X, Y, vecchia = TRUE)
## Auto-generating a 2-layered DGP structure ... done
## Initializing the DGP emulator ... done
## Training the DGP emulator:
## Iteration 200: Layer 2: 100%|██████████| 200/200 [02:20<00:00, 1.43it/s]
## Imputing ... done
Validation
After we have the emulator, we can validate it by the Leave-One-Out (LOO) cross validation plot:
plot(m)
## Validating and computing ... done
## Post-processing LOO results ... done
## Plotting ... done
or the Out-Of-Sample (OOS) validation plot over 1000
randomly generated testing locations:
## Validating and computing ... done
## Post-processing OOS results ... done
## Plotting ... done
Note that gp()
can also operate in Vecchia mode. For
this problem, using m_gp <- gp(X, Y, vecchia = TRUE)
, we
found that the GP emulator m_gp
achieved half the NRMSE of
the DGP emulator m
, but it had a few more points outside
the credible intervals.
Performance tip
The Vecchia implementation in the package leverages the multiple
cores available on the machine for multi-thread computation. To optimize
performance, we recommend experimenting with the number of threads using
set_thread_num()
. The current number of threads being used
by the package can be checked with get_thread_num()
.