dgpsi: An R Package for Emulation

## _dgpsi_: An R Package for Emulation

### Deyu Ming

pre {
  min-height: 25px;
  max-height: 320px;
  padding-left: 20px;
  overflow-y: auto;
}

pre[class] {
  max-height: 200px;
}

.right-column{
  padding-top: 0;
}

iframe {
  border: none;
}

.center2 {
  margin: 0;
  position: absolute;
  top: 25%;
  -ms-transform: translate(-50%, -50%);
}

.remark-code, .remark-inline-code {
  background: #f0f0f0;
}
.remark-code {
  font-size: 14px;
}
.huge .remark-code { /*Change made here*/
  font-size: 200% !important;
}
.tiny .remark-code { /*Change made here*/
  font-size: 50% !important;
}

.pull-left-narrow {
  padding-top: 0;
  float: left;
  width: 40%;
}

.pull-right-wide {
  padding-top: 0;
  float: right;
  width: 55%;
}

img {vertical-align: top;}

</style>

---

## Agenda
.pull-left[
**Basics**
  - An overview of the package
  
**Installation**
  - How to install the package 
  - How to activate the package
  
**Function Overview**
  - An overview of functions available
]

**Try `dgpsi`**
  - A quick demo of the package

**A Synthetic Case Study**
  - Model network emulation
  
 
  
  
  
**Additional Resources**
  - Where to get help 
  - How to contribute
]

---

# Basics

---

### An overview of the package

`dgpsi` is an R package for deep and linked Gaussian process emulations. It currently implements:

* Gaussian process emulations.

* Deep Gaussian process emulations with: 
    - multiple layers;
    - multiple GP nodes;
    - non-Gaussian likelihoods (Poisson, Negative-Binomial, and heteroskedastic Gaussian).
    
* Linked emulations of model networks by linking (D)GP emulators.

* Leave-One-Out (LOO) and Out-Of-Sample (OOS) validations for GP, DGP, and linked (D)GP emulators.

* Sequential designs for (D)GP emulators and bundles of (D)GP emulators.

---

# Installation

---

### How to install the package

First of all, make sure you have the latest versions of R and RStuido installed on your machine. Then from RStudio you can install either the released version of the package from CRAN:

```r
install.packages('dgpsi')
```

or the development version of the package from GitHub:

```r
devtools::install_github('mingdeyu/dgpsi-R')
```

After the installation, run

```r
library(dgpsi)
init_py()
```

to install the required Python environment.

The installation of the underlying Python environment only needs to be done once after you install or update the R package.

---

### How to activate the package

You MUST run `init_py()` every time after `library(dgpsi)`, telling R to invoke the required Python environment:

```r
library(dgpsi)
init_py()
```

If you installed the development version and wanted to update the underlying Python implementation, you can reinstall the development Python version by:

```r
init_py(reinstall = T)
```

If you experience issues while running `init_py()`, you can try to reinstall the Python environment:

```r
init_py(reinstall = T)
```

or uninstall completely the Python environment:

```r
init_py(uninstall = T)
```

and reinstall everything.

---

# Function Overview

---

### An overview of functions available

Functions to train, validate, and make predictions from GP, DGP, and Linked (D)GP emulators.

* .black[`gp()`]: Gaussian process emulator construction

* .black[`dgp()`]: Deep Gaussian process emulator construction

* .black[`lgp()`]: Linked (D)GP emulator construction

* `continue()`: Continue the training of a DGP emulator

* .black[`combine()`]: Combine layers

* `predict()`: Predictions from GP, DGP, or linked (D)GP emulators

* .black[`validate()`]: Validate a constructed GP, DGP, or linked (D)GP emulator

* .black[`plot()`]: Validation plots of a constructed GP, DGP, or linked (D)GP emulator

]

Functions to implement sequential designs for (D)GP and bundles of (D)GP emulators.

* .black[`design()`]: Sequential design of a (D)GP emulator or a bundle of (D)GP emulators

* `alm()`: Locate the next design point for a (D)GP emulator or a bundle of (D)GP emulators using ALM

* `mice()`: Locate the next design point for a (D)GP emulator or a bundle of (D)GP emulators using MICE

* `pei()`: Locate the next design point for a (D)GP emulator or a bundle of (D)GP emulators using PEI

* `update()`: Update a GP or DGP emulator

* .black[`draw()`]: Validation plots of a sequential design

* `pack()`: Pack GP and DGP emulators into a bundle

* `unpack()`: Unpack a bundle of (D)GP emulators

]

Functions to customize structures of GP and DGP emulators.

* `kernel()`: Initialize a Gaussian process node

* `Hetero()`: Initialize a heteroskedastic Gaussian likelihood node

* `Poisson()`: Initialize a Poisson likelihood node

* `NegBin()`: Initialize a negative Binomial likelihood node

]

Some useful functions to manipulate the constructed emulators.

* .black[`summary()`]: Summary of a constructed GP, DGP, or linked (D)GP emulator

* .black[`write()`]: Save the constructed emulator

* .black[`read()`]: Load the stored emulator

* .black[`set_seed()`]: Random seed generator

* .black[`set_linked_idx()`]: Set linked indices

* `set_imp()`: Reset number of imputations for a DGP emulator

* `window()`: Trim the sequences of model parameters of a DGP emulator

* `trace_plot()`: Plot of DGP model parameter traces

* `nllik()`: Calculate negative predicted log-likelihood

]

---

# Try dgpsi

---

### A quick demo of the package

.panel[.panel-name[Problem]
We aim to build a GP emulator of an 1-D synthetic simulator `f` over `[0,1]`.

```r
library(ggplot2)
f <- function(x) { sin(30*((2*x-1)/2-0.4)^5)*cos(20*((2*x-1)/2-0.4)) } # Simulator
# Visualise the simulator
x <- seq(0, 1, length = 200)
dat <- data.frame('x' = x, 'y' = f(x))
ggplot(data = dat, aes(x = x, y = y)) + geom_line(color = 'dodgerblue2')
```

]

We first load the required packages and generate 5 training points from the simulator:

```r
library(lhs) # for latin-hypercube sampling
library(patchwork) # for visualisation
library(dgpsi) # for emulation
init_py() # alway run this function to invoke the required Python env
set_seed(99) # set a random seed for reproducibility
X <- maximinLHS(5,1) # training input
Y <- f(X) # training output
```

We also generate a testing dataset for the validation

```r
validate_x <- maximinLHS(500,1) # testing input
validate_y <- f(validate_x) # testing output
```

We then initialise a GP emulator:

```r
m_gp <- gp(X, Y) # GP emulation
```

```
## Auto-generating a GP structure ... done
## Initializing the GP emulator ... done
## Training the GP emulator ... done
```

]

We now refine the GP emulator by adaptively adding 35 additional data points:

```r
m_gp <- design(m_gp, N = 35, limits = c(0,1), f = f, 
                x_test = validate_x, y_test = validate_y)
```

```
## Initializing ... done
##  * RMSE: 0.532536
## Iteration 1:
##  - Locating ... done
##  * Next design point: 0.002443
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.475652
## Iteration 2:
##  - Locating ... done
##  * Next design point: 0.390530
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.482027
## Iteration 3:
##  - Locating ... done
##  * Next design point: 0.820480
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.482385
## Iteration 4:
##  - Locating ... done
##  * Next design point: 0.990126
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.483036
## Iteration 5:
##  - Locating ... done
##  * Next design point: 0.611982
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.488943
## Iteration 6:
##  - Locating ... done
##  * Next design point: 0.318553
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.238480
## Iteration 7:
##  - Locating ... done
##  * Next design point: 0.874000
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.238344
## Iteration 8:
##  - Locating ... done
##  * Next design point: 0.766622
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.238232
## Iteration 9:
##  - Locating ... done
##  * Next design point: 0.180834
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.221539
## Iteration 10:
##  - Locating ... done
##  * Next design point: 0.483249
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.214417
## Iteration 11:
##  - Locating ... done
##  * Next design point: 0.444714
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.219416
## Iteration 12:
##  - Locating ... done
##  * Next design point: 0.072204
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.149644
## Iteration 13:
##  - Locating ... done
##  * Next design point: 0.655350
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.149650
## Iteration 14:
##  - Locating ... done
##  * Next design point: 0.282643
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.132904
## Iteration 15:
##  - Locating ... done
##  * Next design point: 0.571662
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.132212
## Iteration 16:
##  - Locating ... done
##  * Next design point: 0.037788
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.132003
## Iteration 17:
##  - Locating ... done
##  * Next design point: 0.359289
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.132862
## Iteration 18:
##  - Locating ... done
##  * Next design point: 0.731314
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.133663
## Iteration 19:
##  - Locating ... done
##  * Next design point: 0.902411
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.134184
## Iteration 20:
##  - Locating ... done
##  * Next design point: 0.209010
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.134244
## Iteration 21:
##  - Locating ... done
##  * Next design point: 0.100697
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.081932
## Iteration 22:
##  - Locating ... done
##  * Next design point: 0.415692
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.070921
## Iteration 23:
##  - Locating ... done
##  * Next design point: 0.794646
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.071541
## Iteration 24:
##  - Locating ... done
##  * Next design point: 0.968340
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.072301
## Iteration 25:
##  - Locating ... done
##  * Next design point: 0.846047
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.073745
## Iteration 26:
##  - Locating ... done
##  * Next design point: 0.153288
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.067023
## Iteration 27:
##  - Locating ... done
##  * Next design point: 0.505460
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.066240
## Iteration 28:
##  - Locating ... done
##  * Next design point: 0.634273
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.065525
## Iteration 29:
##  - Locating ... done
##  * Next design point: 0.260127
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.064523
## Iteration 30:
##  - Locating ... done
##  * Next design point: 0.683958
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.063323
## Iteration 31:
##  - Locating ... done
##  * Next design point: 0.553927
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.062111
## Iteration 32:
##  - Locating ... done
##  * Next design point: 0.335745
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.060928
## Iteration 33:
##  - Locating ... done
##  * Next design point: 0.019728
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.025229
## Iteration 34:
##  - Locating ... done
##  * Next design point: 0.462833
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.025892
## Iteration 35:
##  - Locating ... done
##  * Next design point: 0.917233
##  - Updating and re-fitting ... done
##  - Validating ... done
##  * RMSE: 0.026677
```

]

We can then validate, save, and load the refined GP emulator:

```r
p1 <- draw(m_gp, log = T) # check the error trace during the sequential design
p2 <- plot(m_gp, validate_x, validate_y, type = 'line') # check the fit
p1 + p2 + plot_layout(guides = 'keep')
```

```r
write(m_gp, 'm_gp_emulator') # save the GP emulator
m_gp <- read('m_gp_emulator') # load the saved GP emulator
```
]

Try to reproduce the following results by building a DGP emulator adaptively with `dgp()` and `design()` using a random seed at `99`.

**What do you observe in comparison to the results from the GP emulator?**

]

---

# A Synthetic Case Study

---

### Model network emulation

.panel[.panel-name[Problem]
We aim to build a linked emulator of a feed-forward synthetic model network formed by three 1-D models defined over `[0,1]`.

```r
# Component 1
f1 <- function(x) {
  (sin(7.5*x)+1)/2
}

# Component 2
f2 <- function(x) {
  1/3*sin(2*(2*x - 1))+2/3*exp(-30*(2*(2*x-1))^2)+1/3
}

# Component 3
f3 <- function(x) {
  (sin(30*((2*x-1)/2-0.4)^5)*cos(20*((2*x-1)/2-0.4))+1)/2
}

# Network
f123 <- function(x) {
  f3(f2(f1(x)))
}
```

]

.panel[.panel-name[Emulation]
1. Generate 5 training data points and 500 validation data points from `f1()`, `f2()` and `f3()`.
2. Construct a GP emulator `m1` of `f1()`, and DGP emulators `m2` and `m3` of `f2()` and `f3()` respectively with the initial 5 training data points.
3. Refine the emulators with sequential designs by adding 25 more training data points.
4. Validate the emulators with the validation data.

]

After the emulators of individual components are built and validated. We can use `set_linked_idx()` to specify the connections among the emulators:

```r
# f1 takes all global inputs
m1 <- set_linked_idx(m1, c(1))

# f2 takes everything created by f1
m2 <- set_linked_idx(m2, c(1))

# f3 takes all outputs from f2 as its inputs
m3 <- set_linked_idx(m3, c(1))
```

We then can link the three emulators by `combine()` and feed it into `lgp()` to construct the linked emulator of `f123()`:

```r
# combine the three individual emulators 
struc <- combine( list(m1), list(m2), list(m3) )

# construct the linked emulator
m_link <- lgp(struc)
```

]

We can inspect and double-check the connections of emulators contained in the constructed linked emulator `m_link` by `summary()`:

```r
# inspect the connections of emulators in the linked emulator
summary(m_link)
```

```
# +-----------+--------------+------+----------------------------+-----------------+
# | Layer No. | Emulator No. | Type |         Connection         | External Inputs |
# +-----------+--------------+------+----------------------------+-----------------+
# |  Layer 1  |    Emu 1     |  GP  |     Global input: [1]      |       No        |
# |  Layer 2  |    Emu 1     | DGP  | Emu 1 in Layer 1: output 1 |       No        |
# |  Layer 3  |    Emu 1     | DGP  | Emu 1 in Layer 2: output 1 |       No        |
# +-----------+--------------+------+----------------------------+-----------------+
# 1. 'Connection' gives the indices of emulators and the associated output dimensions 
#    that are linked to the emulator referred by 'Layer No.' and 'Emulator No.'.
# 2. 'External Inputs' indicates if the emulator (referred by 'Layer No.' and 'Emulator 
#    No.') has external inputs that are not provided by the feeding emulators.
```

We now generate some testing data for the validation:

```r
set_seed(99) # for reproducibility 
validate_x <- maximinLHS(500,1) # testing input
validate_y <- f123(validate_x) # testing output
```

]

We can now validate the linked emulator using the generated validation data:

```r
m_link <- validate(m_link, validate_x, validate_y) # validate without plotting
p1 <- plot(m_link, validate_x, validate_y, type = 'line') # diagnosis plot type 1 
p2 <- plot(m_link, validate_x, validate_y, style = 2) # diagnosis plot type 2
p1 + p2 + plot_layout(guides = 'keep') # combine the two diagnosis plots
```

]

We can compare the performance of the linked emulator with those of GP and DGP emulators by generating GP and DGP emulators of `f123()` with the same number of training points: 5 initial training data points enriched by additional 25 data points.

]

---

# Additional Resources

---

### Where to get help & how to contribute

1. Check the documentation and tutorials at [https://mingdeyu.github.io/dgpsi-R/](https://mingdeyu.github.io/dgpsi-R/) for the released version, and [https://mingdeyu.github.io/dgpsi-R/dev/](https://mingdeyu.github.io/dgpsi-R/dev/) for the development version.
   
--
   
2. Check the changelog at [https://mingdeyu.github.io/dgpsi-R/dev/news/](https://mingdeyu.github.io/dgpsi-R/dev/news/) to see if your problem has been addressed in the development version.

--
   
3. Create an issue or report a bug at [https://github.com/mingdeyu/dgpsi-R/issues](https://mingdeyu.github.io/dgpsi-R/dev/news/).

4. Submit a pull request at [https://github.com/mingdeyu/dgpsi-R/pulls](https://github.com/mingdeyu/dgpsi-R/pulls).

5. Drop me an email at [deyu.ming.16@ucl.ac.uk](deyu.ming.16@ucl.ac.uk).

6. Have a look at the references:
 * [Ming, D. and Williamson, D. (2023) Linked deep Gaussian process emulation for model networks. arXiv:2306.01212](https://arxiv.org/abs/2306.01212)
 * [Ming, D., Williamson, D., and Guillas, S. (2023) Deep Gaussian process emulation using stochastic imputation. <i>Technometrics</i>. 65(2), 150-161.](https://doi.org/10.1080/00401706.2022.2124311)
 * [Ming, D. and Guillas, S. (2021) Linked Gaussian process emulation for systems of computer models using Mat&eacute;rn kernels and adaptive design, <i>SIAM/ASA Journal on Uncertainty Quantification</i>. 9(4), 1615-1642.](https://doi.org/10.1137/20M1323771)