class: middle ## _dgpsi_: An R Package for Emulation ### Deyu Ming <style type="text/css"> .scroll-output { max-height: 400px; padding-left: 20px; overflow-y: auto; } pre { min-height: 25px; max-height: 320px; padding-left: 20px; overflow-y: auto; } pre[class] { max-height: 200px; } .right-column{ padding-top: 0; } iframe { border: none; } .center2 { margin: 0; position: absolute; top: 25%; -ms-transform: translate(-50%, -50%); } .remark-code, .remark-inline-code { background: #f0f0f0; } .remark-code { font-size: 14px; } .huge .remark-code { /*Change made here*/ font-size: 200% !important; } .tiny .remark-code { /*Change made here*/ font-size: 50% !important; } .pull-left-narrow { padding-top: 0; float: left; width: 40%; } .pull-right-wide { padding-top: 0; float: right; width: 55%; } img {vertical-align: top;} </style> --- ## Agenda .pull-left[ **Basics** - An overview of the package **Installation** - How to install the package - How to activate the package **Function Overview** - An overview of functions available ] .pull-right[ **Try `dgpsi`** - A quick demo of the package **A Synthetic Case Study** - Model network emulation <!-- **JULES Emulator Demo** --> <!-- - Initial emulator --> <!-- - Talking to JULES for adaptive refinements --> **Additional Resources** - Where to get help - How to contribute ] --- class: center, middle # Basics --- ### An overview of the package `dgpsi` is an R package for deep and linked Gaussian process emulations. It currently implements: * Gaussian process emulations. * Deep Gaussian process emulations with: - multiple layers; - multiple GP nodes; - non-Gaussian likelihoods (Poisson, Negative-Binomial, and heteroskedastic Gaussian). * Linked emulations of model networks by linking (D)GP emulators. * Leave-One-Out (LOO) and Out-Of-Sample (OOS) validations for GP, DGP, and linked (D)GP emulators. * Sequential designs for (D)GP emulators and bundles of (D)GP emulators. --- class: center, middle # Installation --- ### How to install the package First of all, make sure you have the latest versions of R and RStuido installed on your machine. Then from RStudio you can install either the released version of the package from CRAN: ```r install.packages('dgpsi') ``` or the development version of the package from GitHub: ```r devtools::install_github('mingdeyu/dgpsi-R') ``` After the installation, run ```r library(dgpsi) init_py() ``` to install the required Python environment. The installation of the underlying Python environment only needs to be done once after you install or update the R package. --- ### How to activate the package You MUST run `init_py()` every time after `library(dgpsi)`, telling R to invoke the required Python environment: ```r library(dgpsi) init_py() ``` If you installed the development version and wanted to update the underlying Python implementation, you can reinstall the development Python version by: ```r init_py(reinstall = T) ``` If you experience issues while running `init_py()`, you can try to reinstall the Python environment: ```r init_py(reinstall = T) ``` or uninstall completely the Python environment: ```r init_py(uninstall = T) ``` and reinstall everything. --- class: center, middle # Function Overview --- ### An overview of functions available .panelset[ .panel[.panel-name[Emulation] Functions to train, validate, and make predictions from GP, DGP, and Linked (D)GP emulators. * .black[`gp()`]: Gaussian process emulator construction * .black[`dgp()`]: Deep Gaussian process emulator construction * .black[`lgp()`]: Linked (D)GP emulator construction * `continue()`: Continue the training of a DGP emulator * .black[`combine()`]: Combine layers * `predict()`: Predictions from GP, DGP, or linked (D)GP emulators * .black[`validate()`]: Validate a constructed GP, DGP, or linked (D)GP emulator * .black[`plot()`]: Validation plots of a constructed GP, DGP, or linked (D)GP emulator ] .panel[.panel-name[Sequential Design] Functions to implement sequential designs for (D)GP and bundles of (D)GP emulators. * .black[`design()`]: Sequential design of a (D)GP emulator or a bundle of (D)GP emulators * `alm()`: Locate the next design point for a (D)GP emulator or a bundle of (D)GP emulators using ALM * `mice()`: Locate the next design point for a (D)GP emulator or a bundle of (D)GP emulators using MICE * `pei()`: Locate the next design point for a (D)GP emulator or a bundle of (D)GP emulators using PEI * `update()`: Update a GP or DGP emulator * .black[`draw()`]: Validation plots of a sequential design * `pack()`: Pack GP and DGP emulators into a bundle * `unpack()`: Unpack a bundle of (D)GP emulators ] .panel[.panel-name[Basic Nodes] Functions to customize structures of GP and DGP emulators. * `kernel()`: Initialize a Gaussian process node * `Hetero()`: Initialize a heteroskedastic Gaussian likelihood node * `Poisson()`: Initialize a Poisson likelihood node * `NegBin()`: Initialize a negative Binomial likelihood node ] .panel[.panel-name[Helpers] Some useful functions to manipulate the constructed emulators. * .black[`summary()`]: Summary of a constructed GP, DGP, or linked (D)GP emulator * .black[`write()`]: Save the constructed emulator * .black[`read()`]: Load the stored emulator * .black[`set_seed()`]: Random seed generator * .black[`set_linked_idx()`]: Set linked indices * `set_imp()`: Reset number of imputations for a DGP emulator * `window()`: Trim the sequences of model parameters of a DGP emulator * `trace_plot()`: Plot of DGP model parameter traces * `nllik()`: Calculate negative predicted log-likelihood ] ] --- class: center, middle # Try dgpsi --- ### A quick demo of the package .panelset[ .panel[.panel-name[Problem] We aim to build a GP emulator of an 1-D synthetic simulator `f` over `[0,1]`. ```r library(ggplot2) f <- function(x) { sin(30*((2*x-1)/2-0.4)^5)*cos(20*((2*x-1)/2-0.4)) } # Simulator # Visualise the simulator x <- seq(0, 1, length = 200) dat <- data.frame('x' = x, 'y' = f(x)) ggplot(data = dat, aes(x = x, y = y)) + geom_line(color = 'dodgerblue2') ``` <img src="data:image/png;base64,#index_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" /> ] .panel[.panel-name[Initialisation] We first load the required packages and generate 5 training points from the simulator: ```r library(lhs) # for latin-hypercube sampling library(patchwork) # for visualisation library(dgpsi) # for emulation init_py() # alway run this function to invoke the required Python env set_seed(99) # set a random seed for reproducibility X <- maximinLHS(5,1) # training input Y <- f(X) # training output ``` We also generate a testing dataset for the validation ```r validate_x <- maximinLHS(500,1) # testing input validate_y <- f(validate_x) # testing output ``` We then initialise a GP emulator: ```r m_gp <- gp(X, Y) # GP emulation ``` ``` ## Auto-generating a GP structure ... done ## Initializing the GP emulator ... done ## Training the GP emulator ... done ``` ] .panel[.panel-name[Refinement] We now refine the GP emulator by adaptively adding 35 additional data points: ```r m_gp <- design(m_gp, N = 35, limits = c(0,1), f = f, x_test = validate_x, y_test = validate_y) ``` ``` ## Initializing ... done ## * RMSE: 0.532536 ## Iteration 1: ## - Locating ... done ## * Next design point: 0.002443 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.475652 ## Iteration 2: ## - Locating ... done ## * Next design point: 0.390530 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.482027 ## Iteration 3: ## - Locating ... done ## * Next design point: 0.820480 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.482385 ## Iteration 4: ## - Locating ... done ## * Next design point: 0.990126 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.483036 ## Iteration 5: ## - Locating ... done ## * Next design point: 0.611982 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.488943 ## Iteration 6: ## - Locating ... done ## * Next design point: 0.318553 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.238480 ## Iteration 7: ## - Locating ... done ## * Next design point: 0.874000 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.238344 ## Iteration 8: ## - Locating ... done ## * Next design point: 0.766622 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.238232 ## Iteration 9: ## - Locating ... done ## * Next design point: 0.180834 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.221539 ## Iteration 10: ## - Locating ... done ## * Next design point: 0.483249 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.214417 ## Iteration 11: ## - Locating ... done ## * Next design point: 0.444714 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.219416 ## Iteration 12: ## - Locating ... done ## * Next design point: 0.072204 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.149644 ## Iteration 13: ## - Locating ... done ## * Next design point: 0.655350 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.149650 ## Iteration 14: ## - Locating ... done ## * Next design point: 0.282643 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.132904 ## Iteration 15: ## - Locating ... done ## * Next design point: 0.571662 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.132212 ## Iteration 16: ## - Locating ... done ## * Next design point: 0.037788 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.132003 ## Iteration 17: ## - Locating ... done ## * Next design point: 0.359289 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.132862 ## Iteration 18: ## - Locating ... done ## * Next design point: 0.731314 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.133663 ## Iteration 19: ## - Locating ... done ## * Next design point: 0.902411 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.134184 ## Iteration 20: ## - Locating ... done ## * Next design point: 0.209010 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.134244 ## Iteration 21: ## - Locating ... done ## * Next design point: 0.100697 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.081932 ## Iteration 22: ## - Locating ... done ## * Next design point: 0.415692 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.070921 ## Iteration 23: ## - Locating ... done ## * Next design point: 0.794646 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.071541 ## Iteration 24: ## - Locating ... done ## * Next design point: 0.968340 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.072301 ## Iteration 25: ## - Locating ... done ## * Next design point: 0.846047 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.073745 ## Iteration 26: ## - Locating ... done ## * Next design point: 0.153288 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.067023 ## Iteration 27: ## - Locating ... done ## * Next design point: 0.505460 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.066240 ## Iteration 28: ## - Locating ... done ## * Next design point: 0.634273 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.065525 ## Iteration 29: ## - Locating ... done ## * Next design point: 0.260127 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.064523 ## Iteration 30: ## - Locating ... done ## * Next design point: 0.683958 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.063323 ## Iteration 31: ## - Locating ... done ## * Next design point: 0.553927 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.062111 ## Iteration 32: ## - Locating ... done ## * Next design point: 0.335745 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.060928 ## Iteration 33: ## - Locating ... done ## * Next design point: 0.019728 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.025229 ## Iteration 34: ## - Locating ... done ## * Next design point: 0.462833 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.025892 ## Iteration 35: ## - Locating ... done ## * Next design point: 0.917233 ## - Updating and re-fitting ... done ## - Validating ... done ## * RMSE: 0.026677 ``` ] .panel[.panel-name[Validation] We can then validate, save, and load the refined GP emulator: ```r p1 <- draw(m_gp, log = T) # check the error trace during the sequential design p2 <- plot(m_gp, validate_x, validate_y, type = 'line') # check the fit p1 + p2 + plot_layout(guides = 'keep') ``` <img src="data:image/png;base64,#index_files/figure-html/unnamed-chunk-16-1.svg" style="display: block; margin: auto;" /> ```r write(m_gp, 'm_gp_emulator') # save the GP emulator m_gp <- read('m_gp_emulator') # load the saved GP emulator ``` ] .panel[.panel-name[Exercise] Try to reproduce the following results by building a DGP emulator adaptively with `dgp()` and `design()` using a random seed at `99`. **What do you observe in comparison to the results from the GP emulator?** <img src="data:image/png;base64,#index_files/figure-html/unnamed-chunk-18-1.svg" style="display: block; margin: auto;" /> ] ] --- class: center, middle # A Synthetic Case Study --- ### Model network emulation .panelset[ .panel[.panel-name[Problem] We aim to build a linked emulator of a feed-forward synthetic model network formed by three 1-D models defined over `[0,1]`. ```r # Component 1 f1 <- function(x) { (sin(7.5*x)+1)/2 } # Component 2 f2 <- function(x) { 1/3*sin(2*(2*x - 1))+2/3*exp(-30*(2*(2*x-1))^2)+1/3 } # Component 3 f3 <- function(x) { (sin(30*((2*x-1)/2-0.4)^5)*cos(20*((2*x-1)/2-0.4))+1)/2 } # Network f123 <- function(x) { f3(f2(f1(x))) } ``` ] .panel[.panel-name[Emulation] 1. Generate 5 training data points and 500 validation data points from `f1()`, `f2()` and `f3()`. 2. Construct a GP emulator `m1` of `f1()`, and DGP emulators `m2` and `m3` of `f2()` and `f3()` respectively with the initial 5 training data points. 3. Refine the emulators with sequential designs by adding 25 more training data points. 4. Validate the emulators with the validation data. <img src="data:image/png;base64,#index_files/figure-html/unnamed-chunk-20-1.svg" style="display: block; margin: auto;" /> ] .panel[.panel-name[Linking] After the emulators of individual components are built and validated. We can use `set_linked_idx()` to specify the connections among the emulators: ```r # f1 takes all global inputs m1 <- set_linked_idx(m1, c(1)) # f2 takes everything created by f1 m2 <- set_linked_idx(m2, c(1)) # f3 takes all outputs from f2 as its inputs m3 <- set_linked_idx(m3, c(1)) ``` We then can link the three emulators by `combine()` and feed it into `lgp()` to construct the linked emulator of `f123()`: ```r # combine the three individual emulators struc <- combine( list(m1), list(m2), list(m3) ) # construct the linked emulator m_link <- lgp(struc) ``` ] .panel[.panel-name[Inspection] We can inspect and double-check the connections of emulators contained in the constructed linked emulator `m_link` by `summary()`: ```r # inspect the connections of emulators in the linked emulator summary(m_link) ``` ``` # +-----------+--------------+------+----------------------------+-----------------+ # | Layer No. | Emulator No. | Type | Connection | External Inputs | # +-----------+--------------+------+----------------------------+-----------------+ # | Layer 1 | Emu 1 | GP | Global input: [1] | No | # | Layer 2 | Emu 1 | DGP | Emu 1 in Layer 1: output 1 | No | # | Layer 3 | Emu 1 | DGP | Emu 1 in Layer 2: output 1 | No | # +-----------+--------------+------+----------------------------+-----------------+ # 1. 'Connection' gives the indices of emulators and the associated output dimensions # that are linked to the emulator referred by 'Layer No.' and 'Emulator No.'. # 2. 'External Inputs' indicates if the emulator (referred by 'Layer No.' and 'Emulator # No.') has external inputs that are not provided by the feeding emulators. ``` We now generate some testing data for the validation: ```r set_seed(99) # for reproducibility validate_x <- maximinLHS(500,1) # testing input validate_y <- f123(validate_x) # testing output ``` ] .panel[.panel-name[Validation] We can now validate the linked emulator using the generated validation data: ```r m_link <- validate(m_link, validate_x, validate_y) # validate without plotting p1 <- plot(m_link, validate_x, validate_y, type = 'line') # diagnosis plot type 1 p2 <- plot(m_link, validate_x, validate_y, style = 2) # diagnosis plot type 2 p1 + p2 + plot_layout(guides = 'keep') # combine the two diagnosis plots ``` <img src="data:image/png;base64,#index_files/figure-html/unnamed-chunk-25-1.svg" style="display: block; margin: auto;" /> ] .panel[.panel-name[Comparison] We can compare the performance of the linked emulator with those of GP and DGP emulators by generating GP and DGP emulators of `f123()` with the same number of training points: 5 initial training data points enriched by additional 25 data points. <img src="data:image/png;base64,#index_files/figure-html/unnamed-chunk-26-1.svg" style="display: block; margin: auto;" /> ] ] <!-- --- --> <!-- class: center, middle --> <!-- # JULES Emulator Demo --> <!-- --- --> <!-- ### Initial emulator --> <!-- ![](data:image/png;base64,#gif_db.gif) --> <!-- --- --> <!-- ### Talking to JULES for adaptive refinements --> <!-- ```{r, echo=FALSE, out.width="90%", fig.align='center', eval=F} --> <!-- knitr::include_graphics("bash.gif") --> <!-- ``` --> --- class: center, middle # Additional Resources --- ### Where to get help & how to contribute 1. Check the documentation and tutorials at [https://mingdeyu.github.io/dgpsi-R/](https://mingdeyu.github.io/dgpsi-R/) for the released version, and [https://mingdeyu.github.io/dgpsi-R/dev/](https://mingdeyu.github.io/dgpsi-R/dev/) for the development version. -- 2. Check the changelog at [https://mingdeyu.github.io/dgpsi-R/dev/news/](https://mingdeyu.github.io/dgpsi-R/dev/news/) to see if your problem has been addressed in the development version. -- 3. Create an issue or report a bug at [https://github.com/mingdeyu/dgpsi-R/issues](https://mingdeyu.github.io/dgpsi-R/dev/news/). -- 4. Submit a pull request at [https://github.com/mingdeyu/dgpsi-R/pulls](https://github.com/mingdeyu/dgpsi-R/pulls). -- 5. Drop me an email at [deyu.ming.16@ucl.ac.uk](deyu.ming.16@ucl.ac.uk). -- 6. Have a look at the references: * [Ming, D. and Williamson, D. (2023) Linked deep Gaussian process emulation for model networks. arXiv:2306.01212](https://arxiv.org/abs/2306.01212) * [Ming, D., Williamson, D., and Guillas, S. (2023) Deep Gaussian process emulation using stochastic imputation. <i>Technometrics</i>. 65(2), 150-161.](https://doi.org/10.1080/00401706.2022.2124311) * [Ming, D. and Guillas, S. (2021) Linked Gaussian process emulation for systems of computer models using Matérn kernels and adaptive design, <i>SIAM/ASA Journal on Uncertainty Quantification</i>. 9(4), 1615-1642.](https://doi.org/10.1137/20M1323771)