Ensemble Emulation
Feb. 28 – Mar. 4, 2011
Keith Dalbey, PhD
Sandia National Labs, Dept. 1441, Optimization & Uncertainty Quantification
Abani K. Patra, PhD
Department of Mechanical & Aerospace Engineering, University at Buffalo
Matthew D. Jones, PhD
Center for Computational Research, University at Buffalo
Eliza S. Calder, PhD
Department of Geology, University at Buffalo
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under Contract DE-AC04-94AL85000.
Outline
- Emulation
- Bayesian Emulators
- Ensemble Emulators
- Test Problem: Volcanic Hazard Map
- Approach: 3-Level Hierarchical Emulator
- Results
- Conclusions
Emulation
- Also known as “meta-modeling”
- The process of creating a fast surrogate for a simulator or physical system from a limited amount of data, and using the surrogate in place of the simulator for some purpose (e.g., optimization or uncertainty quantification)
- Can be as simple as a least-squares fit
- Can be significantly more complex
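As a minimal sketch of the simplest case above, a least-squares fit used as a surrogate, the Python below fits a straight line to a few runs of a stand-in simulator and then queries the line instead of the simulator. `expensive_simulator` and the sample points are hypothetical; a real emulator would wrap an actual simulation code.

```python
import math


def expensive_simulator(x: float) -> float:
    """Hypothetical stand-in for a slow simulator (illustration only)."""
    return math.sin(x) + 0.5 * x


def fit_linear_surrogate(xs, ys):
    """Ordinary least-squares fit of y ~ a + b*x; returns a fast callable surrogate."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return lambda x: a + b * x


# Limited data: only a handful of simulator runs.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [expensive_simulator(x) for x in xs]
surrogate = fit_linear_surrogate(xs, ys)

# The surrogate can now be queried in place of the simulator, e.g. inside an optimizer.
print(round(surrogate(0.75), 3))
```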
Bayesian Emulators
- Also known as:
  - Gaussian Process Emulators
  - Bayes Linear Method
  - Kriging
  - “BLUP” or “BLUE” (best linear unbiased predictor / estimator)
- Differences among them are minor. All have:
  - an unadjusted mean (frequently a least-squares fit)
  - a correction/adjustment to the mean based on data
  - an estimated distribution of possible true surfaces about the adjusted mean
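The three ingredients listed above can be illustrated with a small, pure-Python Gaussian-process-style emulator. This is a sketch under stated assumptions, not the authors' formulation: the unadjusted mean is an ordinary least-squares line, the correlation function is Gaussian with a fixed, hypothetical `theta`, and the variance is the simple-kriging form, which ignores uncertainty in the fitted line.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (fine for small N)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def corr(a, b, theta):
    """Gaussian correlation function; theta is a hypothetical, fixed roughness parameter."""
    return math.exp(-theta * (a - b) ** 2)


def build_emulator(xs, ys, theta=4.0, nugget=1e-10):
    n = len(xs)
    # 1) Unadjusted mean: ordinary least-squares line intercept + slope*x.
    xm, ym = sum(xs) / n, sum(ys) / n
    slope = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sum((x - xm) ** 2 for x in xs)
    intercept = ym - slope * xm
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Error-model correlation matrix R (N x N); a tiny nugget keeps it invertible.
    R = [[corr(a, c, theta) + (nugget if i == j else 0.0) for j, c in enumerate(xs)]
         for i, a in enumerate(xs)]
    w = solve(R, resid)                                   # R^{-1} (y - unadjusted mean)
    sigma2 = sum(e * wi for e, wi in zip(resid, w)) / n   # process-variance estimate

    def predict(x):
        r = [corr(x, xi, theta) for xi in xs]
        # 2) Adjusted mean: unadjusted mean plus a data-based correction.
        mean = intercept + slope * x + sum(ri * wi for ri, wi in zip(r, w))
        # 3) Spread of plausible true surfaces about the adjusted mean
        #    (simple-kriging variance: ignores uncertainty in the fitted line).
        v = solve(R, r)
        var = sigma2 * max(0.0, 1.0 - sum(ri * vi for ri, vi in zip(r, v)))
        return mean, var

    return predict


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.sin(3.0 * x) for x in xs]
emulator = build_emulator(xs, ys)
print(emulator(0.75))
```

At the data points the adjusted mean reproduces the data and the variance shrinks to (nearly) zero; far from the data the correction vanishes and the variance returns to the process variance.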
Bayesian Emulators
- Differences among them are minor and include:
  - the choice of “error model,” e.g., whether to restrict the (“vertical”) distribution about the adjusted mean to the normal distribution
  - the method of parameter selection
Bayesian Emulators
The equations for the most common formulation are:
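The slide's equations did not survive extraction. For reference only, a widely used universal-kriging / Gaussian-process form is shown below; the notation is an assumption and is not recovered from the slide. Here Y is the N-vector of observed outputs, g(x) is the vector of basis functions of the unadjusted mean, G is the N-by-p matrix whose i-th row is g(x_i)^T, R_ij = c(x_i, x_j) is the correlation matrix, and r(x)_i = c(x, x_i). The first two lines give the unadjusted mean (a generalized least-squares fit) and its data-based adjustment; the last two give the spread about the adjusted mean.

```latex
\begin{aligned}
\hat{\beta} &= \left(G^{T} R^{-1} G\right)^{-1} G^{T} R^{-1} Y \\
\hat{y}(x) &= g(x)^{T}\hat{\beta} + r(x)^{T} R^{-1}\left(Y - G\hat{\beta}\right) \\
\hat{\sigma}^{2} &= \frac{1}{N}\left(Y - G\hat{\beta}\right)^{T} R^{-1}\left(Y - G\hat{\beta}\right) \\
\operatorname{Var}\left[y(x)\right] &= \hat{\sigma}^{2}\left[1 - r(x)^{T} R^{-1} r(x)
  + u(x)^{T}\left(G^{T} R^{-1} G\right)^{-1} u(x)\right],
\qquad u(x) = G^{T} R^{-1} r(x) - g(x)
\end{aligned}
```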
Bayesian Emulator Parameter Selection
- Always involves repeated inversion of the error model’s “correlation matrix,” R. R is an N-by-N matrix, where N is the number of data points.
- The need for matrix inversion restricts emulators to small amounts of data because, for “large” N:
  - R is poorly conditioned (numerically singular)
  - the cost of inverting the matrix is O(N^3) operations
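Both “large N” problems can be seen in a small experiment. The sketch below assumes a Gaussian correlation function and equally spaced 1D data (illustrative choices, not taken from the slides) and runs a Cholesky factorization of R. As points are packed more densely into a fixed domain, the smallest pivot collapses toward zero (numerical singularity); for a well-conditioned R, doubling N multiplies the multiply-add count by about 8, the O(N^3) scaling.

```python
import math


def gaussian_R(n, theta):
    """N x N correlation matrix for n equally spaced points on [0, 1] (Gaussian correlation assumed)."""
    xs = [i / (n - 1) for i in range(n)]
    return [[math.exp(-theta * (a - b) ** 2) for b in xs] for a in xs]


def cholesky_stats(R):
    """Attempt R = L L^T; return (succeeded, smallest pivot seen, multiply-add count)."""
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    ops, min_pivot = 0, float("inf")
    for j in range(n):
        d = R[j][j] - sum(L[j][k] ** 2 for k in range(j))
        ops += j
        min_pivot = min(min_pivot, d)
        if d <= 0.0:
            # Numerically singular: the factorization breaks down.
            return False, min_pivot, ops
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (R[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
            ops += j
    return True, min_pivot, ops


# Densely packed data (fixed domain, growing N): the smallest pivot collapses toward zero.
for n in (5, 10, 20, 40):
    ok, piv, _ = cholesky_stats(gaussian_R(n, theta=10.0))
    print(n, ok, piv)

# Well-separated data (large theta): doubling N multiplies the work by about 8, i.e. O(N^3).
_, _, ops20 = cholesky_stats(gaussian_R(20, theta=1e4))
_, _, ops40 = cholesky_stats(gaussian_R(40, theta=1e4))
print(ops40 / ops20)
```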
Ensemble Emulation [1, 2]
- Uses an ensemble of many small component emulators instead of one large emulator
- Component emulators use small subsets of the data
- Benefits:
  - avoids the problem of ill-conditioning
  - can greatly reduce computational cost
  - allows concurrent construction and concurrent evaluation of component emulators
  - the macro-emulator is non-stationary
[1] Gramacy et al., 2004
[2] Dalbey, PhD dissertation, 2009
Ensemble Emulation: 1D Example
1. Tessellate the sample inputs and generate a 2-hop neighborhood for each sample.
2. Concurrently build a mini-emulator for each sample’s 2-hop neighborhood.
3. Concurrently evaluate the mini-emulators at the nodes of the “triangles” that contain the re-sample points.
4. The (non-stationary) macro-emulator’s output is the sum of the mini-emulator outputs, weighted by barycentric coordinates.
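The four steps above can be sketched in 1D, where the “triangles” of the tessellation are simply the intervals between sorted samples, the 2-hop neighborhood of a sample is the two samples on either side of it, and the barycentric coordinates of a point are its linear-interpolation weights. The mini-emulator is an assumed constant-mean kriging form with a fixed, hypothetical `theta`; the loops run sequentially here, although each mini-emulator build and evaluation is independent and could run concurrently.

```python
import bisect
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small M only)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def mini_emulator(xs, ys, theta=20.0, nugget=1e-10):
    """Small kriging-style emulator: constant mean plus Gaussian-correlation correction (assumed form)."""
    mu = sum(ys) / len(ys)
    R = [[math.exp(-theta * (a - c) ** 2) + (nugget if i == j else 0.0)
          for j, c in enumerate(xs)] for i, a in enumerate(xs)]
    w = solve(R, [y - mu for y in ys])
    return lambda x: mu + sum(wi * math.exp(-theta * (x - xi) ** 2) for wi, xi in zip(w, xs))


def build_ensemble(xs, ys, hops=2):
    # 1) Tessellate: in 1D the "triangles" are the intervals between sorted samples.
    pts = sorted(zip(xs, ys))
    sx = [p[0] for p in pts]
    sy = [p[1] for p in pts]
    # 2) One mini-emulator per sample, built on its 2-hop neighborhood (samples k-2..k+2).
    #    Each build is independent, so this loop is embarrassingly parallel.
    minis = [mini_emulator(sx[max(0, k - hops):k + hops + 1], sy[max(0, k - hops):k + hops + 1])
             for k in range(len(sx))]

    def macro(x):
        # 3) Locate the interval containing x (clamped to the first/last interval outside the data).
        k = min(max(bisect.bisect_right(sx, x) - 1, 0), len(sx) - 2)
        lam = (sx[k + 1] - x) / (sx[k + 1] - sx[k])  # barycentric coordinate of node k
        # 4) Barycentric-weighted sum of the two node mini-emulators.
        return lam * minis[k](x) + (1.0 - lam) * minis[k + 1](x)

    return macro


xs = [0.25 * i for i in range(9)]
ys = [math.sin(3.0 * x) for x in xs]
macro = build_ensemble(xs, ys)
print(macro(0.625), math.sin(1.875))
```

Each mini-emulator factors at most a 5-by-5 matrix (M = 5 here), so the build cost grows like N M^3 rather than N^3.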
Ensemble Emulation: cost O(N^3) for a single global emulator vs. O(N M^3) for the ensemble
Test Problem: Volcanic Hazard Map
- Objective: in under 24 hours, use 1024 processors to generate a map of the probability that a (volcanic landslide) hazard criterion will be exceeded within 10 years on the island of Montserrat.
- 2 uncertain input dimensions (volcanic flow volume and preferred initial direction) + 2 spatial dimensions (East, North) = 4 input dimensions
- Needs hundreds to thousands of simulations; each produces a field variable (O(10^5) data points) as output.
- Each simulation takes O(10) processor hours.
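A back-of-envelope check of the stated budget, taking O(10) processor hours per simulation as exactly 10 (an assumption). The slide does not say whether the simulation runs count against the 24-hour goal; this only shows the scale of the numbers involved.

```python
processors = 1024
wall_clock_hours = 24
hours_per_simulation = 10  # assumption: O(10) taken as exactly 10 processor hours

budget = processors * wall_clock_hours            # processor hours available in one day
max_simulations = budget // hours_per_simulation  # whole simulations that fit in that budget
print(budget, max_simulations)  # 24576 2457
```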
Approach
- Used a “top-down” 3-level hierarchical ensemble emulator
- Replaced the global N-by-N R matrix with N local M-by-M R matrices; N is in the millions, M is O(100). This reduced the cost from O(N^3) to O(N M^3).
- Distributed the work to the nodes of a supercomputer
- Generated the hazard map in under 9 hours using 1024 processors; the goal was 24 hours
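The cost reduction quoted above follows from the ratio N^3 / (N M^3) = N^2 / M^3. A short check with N = 10^6 and M = 100, ignoring constant factors:

```python
def build_cost_ratio(n: int, m: int) -> int:
    """Ratio of O(N^3) global-emulator cost to O(N M^3) ensemble cost (constants ignored)."""
    global_cost = n ** 3        # factor one N x N correlation matrix
    ensemble_cost = n * m ** 3  # factor N separate M x M correlation matrices
    return global_cost // ensemble_cost


print(build_cost_ratio(10**6, 100))  # 1000000
```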
3-Level Hierarchical Emulator
- A particular simplex in the tessellation of the uncertain inputs.
- Mini-emulators A, B, and C have different spatial tessellations.
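A minimal sketch of how the three vertex mini-emulators might be blended at a point inside the simplex, assuming the same barycentric weighting used in the 1D example. The triangle coordinates, the unit-free scaling of flow volume and initial direction, and the stand-in mini-emulators `mini_A`, `mini_B`, `mini_C` are hypothetical; each real mini-emulator would be built on its own spatial tessellation.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p in triangle (a, b, c); each argument is an (x, y) pair."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return l1, l2, 1.0 - l1 - l2


def blend(p, simplex, minis, east, north):
    """Macro output at uncertain-input point p and spatial location (east, north)."""
    weights = barycentric(p, *simplex)
    return sum(w * f(east, north) for w, f in zip(weights, minis))


# Hypothetical simplex in (scaled flow volume, scaled initial direction) space.
simplex = ((1.0, 0.0), (2.0, 0.0), (1.0, 1.0))


# Hypothetical stand-ins for mini-emulators A, B, C, each a function of (East, North).
def mini_A(east, north):
    return 1.0 + 0.1 * east


def mini_B(east, north):
    return 2.0 + 0.1 * north


def mini_C(east, north):
    return 3.0


p = (4.0 / 3.0, 1.0 / 3.0)  # centroid of the simplex: each weight is 1/3
print(blend(p, simplex, (mini_A, mini_B, mini_C), east=10.0, north=5.0))
```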
3-Level Hierarchical Emulator
- The emulator’s inputs are the tensor product of the simulation output’s physical spatial dimensions and the stochastic inputs.
- The error model is correlated through all emulator inputs.
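To make “tensor product” and “correlated through all emulator inputs” concrete, the sketch below forms the 4D emulator inputs from every combination of simulation run and output location, and uses a separable (product) Gaussian correlation over all four inputs. The run parameters, grid points, and `THETA` values are hypothetical, and treating initial direction as an ordinary (non-periodic) coordinate is a simplifying assumption.

```python
import itertools
import math

# Hypothetical simulation runs: (flow volume, initial direction), scaled to unit-free values.
runs = [(1.0, 0.2), (1.5, 0.8)]
# Hypothetical (East, North) locations where each run's output field is recorded.
grid = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0)]

# Tensor product: every run paired with every output location gives one 4D emulator input.
inputs = [(east, north, vol, dirn) for (vol, dirn), (east, north) in itertools.product(runs, grid)]

# Hypothetical roughness parameters, one per input: (East, North, volume, direction).
THETA = (0.5, 0.5, 2.0, 1.0)


def corr(u, v, theta=THETA):
    """Separable Gaussian correlation: a product of 1D factors over all four inputs.

    Direction is treated as an ordinary coordinate here (periodicity ignored).
    """
    return math.exp(-sum(t * (a - b) ** 2 for t, a, b in zip(theta, u, v)))


# The error model's correlation matrix couples spatial and stochastic inputs alike.
R = [[corr(u, v) for v in inputs] for u in inputs]
print(len(inputs), R[0][1], R[0][3])
```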
Work Flow: 3 Stages
Hierarchical Emulator Results
Hierarchical Emulator Results
Hazard Map: Volcanic Island of Montserrat
Conclusions
- Replacing a single global emulator built from N points with an ensemble of N component emulators, each built from M points:
  - changes the build cost from O(N^3) to O(N M^3) operations; if N = O(10^6) and M = O(100), this is an O(10^6) reduction
  - avoids the problem of an ill-conditioned correlation matrix
  - allows the ensemble “macro-emulator” to be non-stationary
  - allows concurrent construction and concurrent evaluation of component emulators (embarrassingly parallel)
  - allows data storage requirements to be distributed among the nodes of a commodity-cluster supercomputer
  - has the same degree of smoothness/continuity