# Bayesian Optimization



Slide1

Bayesian Optimization (BO)

Javad Azimi

Fall 2010

http://web.engr.oregonstate.edu/~azimi/

Slide2

Outline

Formal Definition

Application

Bayesian Optimization Steps

Surrogate Function (Gaussian Process)

Acquisition Function

PMAX

IEMAX

MPI

MEI

UCB

GP-Hedge

Slide3

Formal Definition

Input: an expensive black-box function f: X → R over an input space X.

Goal: find x* = argmax_{x in X} f(x) using as few evaluations of f as possible.

Slide4

Fuel Cell Application

[Diagram: how an MFC works. Bacteria at the anode oxidize fuel (organic matter) into oxidation products (CO2); the released electrons (e-) flow to the cathode, where O2 and H+ combine with them to form H2O.]

[SEM image of bacteria sp. on Ni nanoparticle enhanced carbon fibers.]

The nano-structure of the anode significantly impacts electricity production.

We should optimize the anode nano-structure to maximize power by selecting a set of experiments.


Slide5

Big Picture

Since running an experiment is very expensive, we use BO.

Select one experiment to run at a time, based on the results of the previous experiments.

[Loop: Current Experiments → Our Current Model → Select Single Experiment → Run Experiment → back to Current Experiments]

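The loop above can be sketched in code. This is a minimal sketch, not the author's implementation: `run_experiment`, `fit_surrogate`, and `acquire` are hypothetical stand-ins for the expensive experiment, the surrogate model, and the acquisition policy.

```python
import random

def run_experiment(x):
    # stand-in for the expensive physical experiment
    return -(x - 0.3) ** 2

def fit_surrogate(X, y):
    # stand-in surrogate: just remember the best observation so far
    return max(zip(y, X))

def acquire(model, candidates):
    # stand-in acquisition policy: random exploration
    return random.choice(candidates)

random.seed(0)
candidates = [i / 10 for i in range(11)]
X, y = [0.0], [run_experiment(0.0)]
for _ in range(5):                     # one experiment per iteration
    model = fit_surrogate(X, y)
    x_next = acquire(model, candidates)
    X.append(x_next)
    y.append(run_experiment(x_next))
best_y, best_x = max(zip(y, X))
```

The rest of the talk is about replacing the two stand-ins: the surrogate becomes a Gaussian Process, and the acquisition policy becomes one of MM, MUI, MPI, MEI, or UCB.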

Slide6

BO Main Steps

Surrogate Function (Response Surface, Model)

Makes a posterior over unobserved points based on the prior.

Its parameters might be based on the prior. Remember, it is a BAYESIAN approach.

Acquisition Criterion (Function)

Decides which sample should be selected next.

Slide7

Surrogate Function

Simulates the distribution of the unknown function based on the prior.

Deterministic (Classical Linear Regression,…)

There is a deterministic prediction for each point x in the input space.

Stochastic (Bayesian regression, Gaussian Process,…)

There is a distribution over the prediction for each point x in the input space (e.g., a normal distribution).

Example

Deterministic: f(x1)=y1, f(x2)=y2

Stochastic: f(x1)=N(y1,2), f(x2)=N(y2,5)

Slide8

Gaussian Process (GP)

A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.

Consistency requirement (marginalization property): examining a larger set of variables does not change the distribution of any smaller subset.

Marginalization property: if (y1, y2) ~ N((mu1, mu2), [[S11, S12], [S21, S22]]), then y1 ~ N(mu1, S11).
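A small numerical check of the marginalization property; the squared exponential kernel here is an assumption for illustration (any valid covariance function behaves the same way):

```python
import numpy as np

def sq_exp(X1, X2, ell=1.0):
    # squared exponential covariance k(x, x') = exp(-(x - x')^2 / (2 ell^2))
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

X = np.array([0.0, 0.5, 1.0])
K_full = sq_exp(X, X)            # covariance of all three variables
K_sub = sq_exp(X[:2], X[:2])     # covariance built from only the first two
# marginalizing out the third variable is just dropping its row and column
```

Because the covariance function fills in each entry pointwise, the sub-matrix of the full covariance equals the covariance built directly from the subset; this is exactly why the consistency requirement holds.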

Slide9

Gaussian Process (GP)

Formal prediction: given observations (X, y) and a test point x*, the posterior at x* is normal with

mean(x*) = k(x*, X) K(X, X)^-1 y

var(x*) = k(x*, x*) - k(x*, X) K(X, X)^-1 k(X, x*)

Interesting points:

The squared exponential covariance function corresponds to Bayesian linear regression with an infinite number of basis functions.

The variance is independent of the observed values y.

The mean is a linear combination of the observations.

If the covariance function specifies the entries of the covariance matrix, marginalization is satisfied!
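The prediction formulas can be sketched directly; this is a generic noise-free GP regression sketch (the squared exponential kernel and the training data are arbitrary choices), and it also illustrates that the mean is a linear combination of the observations:

```python
import numpy as np

def sq_exp(X1, X2, ell=1.0):
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gp_predict(X_train, y_train, X_test, jitter=1e-10):
    # mean(x*) = k(x*, X) K^-1 y ; var(x*) = k(x*, x*) - k(x*, X) K^-1 k(X, x*)
    K = sq_exp(X_train, X_train) + jitter * np.eye(len(X_train))
    Ks = sq_exp(X_test, X_train)
    Kss = sq_exp(X_test, X_test)
    alpha = np.linalg.solve(K, y_train)        # K^-1 y
    mean = Ks @ alpha                          # linear in the observations
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

X_train = np.array([0.0, 1.0, 2.0])
y_train = np.sin(X_train)
mu, var = gp_predict(X_train, y_train, X_train)
# at the training points the GP interpolates exactly: mu ≈ y_train, var ≈ 0
```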

Slide10

Gaussian Process (GP)

A Gaussian Process is:

An exact interpolating regression method: it predicts the training data perfectly (not true in classical regression).

A natural generalization of linear regression, i.e., a nonlinear regression approach!

Obtainable as a simple example from Bayesian regression, with identical results.

A specification of a distribution over functions.

Slide11

Gaussian Process (2): Distribution over Functions

[Figure: the GP gives a 95% confidence interval for each point x, shown together with three sampled functions.]
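A sketch of how such sampled functions and intervals are produced, assuming a zero-mean GP prior with a squared exponential kernel (the length-scale 0.2 and the grid are arbitrary choices):

```python
import numpy as np

np.random.seed(0)
X = np.linspace(0.0, 1.0, 50)
# squared exponential covariance over the grid
K = np.exp(-0.5 * ((X[:, None] - X[None, :]) / 0.2) ** 2)
# Cholesky factor turns i.i.d. normals into correlated GP samples
L = np.linalg.cholesky(K + 1e-6 * np.eye(50))
samples = L @ np.random.randn(50, 3)     # three functions drawn from the prior
std = np.sqrt(np.diag(K))
upper = 1.96 * std                       # 95% interval around the zero prior mean
```

Plotting `samples` against `X`, bounded by `±upper`, reproduces the kind of figure this slide shows.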

Slide12

Gaussian Process (2): GP vs. Bayesian Regression

Bayesian regression:

Distribution over weight

The prior is defined over the weights.

Gaussian Process:

Distribution over function

The prior is defined over the function space.

These are the same model, viewed from two different perspectives.

Slide13

Short Summary

Given any unobserved point z, we can define a normal distribution for its predicted value such that:

Its mean is a linear combination of the observed values.

Its variance is related to its distance from the observed points (the closer to observed data, the smaller the variance).

Slide14

BO Main Steps

Surrogate Function (Response Surface, Model)

Makes a posterior over unobserved points based on the prior.

Its parameters might be based on the prior. Remember, it is a BAYESIAN approach.

Acquisition Criterion (Function)

Decides which sample should be selected next.

Slide15

Bayesian Optimization: Acquisition Criterion

Remember: we are looking for the maximizer of the function.

Input: the set of observed data, i.e., a set of points with their corresponding means and variances.

Goal: decide which point should be selected next to reach the maximizer of the function faster.

There are different acquisition criteria (acquisition functions, or policies).

Slide16

Policies

Maximum Mean (MM).

Maximum Upper Interval (MUI).

Maximum Probability of Improvement (MPI).

Maximum Expected Improvement (MEI).

Slide17

Policies: Maximum Mean (MM)

Returns the point with the highest expected value.

Advantage: if the model is stable and has been learned well, it performs very well.

Disadvantage: there is a high chance of falling into a local optimum (it only exploits).

Can it converge to the global optimum eventually? No.

Slide18

Policies: Maximum Upper Interval (MUI)

Returns the point with the highest 95% upper confidence bound.

Advantage: a combination of mean and variance (exploitation and exploration).

Disadvantage: dominated by the variance, so it mainly explores the input space.

Can it converge to the global optimum eventually? Yes, but it needs an almost infinite number of samples.
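A toy comparison of MM and MUI on five hypothetical candidate points (the means and standard deviations are made-up numbers): MM picks the highest mean, while MUI is pulled toward the highest-variance point.

```python
# hypothetical posterior means and standard deviations at five candidates
mu    = [0.2, 0.9, 0.5, 0.1, 0.6]
sigma = [0.5, 0.1, 0.3, 0.9, 0.2]

# MM: highest posterior mean (pure exploitation)
mm = max(range(len(mu)), key=lambda i: mu[i])
# MUI: highest 95% upper bound, mean + 1.96 * std (variance-dominated)
mui = max(range(len(mu)), key=lambda i: mu[i] + 1.96 * sigma[i])
```

Here MM selects index 1 (mean 0.9), while MUI selects index 3, whose mean is the lowest but whose variance is the largest, illustrating why MUI mainly explores.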

Slide19

Policies: Maximum Probability of Improvement (MPI)

Selects the sample with the highest probability of improving on the current best observation (ymax) by some margin m:

MPI(x) = P(f(x) > ymax + m) = Phi((mean(x) - ymax - m) / std(x))

Slide20

Policies: Maximum Probability of Improvement (MPI)

Advantage: considers the mean, the variance, and ymax in the policy (smarter than MUI).

Disadvantage: the ad-hoc parameter m.

Large value of m? Exploration. Small value of m? Exploitation.
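The effect of m can be checked numerically. This is a standalone sketch; the two candidate points (one high-mean/low-variance, one low-mean/high-variance) and the margins are made-up numbers.

```python
import math

def mpi(mu, sigma, y_max, m):
    # P(f(x) > y_max + m) under a normal posterior N(mu, sigma^2)
    z = (mu - y_max - m) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

y_max = 0.8
exploit = (0.9, 0.1)   # high mean, low variance
explore = (0.5, 0.8)   # low mean, high variance

# small m favors the high-mean point; large m favors the high-variance point
small_m = (mpi(*exploit, y_max, 0.01), mpi(*explore, y_max, 0.01))
large_m = (mpi(*exploit, y_max, 1.0), mpi(*explore, y_max, 1.0))
```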

Slide21

Policies: Maximum Expected Improvement (MEI)

Maximum expected improvement. Question: expectation over which variable? The margin m: MEI averages the improvement over all possible margins, removing MPI's ad-hoc parameter.
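Under a normal posterior, the expected improvement E[max(f(x) - ymax, 0)] has a standard closed form; this is a textbook result rather than anything specific to these slides.

```python
import math

def mei(mu, sigma, y_max):
    # EI(x) = (mu - y_max) * Phi(z) + sigma * phi(z), with z = (mu - y_max) / sigma
    z = (mu - y_max) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - y_max) * cdf + sigma * pdf
```

Note that even when the mean equals ymax, EI is positive and grows with sigma, so the policy trades off mean and variance without a hand-tuned margin.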

Slide22

Policies: Upper Confidence Bounds (UCB)

Selects based on the mean and variance of each point:

UCB(x) = mean(x) + k * std(x)

The selection of k is left to the user. Recently, a principled approach (GP-UCB) for selecting this parameter has been proposed.
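A sketch of the UCB rule together with a schedule for k; the `beta_t` form below follows the GP-UCB analysis for a finite candidate set, and its exact constants are an assumption here.

```python
import math

def ucb(mu, sigma, kappa):
    # upper confidence bound: mean plus kappa standard deviations
    return mu + kappa * sigma

def beta_t(t, num_candidates, delta=0.1):
    # GP-UCB-style schedule: beta grows slowly with the iteration t,
    # shifting weight toward exploration over time (assumed constants)
    return 2.0 * math.log(num_candidates * (t * math.pi) ** 2 / (6.0 * delta))
```

In a GP-UCB loop one would use `kappa = sqrt(beta_t(t, ...))` at iteration t instead of a fixed user-chosen k.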

Slide23

Summary

MM

MUI

MPI

MEI

GP-UCB

Which one should be selected for an unknown model?

Slide24

GP-Hedge

GP-Hedge (2010) selects one of the baseline policies based on theoretical results for the multi-armed bandit problem, although the objective is a bit different! The authors show that it can perform better than (or as well as) the best baseline policy in some settings.
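The Hedge-style selection can be sketched as below: each base policy keeps a cumulative gain, and a policy is sampled with probability proportional to exp(eta * gain). The per-policy rewards here are hypothetical constants; GP-Hedge itself roughly updates each gain using the GP posterior mean at that policy's nominated point.

```python
import math
import random

random.seed(0)
policies = ["MPI", "MEI", "UCB"]
gains = [0.0] * len(policies)
eta = 1.0

def pick_policy():
    # sample index i with probability proportional to exp(eta * gains[i])
    weights = [math.exp(eta * g) for g in gains]
    r = random.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(weights) - 1

rewards = {"MPI": 0.2, "MEI": 0.5, "UCB": 0.3}   # hypothetical stand-ins
for step in range(30):
    i = pick_policy()
    gains[i] += rewards[policies[i]]             # reward the chosen policy
```

Over time the sampling distribution concentrates on whichever policy has accumulated the most gain, which is how GP-Hedge can track the best baseline.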

Slide25

Future Work

Method selection smarter than GP-Hedge, with theoretical analysis.

Batch Bayesian optimization.

Scheduling Bayesian optimization.
