Surrogate models

In Bayesian Optimization, information from previous experiments is taken into account when generating proposals for future experiments. This is done by fitting a surrogate model of the black-box function that is to be optimized to the available data. Naturally, experimental candidates for which the surrogate model makes a promising prediction (e.g., a high predicted value of a quantity we want to maximize) should be preferred over ones for which this is not the case. However, since the available data might cover only a small part of the input space, the model's predictions are likely to be very uncertain far away from the data. Therefore, the surrogate model should be able to express how uncertain its predictions are, so that we can combine the prediction and the associated uncertainty when selecting the settings for the next experimental iteration.

The acquisition function is the object that turns the predicted distribution (you can think of this as the prediction plus its uncertainty) into a single quantity expressing how promising a candidate experimental point is. It thereby determines whether the strategy focuses on exploitation, i.e., quickly approaching a nearby local optimum of the black-box function, or on exploration, i.e., probing different regions of the input space first.

Therefore, three quantities typically determine whether a candidate is selected as an experimental proposal: the prediction of the surrogate model, the uncertainty of that prediction, and the acquisition function.
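As a schematic illustration of how these three pieces interact (a standalone sketch, not BoFire's internal code; beta, mean, and sd are hypothetical names), an upper-confidence-bound acquisition simply adds a multiple of the uncertainty to the prediction:

import numpy as np

def upper_confidence_bound(mean: np.ndarray, sd: np.ndarray, beta: float = 2.0) -> np.ndarray:
    # A high predicted value (exploitation) and a high uncertainty (exploration)
    # both increase a candidate's score; beta sets the trade-off.
    return mean + beta * sd

mean = np.array([1.0, 0.8, 0.2])  # surrogate predictions for three candidates
sd = np.array([0.1, 0.5, 1.5])    # associated prediction uncertainties
best = int(np.argmax(upper_confidence_bound(mean, sd)))  # candidate proposed next

With beta = 0 the choice is purely exploitative; larger values favor exploration.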

Surrogate model options

BoFire offers the following classes of surrogate models.

| Surrogate | Optimization of | When to use | Type |
|---|---|---|---|
| SingleTaskGPSurrogate | a single objective with real-valued inputs | Limited data; black-box function is smooth | Gaussian process |
| RandomForestSurrogate | a single objective | Rich data; black-box function does not have to be smooth | sklearn random forest implementation |
| MLP | a single objective with real-valued inputs | Rich data; black-box function is smooth | Multi-layer perceptron |
| MixedSingleTaskGPSurrogate | a single objective with categorical and real-valued inputs | Limited data; black-box function is smooth | Gaussian process |
| XGBoostSurrogate | a single objective | Rich data; black-box function does not have to be smooth | xgboost implementation of gradient-boosted trees |
| TanimotoGP | a single objective | At least one input feature is a molecule represented as a fingerprint | Gaussian process on a molecule space in which Tanimoto similarity determines the similarity between points |

All of these are single-objective surrogate models. To optimize multiple objectives at the same time, a suitable Strategy has to be chosen; a different surrogate model can then be specified for each objective. By default, the SingleTaskGPSurrogate is used.

Example:

from bofire.data_models.domain.api import Outputs
from bofire.data_models.strategies.api import QparegoStrategy
from bofire.data_models.surrogates.api import (
    BotorchSurrogates,
    SingleTaskGPSurrogate,
    XGBoostSurrogate,
)

# `domain` is assumed to be a previously defined Domain with two outputs.
# Model the first output with a GP and the second with gradient-boosted trees.
surrogate_data_0 = SingleTaskGPSurrogate(
    inputs=domain.inputs,
    outputs=Outputs(features=[domain.outputs[0]]),
)
surrogate_data_1 = XGBoostSurrogate(
    inputs=domain.inputs,
    outputs=Outputs(features=[domain.outputs[1]]),
)
qparego_data_model = QparegoStrategy(
    domain=domain,
    surrogate_specs=BotorchSurrogates(
        surrogates=[surrogate_data_0, surrogate_data_1]
    ),
)
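To actually run the optimization, the strategy data model is mapped to a functional strategy. The following is a minimal sketch assuming BoFire's usual map/tell/ask workflow and a previously collected experiments dataframe:

import bofire.strategies.api as strategies

strategy = strategies.map(qparego_data_model)  # data model -> executable strategy
strategy.tell(experiments)                     # pass the experiments gathered so far
candidates = strategy.ask(candidate_count=1)   # propose the next experiment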

Note:

  • The standard kernel for all Gaussian process (GP) surrogates is a Matérn-5/2 kernel with automatic relevance determination and normalization of the input features.
  • The tree-based models (RandomForestSurrogate and XGBoostSurrogate) do not have kernels but quantify uncertainty via the standard deviation of the predictions of their individual trees (see the sketch after this list).
  • The MLP quantifies uncertainty via the standard deviation of multiple predictions made with different random dropout masks (randomly setting a subset of the network's units to zero).
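As an illustration of the tree-based approach (a sketch using plain scikit-learn rather than BoFire internals), the spread of the per-tree predictions serves as the uncertainty estimate:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(30, 2))                       # 30 points, 2 inputs
y = np.sin(6 * X[:, 0]) + rng.normal(scale=0.1, size=30)  # noisy black-box observations

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

X_new = rng.uniform(0, 1, size=(5, 2))
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
mean = per_tree.mean(axis=0)  # ensemble prediction
sd = per_tree.std(axis=0)     # disagreement between trees as the uncertainty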

Customization

BoFire also offers the option to customize surrogate models. In particular, it is possible to customize the SingleTaskGPSurrogate in the following ways.

Kernel customization

Specify the Kernel:

| Kernel | Description | Translation invariant | Input variable type |
|---|---|---|---|
| RBFKernel | Based on the Gaussian distribution | Yes | Continuous |
| MaternKernel | Based on the Gamma function; allows setting a smoothness parameter | Yes | Continuous |
| PolynomialKernel | Based on the dot product of two input vectors | No | Continuous |
| LinearKernel | Equal to the dot product of two input vectors | No | Continuous |
| TanimotoKernel | Measures similarity between binary vectors using the Tanimoto similarity | Not applicable | MolecularInput |
| HammingDistanceKernel | Similarity is based on the Hamming distance, i.e., the number of entries in which two vectors differ (e.g., in a one-hot encoding) | Not applicable | Categorical |

Translational invariance means that the similarity between two input points is not affected by shifting both points by the same amount; it is determined only by their distance. For example, with a translationally invariant kernel the values 10 and 20 are exactly as similar to each other as the values 20 and 30, while with a polynomial kernel the latter pair potentially has a higher similarity. Polynomial kernels are often suitable for high-dimensional inputs, while for low-dimensional inputs an RBF or Matérn kernel is recommended.
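This can be checked numerically (a standalone sketch with toy hyperparameters, not BoFire code):

import numpy as np

def rbf(x, y, lengthscale=10.0):
    # Translation invariant: depends only on the distance |x - y|.
    return np.exp(-((x - y) ** 2) / (2 * lengthscale**2))

def poly(x, y, power=2):
    # Not translation invariant: depends on the dot product x * y.
    return (x * y) ** power

print(rbf(10, 20), rbf(20, 30))    # equal, both pairs are 10 apart
print(poly(10, 20), poly(20, 30))  # 200**2 vs 600**2, the latter pair scores higher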

Note:

  • SingleTaskGPSurrogate with PolynomialKernel is equivalent to PolynomialSurrogate.
  • SingleTaskGPSurrogate with LinearKernel is equivalent to LinearSurrogate.
  • SingleTaskGPSurrogate with TanimotoKernel is equivalent to TanimotoGP.
  • One can combine two kernels by using AdditiveKernel or MultiplicativeKernel (see the sketch after the example below).

Example:

from bofire.data_models.kernels.api import PolynomialKernel

surrogate_data_0 = SingleTaskGPSurrogate(
    inputs=domain.inputs,
    outputs=Outputs(features=[domain.outputs[0]]),
    kernel=PolynomialKernel(power=2),
)
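Combining kernels works analogously. The following sketch assumes that AdditiveKernel accepts the kernels to be summed via a kernels argument:

from bofire.data_models.kernels.api import AdditiveKernel, LinearKernel, RBFKernel

surrogate_data_additive = SingleTaskGPSurrogate(
    inputs=domain.inputs,
    outputs=Outputs(features=[domain.outputs[0]]),
    # Sum of a translation-invariant part and a dot-product part.
    kernel=AdditiveKernel(kernels=[RBFKernel(), LinearKernel()]),
)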

Noise model customization

For experimental data subject to noise, one can specify the distribution of this noise. The options are:

| Noise model | When to use |
|---|---|
| NormalPrior | Noise is Gaussian |
| GammaPrior | Noise follows a Gamma distribution |

Example:

from bofire.data_models.priors.api import NormalPrior

surrogate_data_0 = SingleTaskGPSurrogate(
    inputs=domain.inputs,
    outputs=Outputs(features=[domain.outputs[0]]),
    kernel=PolynomialKernel(power=2),
    noise_prior=NormalPrior(loc=0, scale=1),
)
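A customized surrogate can also be used on its own, outside of a strategy. The following is a minimal sketch assuming BoFire's usual map/fit/predict workflow and a previously collected experiments dataframe:

import bofire.surrogates.api as surrogates

surrogate = surrogates.map(surrogate_data_0)  # data model -> trainable surrogate
surrogate.fit(experiments)                    # dataframe with input and output columns
predictions = surrogate.predict(experiments)  # predicted mean and standard deviation per row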