Classification Surrogate Tests

We want to test whether a classification surrogate can correctly identify unknown constraints that are expressed as categorical criteria. Essentially, we want to account for scenarios where specialists can look at a set of experiments and label outcomes as ‘acceptable’, ‘unacceptable’, ‘ideal’, etc.

This involves new models that produce CategoricalOutput features rather than continuous outputs. Mathematically, let \(g_{\theta}:\mathbb{R}^d\to[0,1]^c\) be the function governed by learnable parameters \(\theta\) that outputs a probability vector over \(c\) potential classes (i.e., for input \(x\in\mathbb{R}^d\), \(g_{\theta}(x)^\top\mathbf{1}=1\), where \(\mathbf{1}\) is the vector of all 1’s), and let \(a\in\{0,1\}^c\) encode the acceptability criteria for the corresponding classes. We can then compute the scalar output \(g_{\theta}(x)^\top a\in[0,1]\), which represents the expected value of acceptance and is passed to the optimizer as a constraint function.
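
As a small numerical illustration of the acceptance score \(g_{\theta}(x)^\top a\), the sketch below evaluates it for a made-up probability vector over the three classes used later in this script. The function name expected_acceptance and the probability values are hypothetical and not part of BoFire.

```python
def expected_acceptance(probs, acceptable):
    """Return g(x)^T a for a class-probability vector and a 0/1 acceptability vector."""
    if len(probs) != len(acceptable):
        raise ValueError("probs and acceptable must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to 1")
    return sum(p * a for p, a in zip(probs, acceptable))


# Hypothetical class probabilities, ordered as (unacceptable, acceptable, ideal)
probs = [0.2, 0.5, 0.3]
# Acceptability vector a: only 'unacceptable' is rejected
a = [0, 1, 1]

score = expected_acceptance(probs, a)
print(round(score, 6))
```

The score is simply the probability mass assigned to the accepted classes, so it stays in \([0,1]\) for any valid probability vector.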

In this script, we look at the Rosenbrock function constrained to a disk, which attains its global minimum at \((x_0^*,x_1^*)=(1.0, 1.0)\). To facilitate testing the functionality offered by BoFire, we label all points inside the circle \(x_0^2+x_1^2\le2\) as ‘acceptable’ and further label anything inside the intersection of this circle and the circle \((x_0-1)^2+(x_1-1)^2\le1.0\) as ‘ideal’; all points outside the circle \(x_0^2+x_1^2\le2\) are labeled ‘unacceptable’.
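
The labeling rule can be checked by hand on a few points. The following is a minimal pure-Python sketch; the helper names rosenbrock_point and label_point are illustrative and are not the helpers defined later in this script.

```python
def rosenbrock_point(x0, x1):
    """Rosenbrock function value at a single point."""
    return (1.0 - x0) ** 2 + 100.0 * (x1 - x0**2) ** 2


def label_point(x0, x1):
    """Label a point according to the two circles described above."""
    in_disk = x0**2 + x1**2 <= 2.0
    in_inner_circle = (x0 - 1.0) ** 2 + (x1 - 1.0) ** 2 <= 1.0
    if in_disk and in_inner_circle:
        return "ideal"
    if in_disk:
        return "acceptable"
    return "unacceptable"


# The minimizer (1, 1) lies on the boundary of the disk and is labeled 'ideal'
for point in [(1.0, 1.0), (-1.0, 0.0), (1.5, 1.5)]:
    print(point, rosenbrock_point(*point), label_point(*point))
```

Note that a point inside the second circle but outside the first, such as (1.5, 1.5), is still ‘unacceptable’.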

# Import packages
import pandas as pd

import bofire.strategies.api as strategies
from bofire.data_models.api import Domain, Inputs, Outputs
from bofire.data_models.features.api import (
    CategoricalInput,
    CategoricalOutput,
    ContinuousInput,
    ContinuousOutput,
)
from bofire.data_models.objectives.api import (
    ConstrainedCategoricalObjective,
    MinimizeObjective,
)

Manual setup of the optimization domain

The following cells show how to manually set up the optimization problem in BoFire for didactic purposes.

# Write helper functions which give the objective and the constraints
def rosenbrock(x: pd.DataFrame) -> pd.Series:
    # Expects a DataFrame with the columns "x_0" and "x_1"
    assert "x_0" in x.columns
    assert "x_1" in x.columns
    return (1 - x["x_0"]) ** 2 + 100 * (x["x_1"] - x["x_0"] ** 2) ** 2


def constraints(x: pd.DataFrame) -> list[str]:
    # Returns one label per row: "ideal", "acceptable" or "unacceptable"
    assert "x_0" in x.columns
    assert "x_1" in x.columns
    feasibility_vector = []
    for _, row in x.iterrows():
        if (row["x_0"] ** 2 + row["x_1"] ** 2 <= 2.0) and (
            (row["x_0"] - 1.0) ** 2 + (row["x_1"] - 1.0) ** 2 <= 1.0
        ):
            feasibility_vector.append("ideal")
        elif row["x_0"] ** 2 + row["x_1"] ** 2 <= 2.0:
            feasibility_vector.append("acceptable")
        else:
            feasibility_vector.append("unacceptable")
    return feasibility_vector

# Set up the inputs and outputs; the categorical input x_3 is included only as an example
input_features = Inputs(
    features=[ContinuousInput(key=f"x_{i}", bounds=(-1.75, 1.75)) for i in range(2)]
    + [CategoricalInput(key="x_3", categories=["0", "1"], allowed=[True, True])],
)

# The minimize objective is used here; to maximize instead, use MaximizeObjective.
output_features = Outputs(
    features=[
        ContinuousOutput(key="f_0", objective=MinimizeObjective(w=1.0)),
        CategoricalOutput(
            key="f_1",
            categories=["unacceptable", "acceptable", "ideal"],
            objective=ConstrainedCategoricalObjective(
                categories=["unacceptable", "acceptable", "ideal"],
                desirability=[False, True, True],
            ),
        ),  # This function will be associated with learning the categories
    ],
)

# Create domain
domain1 = Domain(inputs=input_features, outputs=output_features)

# Sample random points
sample_df = domain1.inputs.sample(100)

# Evaluate the continuous objective and assign the categorical label for each sample
sample_df["f_0"] = rosenbrock(x=sample_df)
sample_df["f_1"] = constraints(x=sample_df)

sample_df.head(5)

	x_0	x_1	x_3	f_0	f_1
0	0.618811	0.101391	1	8.071527	ideal
1	0.593479	-0.379804	1	53.750856	acceptable
2	1.478268	-1.091084	1	1073.682046	unacceptable
3	-0.276538	-0.413598	1	25.646532	acceptable
4	1.714201	-0.825341	1	1417.148345	unacceptable

# Plot the sample df
import math

import plotly.express as px


fig = px.scatter(
    sample_df,
    x="x_0",
    y="x_1",
    color="f_1",
    width=550,
    height=525,
    title="Samples with labels",
)
fig.add_shape(
    type="circle",
    xref="x",
    yref="y",
    opacity=0.1,
    fillcolor="red",
    x0=-math.sqrt(2),
    y0=-math.sqrt(2),
    x1=math.sqrt(2),
    y1=math.sqrt(2),
    line_color="red",
)
fig.add_shape(
    type="circle",
    xref="x",
    yref="y",
    opacity=0.2,
    fillcolor="LightSeaGreen",
    x0=0,
    y0=0,
    x1=2,
    y1=2,
    line_color="LightSeaGreen",
)
fig.show()

Evaluate the classification model performance (outside of the optimization procedure)

# Import packages
import bofire.surrogates.api as surrogates
from bofire.data_models.surrogates.api import ClassificationMLPEnsemble
from bofire.surrogates.diagnostics import ClassificationMetricsEnum


# Instantiate the surrogate data model
surrogate_data = ClassificationMLPEnsemble(
    inputs=domain1.inputs,
    outputs=Outputs(features=[domain1.outputs.get_by_key("f_1")]),
    lr=0.03,
    n_epochs=100,
    hidden_layer_sizes=(
        4,
        2,
    ),
    weight_decay=0.0,
    batch_size=10,
    activation="tanh",
)
surrogate = surrogates.map(surrogate_data)

# Cross-validate the surrogate on the classification data
cv_df = sample_df.drop(["f_0"], axis=1)
cv_df["valid_f_1"] = 1
cv_train, cv_test, _ = surrogate.cross_validate(cv_df, folds=3)

/opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/botorch/models/ensemble.py:82: RuntimeWarning:

Could not update `train_inputs` with transformed inputs since _MLPEnsemble does not have a `train_inputs` attribute. Make sure that the `input_transform` is applied to both the train inputs and test inputs.

# Print training performance
cv_train.get_metrics(
    metrics=ClassificationMetricsEnum,
    combine_folds=True,
)

	ACCURACY	F1
0	0.8	0.8

# Print test performance
cv_test.get_metrics(
    metrics=ClassificationMetricsEnum,
    combine_folds=True,
)

	ACCURACY	F1
0	0.62	0.62
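
The ACCURACY and F1 columns coincide in both tables above. This is what one would expect if F1 is micro-averaged, because for single-label multiclass predictions the micro-averaged precision, recall, and F1 all equal the accuracy. Whether ClassificationMetricsEnum uses micro-averaging is an assumption here and is not confirmed by the output; the sketch below only demonstrates the identity on hypothetical labels.

```python
def accuracy(observed, predicted):
    """Fraction of predictions that match the observed labels."""
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def micro_f1(observed, predicted):
    """Micro-averaged F1: pool TP, FP and FN over all classes before averaging."""
    classes = set(observed) | set(predicted)
    tp = fp = fn = 0
    for c in classes:
        for o, p in zip(observed, predicted):
            if p == c and o == c:
                tp += 1
            elif p == c and o != c:
                fp += 1
            elif p != c and o == c:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (not taken from the cross-validation above)
observed = ["ideal", "acceptable", "unacceptable", "acceptable", "ideal"]
predicted = ["ideal", "ideal", "unacceptable", "acceptable", "acceptable"]

print(accuracy(observed, predicted), micro_f1(observed, predicted))
```

Every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the observed class), which is why the pooled precision and recall are equal.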

Set up strategy and ask for candidates

Now we set up a SoboStrategy for generating candidates. The categorical output is modelled using the surrogate from above and enters the acquisition function optimization as an output constraint (constrained expected improvement). For more details, have a look at this notebook: https://github.com/pytorch/botorch/blob/main/notebooks_community/clf_constrained_bo.ipynb and/or this paper: https://arxiv.org/abs/2402.07692.
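
The idea behind constrained expected improvement can be sketched in a few lines: the improvement expected from the objective surrogate is weighted by the probability that the point is acceptable. The code below is a textbook, closed-form version for a single point under a Gaussian prediction; it is not the Monte Carlo, log-space implementation behind qLogEI, and the function names and numbers are illustrative assumptions.

```python
import math


def expected_improvement_min(mean, sd, best):
    """Analytic expected improvement for minimization under a Gaussian prediction."""
    if sd <= 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + sd * pdf


def constrained_ei(mean, sd, best, p_accept):
    """Weight the expected improvement by the predicted acceptance probability."""
    return expected_improvement_min(mean, sd, best) * p_accept


# Hypothetical predictions: same objective forecast, different acceptance probabilities
likely_ok = constrained_ei(mean=0.5, sd=1.0, best=1.0, p_accept=0.9)
likely_bad = constrained_ei(mean=0.5, sd=1.0, best=1.0, p_accept=0.1)
print(likely_ok, likely_bad)
```

Here p_accept plays the role of \(g_{\theta}(x)^\top a\): a candidate with an attractive objective forecast is still discounted heavily when the classifier considers it likely to be unacceptable.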

from bofire.data_models.acquisition_functions.api import qLogEI
from bofire.data_models.strategies.api import SoboStrategy
from bofire.data_models.surrogates.api import BotorchSurrogates


strategy_data = SoboStrategy(
    domain=domain1,
    acquisition_function=qLogEI(),
    surrogate_specs=BotorchSurrogates(
        surrogates=[surrogate_data],
    ),
)

strategy = strategies.map(strategy_data)

strategy.tell(sample_df)

candidates = strategy.ask(10)
candidates

/opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/botorch/optim/optimize.py:789: RuntimeWarning:

Optimization failed in `gen_candidates_scipy` with the following warning(s):
[RuntimeWarning('Could not update `train_inputs` with transformed inputs since _MLPEnsemble does not have a `train_inputs` attribute. Make sure that the `input_transform` is applied to both the train inputs and test inputs.'), RuntimeWarning('Could not update `train_inputs` with transformed inputs since _MLPEnsemble does not have a `train_inputs` attribute. Make sure that the `input_transform` is applied to both the train inputs and test inputs.'), OptimizationWarning('Optimization failed within `scipy.optimize.minimize` with status 2 and message ABNORMAL: .'), RuntimeWarning('Could not update `train_inputs` with transformed inputs since _MLPEnsemble does not have a `train_inputs` attribute. Make sure that the `input_transform` is applied to both the train inputs and test inputs.'), RuntimeWarning('Could not update `train_inputs` with transformed inputs since _MLPEnsemble does not have a `train_inputs` attribute. Make sure that the `input_transform` is applied to both the train inputs and test inputs.')]
Trying again with a new set of initial conditions.

/opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/botorch/optim/optimize.py:789: RuntimeWarning:

Optimization failed on the second try, after generating a new set of initial conditions.

	x_0	x_1	x_3	f_1_pred	f_1_unacceptable_prob	f_1_acceptable_prob	f_1_ideal_prob	f_0_pred	f_1_unacceptable_sd	f_1_acceptable_sd	f_1_ideal_sd	f_0_sd	f_0_des	f_1_des
0	1.316191	1.750000	0	acceptable	0.695481	0.699399	0.001531	0.299069	5.623754	0.411179	0.002578	0.409451	-0.299069	0.700931
1	0.232603	0.053965	1	ideal	0.632017	0.002901	0.996616	0.000483	3.704230	0.006098	0.005842	0.000568	-0.000483	0.999517
2	1.321554	1.750000	1	acceptable	-0.850556	0.798475	0.200020	0.001505	4.712167	0.446082	0.446818	0.001643	-0.001505	0.998495
3	0.749913	0.552838	0	unacceptable	0.895784	0.254828	0.189669	0.555504	3.693703	0.425035	0.415857	0.513569	-0.555504	0.444496
4	1.371457	0.065610	0	unacceptable	329.812476	0.645057	0.097248	0.257695	3.851841	0.490297	0.183952	0.355273	-0.257695	0.742305
5	0.941995	0.869749	1	acceptable	0.413341	0.792195	0.201271	0.006534	3.710855	0.442905	0.446376	0.011905	-0.006534	0.993466
6	-0.007442	0.001388	0	ideal	0.863852	0.019241	0.980122	0.000637	3.757818	0.042619	0.043382	0.000810	-0.000637	0.999363
7	0.436914	0.186128	0	ideal	0.723683	0.215423	0.745996	0.038582	3.743774	0.439556	0.432648	0.085524	-0.038582	0.961418
8	-0.073056	0.005392	1	ideal	0.929786	0.002798	0.996912	0.000291	3.759591	0.005873	0.005715	0.000323	-0.000291	0.999709
9	1.124869	-1.389174	1	unacceptable	704.856086	0.799505	0.200183	0.000312	3.867517	0.446644	0.446748	0.000458	-0.000312	0.999688

Check classification of proposed candidates

Use the labeling logic from above to compare the predicted classes of the candidates with their true classes.

# Append to the candidates
candidates["f_1_true"] = constraints(x=candidates)

# Print results
candidates[["x_0", "x_1", "f_1_pred", "f_1_true"]]

	x_0	x_1	f_1_pred	f_1_true
0	1.316191	1.750000	acceptable	unacceptable
1	0.232603	0.053965	ideal	acceptable
2	1.321554	1.750000	acceptable	unacceptable
3	0.749913	0.552838	unacceptable	ideal
4	1.371457	0.065610	unacceptable	acceptable
5	0.941995	0.869749	acceptable	ideal
6	-0.007442	0.001388	ideal	acceptable
7	0.436914	0.186128	ideal	ideal
8	-0.073056	0.005392	ideal	acceptable
9	1.124869	-1.389174	unacceptable	unacceptable
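
The comparison in the table can be summarized in two ways: how often the predicted class matches the true class exactly, and how often the two agree on acceptability under the desirability vector [False, True, True] defined for f_1. The sketch below copies the f_1_pred and f_1_true columns from the table into plain lists; the variable names are illustrative.

```python
# f_1_pred and f_1_true, copied row by row from the table above
predicted = [
    "acceptable", "ideal", "acceptable", "unacceptable", "unacceptable",
    "acceptable", "ideal", "ideal", "ideal", "unacceptable",
]
true = [
    "unacceptable", "acceptable", "unacceptable", "ideal", "acceptable",
    "ideal", "acceptable", "ideal", "acceptable", "unacceptable",
]

# Desirability of each category, as set in ConstrainedCategoricalObjective
desirable = {"unacceptable": False, "acceptable": True, "ideal": True}

# Exact class agreement
exact_matches = sum(p == t for p, t in zip(predicted, true))
# Agreement on whether the point is acceptable at all
acceptability_matches = sum(
    desirable[p] == desirable[t] for p, t in zip(predicted, true)
)

print(f"exact class matches: {exact_matches} of {len(true)}")
print(f"acceptability matches: {acceptability_matches} of {len(true)}")
```

For this run, 2 of the 10 candidates are classified exactly right, while 6 of 10 agree on acceptability, which is the quantity the constraint actually depends on.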