Domains
The BoFire domain contains all information about the optimization problem. In general, the domain is defined by inputs, outputs, objectives and constraints. Different types for each of these elements are implemented in the BoFire framework. A brief description of the usage of domains and the individual elements is given in the following.
A basic example of a domain consists of two continuous inputs \(x_1\), \(x_2\) and a continuous output \(y\). If no objective is given, a maximization objective for the output is assumed. If no constraints are defined, an empty set of constraints is assumed.
A continuous input can be defined using the ContinuousInput
class, which requires the variable name and its bounds. Please note that unbounded and partially bounded input variables are currently not supported. Here, we assume \(x_1, x_2 \in [0,1]\).
from bofire.data_models.features.api import ContinuousInput
from bofire.data_models.domain.api import Inputs
inputs = Inputs(features=[
ContinuousInput(key="x1", bounds=[0,1]),
ContinuousInput(key="x2", bounds=[0,1])
]
)
ContinuousOutput
class.
from bofire.data_models.features.api import ContinuousOutput
from bofire.data_models.domain.api import Outputs
outputs = Outputs(features=[
ContinuousOutput(key="y")
]
)
from bofire.data_models.domain.api import Domain
domain = Domain(
inputs=inputs,
outputs=outputs
)
from_lists
method of the Domain
class. This is especially useful if you want to define a domain without having to create separate lists for inputs and outputs. The following code achieves the same result as above:
domain = Domain.from_lists(
inputs=[
ContinuousInput(key="x1", bounds=[0,1]),
ContinuousInput(key="x2", bounds=[0,1])
],
outputs=[
ContinuousOutput(key="y")
]
)
Let us now assume, we have an additional linear equality constraint and we want to use a custom objective, e.g. a minimization objective for \(y\). The linear constraint can be defined using the LinearEqualityConstraint
class
from bofire.data_models.constraints.api import LinearEqualityConstraint
constraints = [LinearEqualityConstraint(features=["x1", "x2"], coefficients=[1,1], rhs=1)]
In BoFire, the minimization objective can be selected by setting the objective-attribute of an output to an instance of MinimizeObjective
from bofire.data_models.objectives.api import MinimizeObjective
outputs = [ContinuousOutput(key="y", objective=MinimizeObjective())]
domain = Domain(
inputs=inputs,
outputs=outputs,
constraints=constraints
)
Inputs
BoFire allows for the following different user-facing input classes:
ContinuousInput
DiscreteInput
CategoricalInput
MolecularInput
CategoricalDescriptorInput
CategoricalMolecularInput
TaskInput
Each of these input classes can be used for defining domains, however some strategies only support a subset of the available input types. You can call the is_feature_implemented()
function of a given strategy and input class to check whether the input is supported by the strategy. For example, the following code checks whether the ContinuousInput
is supported by the RandomSearch
strategy.
from bofire.data_models.strategies.api import RandomStrategy
RandomStrategy.is_feature_implemented(ContinuousInput)
True
if the input type is supported by the strategy, otherwise it will return False
. This is useful to check whether a certain input type can be used with a specific strategy before defining the domain.
Input types
Continuous inputs
Continuous inputs are used to define real-valued input variables with finite upper and lower bounds (see example above).
Discrete inputs
For discrete inputs, only a finite set of values is allowed. The DiscreteInput
class requires a list of permissible values as input. For example, the following code defines a new discrete input variable \(x_3\) with values \(0, 0.1, 0.2\).
from bofire.data_models.features.api import DiscreteInput
DiscreteInput(key="x3", values=[0, 0.1, 0.2])
Categorical inputs
This class of inputs is similar to the discrete inputs, but takes a list of strings as input. The following code defines a new categorical input variable \(x_4\) with categories "A", "B", "C".
from bofire.data_models.features.api import CategoricalInput
CategoricalInput(key="x4", categories=["A", "B", "C"])
Molecular inputs
Molecular inputs are used to define molecular structures. This requires the optional cheminfo dependencies installed. The MolecularInput
class operates on SMILES strings. For example, the following code defines a new molecular input variable \(x_5\).
from bofire.data_models.features.api import MolecularInput
MolecularInput(key="x5")
Categorical descriptor inputs
Via the CategoricalDescriptorInput
one can provide continuous encodings for the different categories via so called descriptors. Imagine, for example, having a categorial input with different categories, where each category corresponds to a specific material. Such descriptors could be e.g., density and hardness. Every material/category would get assigned a number for density and hardness in the hope that these two properties describe the material properly. In the context of fitting a GP, one can then use just these two dimensional vector for describing the material instead of a ten dimensional one-hot encoding, which results in a dimensionality reduction. Of course, this makes sense under the assumption that the descriptors actually correlate with the desired quantities.
The CategoricalDescriptorInput
class requires a list of permissible values as input. For example, the following code defines a new categorical descriptor input variable \(x_6\) with values "A", "B", "C".
from bofire.data_models.features.api import CategoricalDescriptorInput
CategoricalDescriptorInput(
key="x6",
categories=["material_A", "material_B", "material_C"],
descriptors=["density", "hardness"],
values=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
)
Categorical molecular inputs
Categorical molecular inputs inherit from both molecular and categorical inputs. In addition to the 'MolecularInput' the user can provide a list of allowed SMILES strings via the categories
attribute. The following code defines a new categorical molecular input variable \(x_7\) where only the values "C1CCCCC1", "O1CCOCC1" are allowed.
from bofire.data_models.features.api import CategoricalMolecularInput
CategoricalMolecularInput(key="x7", categories=["C1CCCCC1", "O1CCOCC1"])
Inputs class
The Inputs
class is used to collect multiple input variables. It is used to define the inputs of a domain. The following code defines a new input class with the above described input variables \(x_1, x_2, x_3, x_4, x_5, x_6, x_7\).
from bofire.data_models.api import Inputs
inputs = Inputs(
features=[ContinuousInput(key="x1", bounds=[0,1]),
ContinuousInput(key="x2", bounds=[0,1]),
DiscreteInput(key="x3", values=[0, 0.1, 0.2]),
CategoricalInput(key="x4", categories=["A", "B", "C"]),
MolecularInput(key="x5"),
CategoricalDescriptorInput(key="x6", categories=["material_A", "material_B", "material_C"], descriptors=["density", "hardness"], values=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]),
CategoricalMolecularInput(key="x7", categories=["C1CCCCC1", "O1CCOCC1"])])
Outputs
At the moment only continuous and categorical outputs are supported. Those are similar to the continuous and categorical inputs, but they additionaly contain the objective
attribute. The objective
attribute is used to define the optimization objective for the output variable. Similar to the inputs the outputs can be also collected in an Outputs
object. An example with a continuous and a categorical output is given below.
from bofire.data_models.api import Outputs
from bofire.data_models.objectives.api import ConstrainedCategoricalObjective
from bofire.data_models.features.api import ContinuousOutput, CategoricalOutput
outputs = Outputs(
features=[
ContinuousOutput(key="y1", objective=MinimizeObjective()),
CategoricalOutput(
key="y2",
categories=["a", "b", "c"], # Add the required categories
objective=ConstrainedCategoricalObjective(
categories=["a", "b", "c"],
desirability=[True, False, True]
)
)
]
)
Objectives
Different classes for the objectives are implemented in BoFire. These are used to set the objective
attribute of an output object. Note that each output variable can have its own objective. The following objectives are available:
MaximizeObjective
: This is the class of the default value. The objective is to maximize the output variable(s).MinimizeObjective
: The objective is to minimize the output variable(s). Note that minimization objectives can be transformed into maximization objectives and vice versa just by multiplying the corresponding output by -1.MaximizeSigmoidObjective
: The objective is to maximize the output variable(s) using a sigmoid transformation. This is useful to implement bounds on the output variable(s).MinimizeSigmoidObjective
: Similar toMaximizeSigmoidObjective
, but the objective is to minimize the output variable(s).TargetObjective
: The objective is to reach a target value for the output variable(s).TargetObjective
is of typeConstrainedObjective
asMaximizeSigmoidObjective
orMinimizeSigmoidObjective
, i.e., it becomes one if the value is in the target region and falls asymptorically to zero outside that region. That means that once the target objective is met, e.g., objective value above target value in aMaximizeSigmoidObjective
case, the optimization does not care about how close the objective value is compared to the target value.CloseToTargetObjective
: This objective measures the difference to a target value. Such an objective is often meaningful to minimize in a multiobjective optimization, and thus be included in the pareto front. Note that also the objectives of typeConstrainedObjective
can be used in multiobjective optimization, but for that at least two targets of typeMaximizeSigmoidObjective
,MinimizeSigmoidObjective
orTargetObjective
are needed. The main difference with respect to theTargetObjective
is that here the goal is to stay as close as possible to the target value.ConstrainedCategoricalObjective
: Categorical objective where for each output variable a probability vector for the categories is given.
If many outputs are defined, then different weights can be attributed as optional argument w
. More detailed descriptions for the individual objecrives can be found in the API documentation.
Constraints
The optimization problem in BoFire can be solved subject to a variety of different constraint types in the input space. These constraints are currently categorized as linear, non-linear, n choose k and interpoint constraints.
Linear constraints
Linear constraints are used to define linear equalities and inequalities.
Equalities can be defined using the LinearEqualityConstraint
class. The features
attribute is used to define the input variables, the coefficients
attribute is used to define the coefficients of the linear equation and the rhs
attribute is used to define the right-hand side of the equation.
Inequalities can be defined using the LinearInequalityConstraint
class. The attributes are defined in the same way as for the LinearEqualityConstraint
class.
The following code defines a new linear equality constraint \(x_1 + x_2 = 1\) and an inequality constraint \(x_1 - x_3 \leq 1\).
from bofire.data_models.constraints.api import LinearEqualityConstraint, LinearInequalityConstraint
constraints = [LinearEqualityConstraint(features=["x1", "x2"], coefficients=[1,1], rhs=1),
LinearInequalityConstraint(features=["x1", "x3"], coefficients=[1,-1], rhs=1)]
Note that the variables in the features
attribute used in the constraints must be previously defined as inputs in the domain, in order to have a well defined optimization problem.
Non-linear constraints
Non-linear constraints can be used to define equalities and inequalities of the form:
where \(c(\mathbf{x})\) can be an any differentiable function of the inputs x represented by a string attribute expression
that can be evaluated via the eval()
method of pandas dataframe. If sympy
is installed, the derivate expressions are automatically calculated from the expression
attribute. Otherwise, the user can provide additional expressions for the derivatives using the jacobian_expressions
attribute. The features
attrribute should contain the names of the input variables used in the expression
attribute.
The following code defines a new non-linear inequality constraint \(x1^2 + x2^2 - x3 \leq 0\).
from bofire.data_models.constraints.api import NonlinearInequalityConstraint
NonlinearInequalityConstraint(expression="x1**2 + x2**2 - x3", features=["x1","x2","x3"])
N choose k constraints
Given a list of N features and attributes min_count
, max_count
, the NChooseKConstraint
class is used to define constraints that require a minimum and maximum number of features to be nonzero. Note that one can put min_count==max_count. The following code defines a new NChooseKConstraint
that requires at least 2 and at most 3 features to be selected from the list ["x1", "x2", "x3", "x4"].
from bofire.data_models.constraints.api import NChooseKConstraint
NChooseKConstraint(features=["x1", "x2", "x3", "x4"], min_count=2, max_count=3, none_also_valid=False)
Interpoint constraints
The InterpointEqualityConstraint
forces that values of a certain feature of a set/batch of candidates should have the same value. The set is defined using the multiplicity
attribute. The following code defines a new InterpointEqualityConstraint
that forces the values of the feature "x1" to be the same for every 3 subsequent candidates.
from bofire.data_models.constraints.api import InterpointEqualityConstraint
InterpointEqualityConstraint(features=["x1"], multiplicity=3)