ribs.emitters.BayesianOptimizationEmitter

class ribs.emitters.BayesianOptimizationEmitter(archive: GridArchive, *, bounds: Collection[tuple[None | Float, None | Float]] | None = None, lower_bounds: ArrayLike | None = None, upper_bounds: ArrayLike | None = None, search_nrestarts: Int = 5, entropy_ejie: bool = False, upscale_schedule: ArrayLike | None = None, min_obj: Float = 0, num_initial_samples: Int | None = None, initial_solutions: ArrayLike | None = None, batch_size: Int = 1, seed: Int | None = None)

A sample-efficient emitter that models objective and measure functions with Gaussian process surrogate models.

Bayesian Optimisation is used to emit solutions that are predicted to have high Expected Joint Improvement of Elites (EJIE) acquisition values. Refer to Kent 2024 for more information.
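For intuition, the sketch below scores one candidate under the assumption that EJIE takes the probability-weighted expected improvement form: for each cell, the closed-form expected improvement of the predicted objective over that cell's current elite (or over min_obj when the cell is empty), weighted by the probability that the candidate lands in that cell, summed over all cells. It assumes the Gaussian process posterior mean and standard deviation of the objective and the per-cell probabilities are already available. expected_improvement and ejie_sketch are hypothetical helpers, not pyribs functions, and the entropy term enabled by entropy_ejie is omitted.

```python
import math

import numpy as np


def expected_improvement(mu: float, sigma: float, incumbent: float) -> float:
    """Closed-form expected improvement of N(mu, sigma^2) over `incumbent` (maximization)."""
    if sigma <= 0.0:
        # Degenerate posterior: the improvement is deterministic.
        return max(mu - incumbent, 0.0)
    z = (mu - incumbent) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF at z
    return (mu - incumbent) * cdf + sigma * pdf


def ejie_sketch(mu, sigma, cell_probs, elite_objectives, min_obj=0.0):
    """Hypothetical EJIE-style score: sum over cells of P(cell) * EI over that cell's elite.

    `elite_objectives` holds NaN for empty cells, which fall back to `min_obj`.
    """
    cell_probs = np.asarray(cell_probs, dtype=float)
    elites = np.asarray(elite_objectives, dtype=float)
    incumbents = np.where(np.isnan(elites), min_obj, elites)
    eis = np.array([expected_improvement(mu, sigma, f) for f in incumbents])
    return float(np.sum(cell_probs * eis))


# A candidate predicted to land in cell 0 (elite objective 1.2) or cell 1 (empty).
score = ejie_sketch(mu=1.0, sigma=0.5, cell_probs=[0.7, 0.3],
                    elite_objectives=[1.2, np.nan], min_obj=0.0)
```

Most of this candidate's score comes from the empty cell, because improving over min_obj is far more likely than improving over the existing elite, which is how empty cells naturally attract samples.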

Note

This emitter requires the pymoo package, which can be installed with pip install pymoo or conda install pymoo.

Parameters:

archive: GridArchive: An archive to use when creating and inserting solutions. Currently, the only supported archive type is ribs.archives.GridArchive.
bounds: Collection[tuple[None | Float, None | Float]] | None = None: Bounds of the solution space. This is a sequence of tuples, each of the form (lower_bound, upper_bound). Unlike other emitters, this emitter requires either these bounds or the lower_bounds/upper_bounds below, since it uses Sobol sampling.
lower_bounds: ArrayLike | None = None: Instead of specifying bounds, lower_bounds and upper_bounds may be specified. This is useful if, for instance, solutions are multi-dimensional. Here, pass an array specifying the lower bounds of the solution space.
upper_bounds: ArrayLike | None = None: Upper bounds of the solution space; see lower_bounds above.
search_nrestarts: Int = 5: Number of starting points for the EJIE pattern search.
entropy_ejie: bool = False: If True, augments the EJIE acquisition function with entropy to encourage measure space exploration. Refer to Sec. 4.1 of Kent 2023 for more details.
upscale_schedule: ArrayLike | None = None: An array of increasing archive resolutions, starting with archive.resolution and ending with the user’s intended final archive resolution. The archive is upscaled to the next scheduled resolution once every cell in the current archive has been filled, or once the number of evaluated solutions exceeds twice archive.cells. If None, the archive is never upscaled.
min_obj: Float = 0: The lowest possible objective value. Serves as the default objective value of archive cells that have not been filled. Mainly used when computing expected improvement.
num_initial_samples: Int | None = None: The number of solutions sampled from a Sobol sequence as the first batch of training data for the Gaussian processes. Either num_initial_samples or initial_solutions must be set.
initial_solutions: ArrayLike | None = None: An (n, solution_dim) array of solutions to use as the first batch of training data for the Gaussian processes. Either num_initial_samples or initial_solutions must be set.
batch_size: Int = 1: Number of solutions to return in ask(). Must not exceed search_nrestarts. Setting this to 1 is recommended for sample efficiency.
seed: Int | None = None: Seed for the random number generator.
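As a rough illustration of how the bounds and initial-sample parameters fit together, the sketch below converts either form of bounds into flat arrays, rejects any None (unbounded) entry because quasi-random sampling needs a finite box, and draws num_initial_samples points inside that box. resolve_bounds and initial_samples are hypothetical helpers; a Halton sequence stands in for the Sobol sequence the emitter uses, only because it is short to write without dependencies; and preferring initial_solutions when both are given is an assumption of this sketch.

```python
import numpy as np

_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)


def resolve_bounds(bounds=None, lower_bounds=None, upper_bounds=None):
    """Hypothetical helper: turn either bounds form into two finite 1D arrays."""
    if bounds is not None:
        # None entries become NaN here and are rejected below.
        lower = np.array([b[0] for b in bounds], dtype=float)
        upper = np.array([b[1] for b in bounds], dtype=float)
    elif lower_bounds is not None and upper_bounds is not None:
        lower = np.asarray(lower_bounds, dtype=float).ravel()
        upper = np.asarray(upper_bounds, dtype=float).ravel()
    else:
        raise ValueError("Provide bounds or both lower_bounds and upper_bounds.")
    if not (np.all(np.isfinite(lower)) and np.all(np.isfinite(upper))):
        raise ValueError("Every dimension needs finite bounds for quasi-random sampling.")
    if np.any(lower >= upper):
        raise ValueError("Each lower bound must be below its upper bound.")
    return lower, upper


def halton(n, dim):
    """First n points (starting at index 1) of a Halton sequence in [0, 1)^dim."""
    if dim > len(_PRIMES):
        raise ValueError("Add more primes for higher dimensions.")
    points = np.empty((n, dim))
    for j in range(dim):
        base = _PRIMES[j]
        for i in range(n):
            # Radical inverse of i + 1 in the given prime base.
            k, f, x = i + 1, 1.0, 0.0
            while k > 0:
                f /= base
                x += f * (k % base)
                k //= base
            points[i, j] = x
    return points


def initial_samples(lower, upper, num_initial_samples=None, initial_solutions=None):
    """Return user-provided solutions, or quasi-random samples scaled to the bounds."""
    if initial_solutions is not None:
        return np.asarray(initial_solutions, dtype=float)
    if num_initial_samples is None:
        raise ValueError("Set num_initial_samples or initial_solutions.")
    unit = halton(num_initial_samples, lower.size)
    return lower + unit * (upper - lower)


lower, upper = resolve_bounds(bounds=[(-1.0, 1.0), (0.0, 2.0)])
x0 = initial_samples(lower, upper, num_initial_samples=4)
```

Replacing halton with a Sobol generator such as scipy.stats.qmc.Sobol would bring the sampling closer to what the emitter describes; the bounds handling would stay the same.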

Methods

`ask`()	Returns solutions that are predicted to have high EJIE values.
`ask_dqd`()	Generates solutions for which gradient information must be computed.
`post_upscale_updates`()	Runs after the scheduler upscales the archive.
`tell`(solution, objective, measures, ...)	Updates the Gaussian process and potentially upscales the archive.
`tell_dqd`(solution, objective, measures, ...)	Gives the emitter results from evaluating the gradient of the solutions.

Attributes

`archive`	Stores solutions generated by this emitter.
`batch_size`	Number of solutions to return in `ask()`.
`cell_prob_cutoff`	Cutoff value (ohm) for `_get_cell_probs()`.
`dtype`	Data type of solutions.
`initial_solutions`	Solutions returned by `ask()` before the Gaussian process has been trained on any data.
`lower_bounds`	`(solution_dim,)` array with lower bounds of solution space.
`measure_dim`	Number of measure functions.
`min_obj`	The lowest possible objective value.
`num_evals`	Number of solutions stored in `_dataset`.
`num_sobol_samples`	Number of Sobol samples used when choosing pattern search starting points in `ask()`.
`solution_dim`	Dimensionality of solutions produced by this emitter.
`upper_bounds`	`(solution_dim,)` array with upper bounds of solution space.
`upscale_schedule`	Archive upscale schedule.
`upscale_trigger_threshold`	Maximum number of iterations the emitter may go without finding new cells before an archive upscale is triggered.

ask() → ndarray

Returns solutions that are predicted to have high EJIE values.

If self._gp has not been trained on any data and self._initial_solutions is set, we return self._initial_solutions, which were either provided by the user at emitter initialization or sampled from a Sobol sequence.

If self._gp has been trained on some data:

Draws num_sobol_samples samples from a Sobol sequence.
Computes the EJIE value of each sample and keeps the _search_nrestarts samples with the largest EJIE values as starting points for pattern search.
Starts a pattern search instance from each starting point to maximize its EJIE value.
After all pattern search instances have converged, checks whether at least batch_size samples with positive EJIE values have been found. If not, increments _overspec and repeats the process until at least batch_size solutions with positive EJIE values have been found.
Returns the batch_size solutions with the largest EJIE values.

NOTE: This process has been simplified from the original implementation. The following components are in the BOP-Elites source code but have been removed here for simplicity:

load_previous_points
gen_elite_children
We no longer restrict all starting points to come from unique cells. This might compromise performance slightly, but enforcing unique cells becomes messy in extreme cases, for example when the archive resolution is so low that there are fewer cells than starting points. Additionally, to my current understanding, starting points from unique cells are not guaranteed to yield a higher optimized EJIE, because some cells might be easier to improve than others.
We no longer explicitly add samples predicted to be in empty cells to the starting point pool, since samples predicted to be in empty cells should already have high EJIE.
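The ask() procedure described above can be illustrated with a toy, dependency-free sketch. It assumes a cheap, hypothetical acquisition callable in place of EJIE, draws uniform random candidates instead of Sobol samples, and uses a basic compass search (a simple pattern search that polls plus and minus one step along each coordinate and halves the step when no poll improves) as a stand-in for the emitter's own pattern search. When a round yields too few positive values, it simply draws a fresh candidate set rather than adjusting an _overspec counter. ask_sketch, compass_search, and toy_acquisition are hypothetical names.

```python
import numpy as np


def compass_search(f, x0, lower, upper, step=0.25, min_step=1e-3, max_iters=200):
    """Maximize f with a simple compass (pattern) search clipped to [lower, upper]."""
    x, fx = x0.copy(), f(x0)
    for _ in range(max_iters):
        if step < min_step:
            break
        improved = False
        for d in range(x.size):
            for sign in (1.0, -1.0):
                y = x.copy()
                y[d] = np.clip(y[d] + sign * step, lower[d], upper[d])
                fy = f(y)
                if fy > fx:  # Accept the first improving poll point.
                    x, fx, improved = y, fy, True
        if not improved:
            step /= 2.0  # Contract the pattern when no poll point improves.
    return x, fx


def ask_sketch(acquisition, lower, upper, num_samples=64, n_restarts=5,
               batch_size=1, max_rounds=10, seed=0):
    if batch_size > n_restarts:
        raise ValueError("batch_size must not exceed n_restarts.")
    rng = np.random.default_rng(seed)
    for _ in range(max_rounds):
        # 1. Draw candidates (uniform here; the emitter uses Sobol samples).
        cands = rng.uniform(lower, upper, size=(num_samples, lower.size))
        scores = np.array([acquisition(c) for c in cands])
        # 2. Keep the n_restarts best candidates as starting points.
        starts = cands[np.argsort(scores)[::-1][:n_restarts]]
        # 3. Locally maximize the acquisition from each starting point.
        results = [compass_search(acquisition, s, lower, upper) for s in starts]
        xs = np.array([r[0] for r in results])
        vals = np.array([r[1] for r in results])
        # 4. Accept the round only if enough optimized points are positive.
        if np.sum(vals > 0) >= batch_size:
            # 5. Return the batch_size best points in descending order.
            order = np.argsort(vals)[::-1][:batch_size]
            return xs[order], vals[order]
    raise RuntimeError("No batch with positive acquisition values was found.")


def toy_acquisition(x):
    """Hypothetical stand-in for EJIE: positive only near (0.5, -0.5)."""
    return 1.0 - np.sum((x - np.array([0.5, -0.5])) ** 2)


lower, upper = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
batch, values = ask_sketch(toy_acquisition, lower, upper, batch_size=2)
```

Because every restart here converges to the same peak, the returned batch contains near-duplicates; a real EJIE surface is multimodal across cells, which is why several restarts are worthwhile.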

Returns: Array of shape (batch_size, solution_dim) containing the solutions with the largest EJIE values in descending EJIE order.
Return type: numpy.ndarray

ask_dqd() → ndarray

Generates solutions for which gradient information must be computed.

The solutions should be a (batch_size, solution_dim) array.

This method only needs to be implemented by emitters used in DQD. It returns an empty array by default.

post_upscale_updates() → None

Runs after the scheduler upscales the archive.

This method updates _entropy_norm according to the new number of archive cells and resets _numitrs_noprogress to 0.

tell(solution: numpy.typing.ArrayLike, objective: numpy.typing.ArrayLike, measures: numpy.typing.ArrayLike, add_info: dict[str, ndarray], **fields: numpy.typing.ArrayLike) → ndarray | None

Updates the Gaussian process and potentially upscales the archive.

The function does the following:

Adds solution, objective, and measures to _dataset.
Updates _gp with _dataset.
For each solution whose EJIE attribution exceeds 50%, checks whether its predicted cell differs from the cell it is actually assigned to based on its evaluated measures. If so, increments _misspec.
If upscale_schedule is not None, and if the archive upscale conditions have been met, sends an upscale signal upstream by returning the next resolution to upscale to.
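A minimal sketch of the misspecification check in the list above, assuming that a solution's EJIE attribution to a cell is that cell's share of the solution's total EJIE, and that cells form a uniform grid over the measure ranges (a simplification of GridArchive indexing). grid_index and count_misspecified are hypothetical helpers, not pyribs functions.

```python
import numpy as np


def grid_index(measures, lower, upper, dims):
    """Flat index of the uniform grid cell containing each measure vector."""
    measures = np.atleast_2d(np.asarray(measures, dtype=float))
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dims = tuple(int(d) for d in dims)
    frac = (measures - lower) / (upper - lower)
    idx = np.floor(frac * np.array(dims)).astype(int)
    idx = np.clip(idx, 0, np.array(dims) - 1)  # Out-of-range measures go to edge cells.
    return np.ravel_multi_index(idx.T, dims)


def count_misspecified(contributions, measures, lower, upper, dims, threshold=0.5):
    """Count solutions whose dominant cell (EJIE share > threshold) is not their actual cell.

    contributions: (batch_size, num_cells) per-cell EJIE contributions.
    """
    contributions = np.asarray(contributions, dtype=float)
    totals = contributions.sum(axis=1, keepdims=True)
    shares = np.divide(contributions, totals,
                       out=np.zeros_like(contributions), where=totals > 0)
    predicted = shares.argmax(axis=1)
    dominant = shares.max(axis=1) > threshold
    actual = grid_index(measures, lower, upper, dims)
    return int(np.sum(dominant & (predicted != actual)))


# 2x2 grid over [0, 1]^2; cell index = 2 * (bin of measure 0) + (bin of measure 1).
contributions = [[0.8, 0.1, 0.1, 0.0],   # dominant cell 0, lands in cell 0
                 [0.0, 0.0, 0.9, 0.1],   # dominant cell 2, lands in cell 1
                 [0.4, 0.3, 0.3, 0.0]]   # no cell exceeds 50%, so it is skipped
measures = [[0.2, 0.2], [0.2, 0.7], [0.5, 0.5]]
misspec = count_misspecified(contributions, measures, [0, 0], [1, 1], (2, 2))
```

Only the second solution counts here: its EJIE was concentrated on one cell, but its evaluated measures placed it elsewhere.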

Parameters:

solution: numpy.typing.ArrayLike: (batch_size, solution_dim) array of solutions generated by this emitter’s ask() method.
objective: numpy.typing.ArrayLike: 1D array containing the objective function value of each solution.
measures: numpy.typing.ArrayLike: (batch_size, measure_dim) array with the measure values of each solution.
add_info: dict[str, ndarray]: Data returned from the archive add() method.
**fields: numpy.typing.ArrayLike: Additional data for each solution. Each argument should be an array with batch_size as the first dimension.

Returns:

A 1D array of shape (measure_dim,) containing the next resolution to upscale to. The actual upscaling will be done in the scheduler, through tell(). If no upscaling is needed in the current step, returns None.
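The upscale decision described under the upscale_schedule parameter can be sketched as a pure function. It assumes the schedule is a 2D array whose rows are successive resolutions, that the current resolution appears in the schedule, and that the archive counts as filled when every cell is occupied. next_resolution is a hypothetical name, and the emitter's own logic, which also tracks upscale_trigger_threshold, may differ.

```python
import numpy as np


def next_resolution(schedule, current_resolution, num_filled, num_evals):
    """Return the next scheduled resolution if an upscale is due, else None."""
    schedule = np.asarray(schedule, dtype=int)
    current = np.asarray(current_resolution, dtype=int)
    num_cells = int(np.prod(current))
    if num_filled < num_cells and num_evals <= 2 * num_cells:
        return None  # Neither trigger condition holds.
    # Locate the current resolution among the schedule rows.
    matches = np.flatnonzero(np.all(schedule == current, axis=1))
    if matches.size == 0:
        raise ValueError("Current resolution is not in the schedule.")
    i = int(matches[0])
    if i + 1 >= len(schedule):
        return None  # Already at the final resolution.
    return schedule[i + 1]


schedule = [[5, 5], [10, 10], [20, 20]]
nxt = next_resolution(schedule, [5, 5], num_filled=25, num_evals=30)  # array([10, 10])
```

Returning a (measure_dim,) array rather than mutating the archive mirrors the division of labor stated above: the emitter only signals the new resolution, and the scheduler performs the upscale.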

tell_dqd(solution: numpy.typing.ArrayLike, objective: numpy.typing.ArrayLike, measures: numpy.typing.ArrayLike, jacobian: numpy.typing.ArrayLike, add_info: dict[str, ndarray], **fields: numpy.typing.ArrayLike) → None

Gives the emitter results from evaluating the gradient of the solutions.

This method is the counterpart of ask_dqd(). It is only used by DQD emitters.

Parameters:

solution: numpy.typing.ArrayLike: (batch_size, solution_dim) array of solutions generated by this emitter’s ask_dqd() method.
objective: numpy.typing.ArrayLike: 1-dimensional array containing the objective function value of each solution.
measures: numpy.typing.ArrayLike: (batch_size, measure space dimension) array with the measure space coordinates of each solution.
jacobian: numpy.typing.ArrayLike: (batch_size, 1 + measure_dim, solution_dim) array consisting of Jacobian matrices of the solutions obtained from ask_dqd(). Each matrix should consist of the objective gradient of the solution followed by the measure gradients.
add_info: dict[str, ndarray]: Data returned from the archive add() method.
**fields: numpy.typing.ArrayLike: Additional data for each solution. Each argument should be an array with batch_size as the first dimension.
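To make the jacobian layout concrete, the sketch below stacks an objective gradient and measure gradients into a (batch_size, 1 + measure_dim, solution_dim) array. The gradients come from hypothetical analytic functions (a negated sphere objective and the first two coordinates as measures). Since this emitter's ask_dqd() returns an empty array by default, the layout matters only for DQD emitters.

```python
import numpy as np


def objective_grad(x):
    """Gradient of a hypothetical objective f(x) = -sum(x**2)."""
    return -2.0 * x


def measure_grads(x):
    """Gradients of hypothetical measures m0 = x[0] and m1 = x[1]."""
    grads = np.zeros((2, x.size))
    grads[0, 0] = 1.0
    grads[1, 1] = 1.0
    return grads


solutions = np.array([[0.5, -1.0, 2.0], [1.0, 0.0, -0.5]])  # (batch_size, solution_dim)
jacobian = np.stack([
    # Row 0 is the objective gradient; rows 1.. are the measure gradients.
    np.vstack([objective_grad(x), measure_grads(x)])
    for x in solutions
])
```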

property archive : ArchiveBase: Stores solutions generated by this emitter.

property batch_size : int | integer: Number of solutions to return in ask().

property cell_prob_cutoff : float | floating

Cutoff value (ohm) for _get_cell_probs().

Described in Kent 2024 Sec. IV-D. Computing cell_probs involves some numerical error: passing the same sample in different shapes or contexts can return slightly different cell_probs. We therefore return cell_prob_cutoff at a lower precision than cell_probs, so that the same sample consistently passes or fails the threshold check.
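The following sketch illustrates the kind of last-bit discrepancy described above, with plain floats standing in for cell_probs: summing the same three probabilities in two orders gives results that differ in the last bit. A cutoff kept at full precision can split them, while a cutoff rounded to six decimals (an arbitrary precision chosen for this example, not the emitter's) treats them the same. Rounding makes such flips less likely rather than impossible.

```python
# Two mathematically equal probabilities computed in different orders.
p_a = (0.1 + 0.2) + 0.3  # 0.6000000000000001
p_b = (0.3 + 0.2) + 0.1  # 0.6

# Suppose a cutoff was computed at full precision and happens to equal p_a.
full_cutoff = p_a
rounded_cutoff = round(full_cutoff, 6)

inconsistent = (p_a >= full_cutoff, p_b >= full_cutoff)        # (True, False)
consistent = (p_a >= rounded_cutoff, p_b >= rounded_cutoff)    # (True, True)
```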

property dtype : dtype: Data type of solutions.

property initial_solutions : ndarray | None: Solutions returned by ask() before the Gaussian process has been trained on any data.

property lower_bounds : ndarray

(solution_dim,) array with lower bounds of solution space.

For instance, [-1, -1, -1] indicates that every dimension of the solution space has a lower bound of -1.

property measure_dim : int: Number of measure functions.

property min_obj : float | floating

The lowest possible objective value.

Refer to the documentation for this class.

property num_evals : int

Number of solutions stored in _dataset.

This is the number of solutions that have been evaluated since the initialization of this emitter.

property num_sobol_samples : int | integer: Number of Sobol samples used when choosing pattern search starting points in ask().

Note

If measure function gradients are available, a potentially better way to do this might be to do Latin Hypercube sampling within measure space, and then use measure gradients to find solutions achieving those measure space samples. See Kent 2024b Sec. 6.3 for more details.

property solution_dim : int | integer | tuple[int | integer, ...]: Dimensionality of solutions produced by this emitter.

property upper_bounds : ndarray

(solution_dim,) array with upper bounds of solution space.

For instance, [1, 1, 1] indicates that every dimension of the solution space has an upper bound of 1.

property upscale_schedule : ndarray | None

Archive upscale schedule.

Defined when initializing this emitter.

property upscale_trigger_threshold : int | integer

Maximum number of iterations the emitter may go without finding new cells before an archive upscale is triggered.

See here for more details.