ribs.emitters.BayesianOptimizationEmitter¶
- class ribs.emitters.BayesianOptimizationEmitter(archive: GridArchive, *, bounds: Collection[tuple[None | Float, None | Float]] | None = None, lower_bounds: ArrayLike | None = None, upper_bounds: ArrayLike | None = None, search_nrestarts: Int = 5, entropy_ejie: bool = False, upscale_schedule: ArrayLike | None = None, min_obj: Float = 0, num_initial_samples: Int | None = None, initial_solutions: ArrayLike | None = None, batch_size: Int = 1, seed: Int | None = None)¶
A sample-efficient emitter that models objective and measure functions with Gaussian process surrogate models.
Bayesian Optimisation is used to emit solutions that are predicted to have high Expected Joint Improvement of Elites (EJIE) acquisition values. Refer to Kent 2024 for more information.
Note
This emitter requires the pymoo package, which can be installed with
pip install pymoo or conda install pymoo.
- Parameters:¶
- archive: GridArchive¶
An archive to use when creating and inserting solutions. Currently, the only supported archive type is ribs.archives.GridArchive.
- bounds: Collection[tuple[None | Float, None | Float]] | None = None¶
Bounds of the solution space. This is a sequence of tuples, each of the form (lower_bound, upper_bound). Unlike other emitters, either these bounds or the lower_bounds/upper_bounds below must be provided, since Sobol sampling is used.
- lower_bounds: ArrayLike | None = None¶
Instead of specifying bounds, lower_bounds and upper_bounds may be specified. This is useful if, for instance, solutions are multi-dimensional. Here, pass an array specifying the lower bounds of the solution space.
- upper_bounds: ArrayLike | None = None¶
Upper bounds of the solution space; see lower_bounds above.
- search_nrestarts: Int = 5¶
Number of starting points for EJIE pattern search.
- entropy_ejie: bool = False¶
If True, augments the EJIE acquisition function with entropy to encourage measure space exploration. Refer to Sec. 4.1 of Kent 2023 for more details.
- upscale_schedule: ArrayLike | None = None¶
An array of increasing archive resolutions starting with archive.resolution and ending with the user’s intended final archive resolution. The archive is upscaled to the next scheduled resolution if every cell within the current archive has been filled, or if the number of evaluated solutions is more than twice archive.cells. If None, the archive will not be upscaled.
- min_obj: Float = 0¶
The lowest possible objective value. Serves as the default objective value within archive cells that have not been filled. Mainly used when computing expected improvement.
- num_initial_samples: Int | None = None¶
The number of solutions that will be sampled from a Sobol sequence as the first batch of training data for the Gaussian processes. Either num_initial_samples or initial_solutions must be set.
- initial_solutions: ArrayLike | None = None¶
An (n, solution_dim) array of solutions to be used as the first batch of training data for the Gaussian processes. Either num_initial_samples or initial_solutions must be set.
- batch_size: Int = 1¶
Number of solutions to return in ask(). Must not exceed search_nrestarts. It is recommended to set this to 1 for sample efficiency.
- seed: Int | None = None¶
Seed for the random number generator.
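The two ways of specifying the solution space describe the same box. As a minimal sketch in plain Python (no pyribs import), the hypothetical helper `split_bounds` below converts a `bounds` sequence into the equivalent `lower_bounds`/`upper_bounds` pair. The helper is not part of the library, and its rejection of `None` entries is an assumption based on the statement above that bounds are required for Sobol sampling.

```python
def split_bounds(bounds):
    """Split [(lo, hi), ...] into separate lower and upper bound lists.

    Hypothetical helper for illustration only. Rejecting None entries is an
    assumption: sampling inside a box needs a finite range per dimension.
    """
    lower, upper = [], []
    for i, (lo, hi) in enumerate(bounds):
        if lo is None or hi is None:
            raise ValueError(f"dimension {i} is unbounded")
        if lo > hi:
            raise ValueError(f"dimension {i} has lower bound > upper bound")
        lower.append(float(lo))
        upper.append(float(hi))
    return lower, upper


# A 3-dimensional solution space given as (lower_bound, upper_bound) tuples.
lower_bounds, upper_bounds = split_bounds([(-1, 1), (0, 5), (-2.5, 2.5)])
```

Either form could then be passed to the constructor; only one of the two should be needed.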
Methods
ask(): Returns solutions that are predicted to have high EJIE values.
ask_dqd(): Generates solutions for which gradient information must be computed.
post_upscale_updates(): Runs after the scheduler upscales the archive.
tell(solution, objective, measures, ...): Updates the Gaussian process and potentially upscales the archive.
tell_dqd(solution, objective, measures, ...): Gives the emitter results from evaluating the gradient of the solutions.
Attributes
Stores solutions generated by this emitter.
Number of solutions to return in ask().
Cutoff value (ohm) for _get_cell_probs().
Data type of solutions.
Returned when the archive is empty (if x0 is not set).
(solution_dim,) array with lower bounds of solution space.
Number of measure functions.
The lowest possible objective value.
Number of solutions stored in _dataset.
Number of Sobol samples when choosing pattern search starting points in ask().
Dimensionality of solutions produced by this emitter.
(solution_dim,) array with upper bounds of solution space.
Archive upscale schedule.
Maximum number of iterations the emitter is allowed to not find new cells before archive upscale is triggered.
- ask() -> ndarray¶
Returns solutions that are predicted to have high EJIE values.
If self._gp has not been trained on any data and self._initial_solutions is set, we return self._initial_solutions, which was either provided by the user at emitter initialization or sampled from a Sobol sequence.
If self._gp has been trained on some data:
1. Samples num_sobol_samples Sobol samples.
2. Computes the EJIE value of each sample and keeps the top _search_nrestarts samples with the largest EJIE values as starting points for pattern search.
3. Starts a pattern search instance for each starting point to maximize its EJIE value.
4. After all pattern search instances have converged, checks whether at least batch_size samples with positive EJIE values have been found. If not, increments _overspec and repeats the process until at least batch_size solutions with positive EJIE values have been found.
5. Returns the top batch_size solutions with the largest EJIE values.
NOTE: This process has been simplified from the original implementation. The following components are in the BOP-Elites source code but have been removed here for simplicity:
- We no longer restrict all starting points to be from unique cells. This might compromise performance slightly, but enforcing unique cells becomes messy in extreme cases, for example when the archive resolution is so low that the number of cells is smaller than the number of starting points. Additionally, to our current understanding, starting points from unique cells are not guaranteed to yield a higher optimized EJIE, because some cells might be easier to improve than others.
- We no longer explicitly add samples predicted to be in empty cells to the starting point pool, since such samples should already have high EJIE.
- Returns:¶
Array of shape (batch_size, solution_dim) containing the solutions with the largest EJIE values in descending EJIE order.
- Return type:¶
ndarray
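The sample, select, refine, and rank steps described for ask() can be sketched in plain Python. This is an illustration under stated assumptions, not the emitter's implementation: uniform random sampling stands in for Sobol sampling, `toy_acq` stands in for the EJIE acquisition function, the local search is a basic compass-style pattern search, and the retry loop for non-positive acquisition values is omitted. All function names here are hypothetical.

```python
import random


def pattern_search(f, x, lower, upper, step=0.25, tol=1e-4):
    """Maximize f from x with a basic compass search inside the bounds."""
    x, fx = list(x), f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                # Move along one axis and clip to the bounds.
                cand[i] = min(max(cand[i] + delta, lower[i]), upper[i])
                fc = f(cand)
                if fc > fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step /= 2  # Shrink the pattern when no direction helps.
    return x, fx


def ask_sketch(acq, lower, upper, num_samples=64, nrestarts=5,
               batch_size=1, seed=0):
    rng = random.Random(seed)
    # Stand-in for Sobol sampling: uniform random points inside the bounds.
    samples = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
               for _ in range(num_samples)]
    # Keep the nrestarts samples with the largest acquisition values.
    starts = sorted(samples, key=acq, reverse=True)[:nrestarts]
    # Refine every starting point, then rank by optimized acquisition value.
    results = [pattern_search(acq, s, lower, upper) for s in starts]
    results.sort(key=lambda r: r[1], reverse=True)
    return [x for x, _ in results[:batch_size]]


def toy_acq(x):
    """Toy acquisition function with a single maximum at (0.3, -0.2)."""
    return 1.0 - (x[0] - 0.3) ** 2 - (x[1] + 0.2) ** 2


batch = ask_sketch(toy_acq, lower=[-1, -1], upper=[1, 1], batch_size=2)
```

As in the description above, `batch_size` should not exceed `nrestarts`, since each returned solution comes from one refined starting point.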
- ask_dqd() -> ndarray¶
Generates solutions for which gradient information must be computed.
The solutions should be a (batch_size, solution_dim) array.
This method only needs to be implemented by emitters used in DQD. It returns an empty array by default.
- post_upscale_updates() -> None¶
Runs after the scheduler upscales the archive.
This method updates _entropy_norm according to the new number of archive cells and resets _numitrs_noprogress to 0.
- tell(solution: numpy.typing.ArrayLike, objective: numpy.typing.ArrayLike, measures: numpy.typing.ArrayLike, add_info: dict[str, ndarray], **fields: numpy.typing.ArrayLike) -> ndarray | None¶
Updates the Gaussian process and potentially upscales the archive.
The function does the following:
1. Adds solution, objective, and measures to _dataset.
2. Updates _gp with _dataset.
3. For each solution whose EJIE attribution exceeds 50%, checks whether its predicted cell differs from the cell it is actually assigned according to its evaluated measures. If so, increments _misspec.
4. If upscale_schedule is not None and the archive upscale conditions have been met, sends an upscale signal upstream by returning the next resolution to upscale to.
- Parameters:¶
- solution: numpy.typing.ArrayLike¶
(batch_size, solution_dim) array of solutions generated by this emitter’s ask() method.
- objective: numpy.typing.ArrayLike¶
1D array containing the objective function value of each solution.
- measures: numpy.typing.ArrayLike¶
(batch_size, measure_dim) array with the measure values of each solution.
- add_info: dict[str, ndarray]¶
Data returned from the archive add() method.
- **fields: numpy.typing.ArrayLike¶
Additional data for each solution. Each argument should be an array with batch_size as the first dimension.
- Returns:¶
A 1D array of shape (measure_dim,) containing the next resolution to upscale to. The actual upscaling will be done in the scheduler, through tell(). If no upscaling is needed in the current step, returns None.
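The upscale signal returned by tell() can be illustrated with a small sketch. It assumes only the two trigger conditions stated in the upscale_schedule parameter description (every cell filled, or more than twice archive.cells evaluations); the emitter may apply further conditions not covered here. The function `next_resolution` and its arguments are hypothetical names, not library API.

```python
from math import prod


def next_resolution(upscale_schedule, resolution, num_filled, num_evals):
    """Return the next scheduled resolution, or None if no upscale is due."""
    cells = prod(resolution)
    # Documented triggers: archive full, or evaluations exceed twice the cells.
    due = num_filled == cells or num_evals > 2 * cells
    if not due:
        return None
    schedule = [tuple(r) for r in upscale_schedule]
    idx = schedule.index(tuple(resolution))
    # Already at the final resolution: nothing left to upscale to.
    return schedule[idx + 1] if idx + 1 < len(schedule) else None


# Schedule starts at the current resolution and ends at the intended one.
schedule = [(5, 5), (10, 10), (25, 25)]
signal = next_resolution(schedule, (5, 5), num_filled=25, num_evals=30)
```

A caller in the role of the scheduler would resize the archive when the result is not None and leave it unchanged otherwise.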
- tell_dqd(solution: numpy.typing.ArrayLike, objective: numpy.typing.ArrayLike, measures: numpy.typing.ArrayLike, jacobian: numpy.typing.ArrayLike, add_info: dict[str, ndarray], **fields: numpy.typing.ArrayLike) -> None¶
Gives the emitter results from evaluating the gradient of the solutions.
This method is the counterpart of ask_dqd(). It is only used by DQD emitters.
- Parameters:¶
- solution: numpy.typing.ArrayLike¶
(batch_size, solution_dim) array of solutions generated by this emitter’s ask() method.
- objective: numpy.typing.ArrayLike¶
1-dimensional array containing the objective function value of each solution.
- measures: numpy.typing.ArrayLike¶
(batch_size, measure space dimension) array with the measure space coordinates of each solution.
- jacobian: numpy.typing.ArrayLike¶
(batch_size, 1 + measure_dim, solution_dim) array consisting of Jacobian matrices of the solutions obtained from ask_dqd(). Each matrix should consist of the objective gradient of the solution followed by the measure gradients.
- add_info: dict[str, ndarray]¶
Data returned from the archive add() method.
- **fields: numpy.typing.ArrayLike¶
Additional data for each solution. Each argument should be an array with batch_size as the first dimension.
- property archive : ArchiveBase¶
Stores solutions generated by this emitter.
- property cell_prob_cutoff : float | floating¶
Cutoff value (ohm) for _get_cell_probs().
Described in Kent 2024, Sec. IV-D. Computing cell_probs involves some numerical error, so passing the same sample in different shapes or contexts can return slightly different cell_probs. We therefore return cell_prob_cutoff at a lower precision than cell_probs to ensure that the same sample consistently passes or fails the threshold check.
- property lower_bounds : ndarray¶
(solution_dim,) array with lower bounds of solution space.
For instance, [-1, -1, -1] indicates that every dimension of the solution space has a lower bound of -1.
- property min_obj : float | floating¶
The lowest possible objective value.
Refer to the documentation for this class.
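To illustrate how min_obj can enter an expected improvement computation, the sketch below uses the standard closed-form expected improvement of a Gaussian prediction for maximization, with min_obj as the incumbent value for an unfilled cell. This is the textbook formula, shown under the assumption that the emitter's per-cell improvement term behaves similarly; it is not taken from the emitter's source, and `expected_improvement` is a hypothetical name.

```python
import math


def expected_improvement(mean, std, incumbent):
    """Closed-form EI of a Gaussian prediction over `incumbent` (maximization)."""
    if std <= 0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mean - incumbent, 0.0)
    z = (mean - incumbent) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (mean - incumbent) * cdf + std * pdf


min_obj = 0.0
# Empty cell: the incumbent defaults to min_obj.
ei_empty = expected_improvement(mean=1.0, std=0.5, incumbent=min_obj)
# Filled cell: the incumbent is the objective of the current elite.
ei_filled = expected_improvement(mean=1.0, std=0.5, incumbent=1.2)
```

With the same prediction, the empty cell yields the larger value because any objective above min_obj counts as an improvement there.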
- property num_evals : int¶
Number of solutions stored in _dataset.
This is the number of solutions that have been evaluated since the initialization of this emitter.
- property num_sobol_samples : int | integer¶
Number of Sobol samples when choosing pattern search starting points in ask().
Note
If measure function gradients are available, a potentially better way to do this might be to do Latin Hypercube sampling within measure space, and then use measure gradients to find solutions achieving those measure space samples. See Kent 2024b Sec. 6.3 for more details.
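For readers unfamiliar with the Latin Hypercube sampling mentioned in the note, here is a minimal plain-Python sketch. It only shows the sampling step inside a bounded box; the gradient-based search for solutions that reach the sampled measure values is not covered. The function `latin_hypercube` and its arguments are hypothetical and not part of the library.

```python
import random


def latin_hypercube(n, lower, upper, seed=0):
    """Draw n points so that each dimension has one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in zip(lower, upper):
        # One point per stratum, at a random position inside that stratum.
        strata = [(k + rng.random()) / n for k in range(n)]
        # Shuffle so strata are paired randomly across dimensions.
        rng.shuffle(strata)
        columns.append([lo + u * (hi - lo) for u in strata])
    return [list(p) for p in zip(*columns)]


# Four samples in a 2D box with bounds [0, 1] x [-1, 1].
points = latin_hypercube(4, lower=[0.0, -1.0], upper=[1.0, 1.0])
```

Each dimension is split into n equal-width strata and every stratum receives exactly one sample, which spreads the points more evenly than independent uniform draws.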
- property solution_dim : int | integer | tuple[int | integer, ...]¶
Dimensionality of solutions produced by this emitter.
- property upper_bounds : ndarray¶
(solution_dim,) array with upper bounds of solution space.
For instance, [1, 1, 1] indicates that every dimension of the solution space has an upper bound of 1.