ribs.archives.GridArchive¶
- class ribs.archives.GridArchive(*, solution_dim, dims, ranges, learning_rate=None, threshold_min=-inf, epsilon=1e-06, qd_score_offset=0.0, seed=None, dtype=<class 'numpy.float64'>, extra_fields=None)[source]¶
An archive that divides each dimension into uniformly-sized cells.
This archive is the container described in Mouret 2015. It can be visualized as an n-dimensional grid in the measure space that is divided into a certain number of cells in each dimension. Each cell contains an elite, i.e., a solution that maximizes the objective function and has measures that lie within that cell.
This archive also implements the idea of soft archives that have thresholds, as introduced in Fontaine 2023. To learn more about thresholds, including the
learning_rate
andthreshold_min
parameters, please refer to the tutorial Upgrading CMA-ME to CMA-MAE on the Sphere Benchmark.By default, this archive stores the following data fields:
solution
,objective
,measures
,threshold
, andindex
. Thethreshold
is the value that a solution’s objective value must exceed to be inserted into a cell, while the integerindex
uniquely identifies each cell.- Parameters
solution_dim (int or tuple of int) – Dimensionality of the solution space. Scalar or multi-dimensional solution shapes are allowed by passing an empty tuple or tuple of integers, respectively.
dims (array-like of int) – Number of cells in each dimension of the measure space, e.g.
[20, 30, 40]
indicates there should be 3 dimensions with 20, 30, and 40 cells. (The number of dimensions is implicitly defined in the length of this argument).ranges (array-like of (float, float)) – Upper and lower bound of each dimension of the measure space, e.g.
[(-1, 1), (-2, 2)]
indicates the first dimension should have bounds \([-1,1]\) (inclusive), and the second dimension should have bounds \([-2,2]\) (inclusive).ranges
should be the same length asdims
.epsilon (float) – Due to floating point precision errors, a small epsilon is added when computing the archive indices in the
index_of()
method – refer to the implementation here. Pass this parameter to configure that epsilon.learning_rate (float) – The learning rate for threshold updates. Defaults to 1.0.
threshold_min (float) – The initial threshold value for all the cells.
qd_score_offset (float) – Archives often contain negative objective values, and if the QD score were to be computed with these negative objectives, the algorithm would be penalized for adding new cells with negative objectives. Thus, a standard practice is to normalize all the objectives so that they are non-negative by introducing an offset. This QD score offset will be subtracted from all objectives in the archive, e.g., if your objectives go as low as -300, pass in -300 so that each objective will be transformed as
objective - (-300)
.seed (int) – Value to seed the random number generator. Set to None to avoid a fixed seed.
dtype (str or data-type or dict) – Data type of the solutions, objectives, and measures. This can be
"f"
/np.float32
,"d"
/np.float64
, or a dict specifying separate dtypes, of the form{"solution": <dtype>, "objective": <dtype>, "measures": <dtype>}
.extra_fields (dict) – Description of extra fields of data that is stored next to elite data like solutions and objectives. The description is a dict mapping from a field name (str) to a tuple of
(shape, dtype)
. For instance,{"foo": ((), np.float32), "bar": ((10,), np.float32)}
will create a “foo” field that contains scalar values and a “bar” field that contains 10D values. Note that field names must be valid Python identifiers, and names already used in the archive are not allowed.
- Raises
ValueError – Invalid values for learning_rate and threshold_min.
ValueError – Invalid names in extra_fields.
ValueError –
dims
andranges
are not the same length.
Methods
__iter__
()Creates an iterator over the elites in the archive.
__len__
()Number of elites in the archive.
add
(solution, objective, measures, **fields)Inserts a batch of solutions into the archive.
add_single
(solution, objective, measures, ...)Inserts a single solution into the archive.
clear
()Removes all elites in the archive.
data
([fields, return_type])Returns data of the elites in the archive.
grid_to_int_index
(grid_indices)Converts a batch of grid indices into a batch of integer indices.
index_of
(measures)Returns archive indices for the given batch of measures.
index_of_single
(measures)Returns the index of the measures for one solution.
int_to_grid_index
(int_indices)Converts a batch of indices into indices in the archive's grid.
retessellate
(new_dims)Updates the resolution of this archive to the given dimensions.
retrieve
(measures)Queries the archive for elites with the given batch of measures.
retrieve_single
(measures)Queries the archive for an elite with the given measures.
Randomly samples elites from the archive.
Attributes
The elite with the highest objective in the archive.
The boundaries of the cells in each dimension.
Total number of cells in the archive.
Number of cells in each dimension.
Mapping from field name to dtype for all fields in the archive.
Whether the archive is empty.
Epsilon for computing archive indices.
List of data fields in the archive.
The size of each dim (upper_bounds - lower_bounds).
The learning rate for threshold updates.
Lower bound of each dimension.
Dimensionality of the measure space.
Dimensionality of the objective space.
The offset which is subtracted from objective values when computing the QD score.
Dimensionality of the solution space.
Statistics about the archive.
The initial threshold value for all the cells.
Upper bound of each dimension.
- add(solution, objective, measures, **fields)[source]¶
Inserts a batch of solutions into the archive.
Each solution is only inserted if it has a higher
objective
than the threshold of the corresponding cell. For the default values oflearning_rate
andthreshold_min
, this threshold is simply the objective value of the elite previously in the cell. If multiple solutions in the batch end up in the same cell, we only insert the solution with the highest objective. If multiple solutions end up in the same cell and tie for the highest objective, we insert the solution that appears first in the batch.For the default values of
learning_rate
andthreshold_min
, the threshold for each cell is updated by taking the maximum objective value among all the solutions that landed in the cell, resulting in the same behavior as in the vanilla MAP-Elites archive. However, for other settings, the threshold is updated with the batch update rule described in the appendix of Fontaine 2023.Note
The indices of all arguments should “correspond” to each other, i.e.,
solution[i]
,objective[i]
, andmeasures[i]
should be the solution parameters, objective, and measures for solutioni
.- Parameters
solution (array-like) – (batch_size,
solution_dim
) array of solution parameters.objective (array-like) – (batch_size,) array with objective function evaluations of the solutions.
measures (array-like) – (batch_size,
measure_dim
) array with measure space coordinates of all the solutions.fields (keyword arguments) – Additional data for each solution. Each argument should be an array with batch_size as the first dimension.
- Returns
Information describing the result of the add operation. The dict contains the following keys:
"status"
(numpy.ndarray
ofnumpy.int32
): An array of integers that represent the “status” obtained when attempting to insert each solution in the batch. Each item has the following possible values:0
: The solution was not added to the archive.1
: The solution improved the objective value of a cell which was already in the archive.2
: The solution discovered a new cell in the archive.
All statuses (and values, below) are computed with respect to the current archive. For example, if two solutions both introduce the same new archive cell, then both will be marked with
2
.The alternative is to depend on the order of the solutions in the batch – for example, if we have two solutions
a
andb
that introduce the same new cell in the archive,a
could be inserted first with status2
, andb
could be inserted second with status1
because it improves upona
. However, our implementation does not do this.To convert statuses to a more semantic format, cast all statuses to
AddStatus
, e.g., with[AddStatus(s) for s in add_info["status"]]
."value"
(numpy.ndarray
ofdtypes
[“objective”]): An array with values for each solution in the batch. With the default values oflearning_rate = 1.0
andthreshold_min = -np.inf
, the meaning of each value depends on the correspondingstatus
and is identical to that in CMA-ME (Fontaine 2020):0
(not added): The value is the “negative improvement,” i.e., the objective of the solution passed in minus the objective of the elite still in the archive (this value is negative because the solution did not have a high enough objective to be added to the archive).1
(improve existing cell): The value is the “improvement,” i.e., the objective of the solution passed in minus the objective of the elite previously in the archive.2
(new cell): The value is just the objective of the solution.
In contrast, for other values of
learning_rate
andthreshold_min
, each value is equivalent to the objective value of the solution minus the threshold of its corresponding cell in the archive.
- Return type
- Raises
ValueError – The array arguments do not match their specified shapes.
ValueError –
objective
ormeasures
has non-finite values (inf or NaN).
- add_single(solution, objective, measures, **fields)[source]¶
Inserts a single solution into the archive.
The solution is only inserted if it has a higher
objective
than the threshold of the corresponding cell. For the default values oflearning_rate
andthreshold_min
, this threshold is simply the objective value of the elite previously in the cell. The threshold is also updated if the solution was inserted.Note
This method is provided as an easier-to-understand implementation that has less performance due to inserting only one solution at a time. For better performance, see
add()
.- Parameters
solution (array-like) – Parameters of the solution.
objective (float) – Objective function evaluation of the solution.
measures (array-like) – Coordinates in measure space of the solution.
fields (keyword arguments) – Additional data for the solution.
- Returns
Information describing the result of the add operation. The dict contains
status
andvalue
keys; refer toadd()
for the meaning of status and value.- Return type
- Raises
ValueError – The array arguments do not match their specified shapes.
ValueError –
objective
is non-finite (inf or NaN) ormeasures
has non-finite values.
- data(fields=None, return_type='dict')[source]¶
Returns data of the elites in the archive.
- Parameters
- Returns
The data for all elites in the archive. All data returned by this method will be a copy, i.e., the data will not update as the archive changes. If
fields
was a single str, the returned data will just be an array holding data for the given field, such as:measures = archive.data("measures")
Otherwise, the returned data can take the following forms, depending on the
return_type
argument:return_type="dict"
: Dict mapping from the field name to the field data at the given indices. An example is:{ "solution": [[1.0, 1.0, ...], ...], "objective": [1.5, ...], "measures": [[1.0, 2.0], ...], ... }
The keys in this dict can be modified with the
fields
arg; duplicate fields will be ignored since the dict stores unique keys.return_type="tuple"
: Tuple of arrays matching the field order infields
. For instance, iffields
is["objective", "measures"]
, this method would return a tuple of(objective_arr, measures_arr)
that could be unpacked as:objective, measures = archive.data(["objective", "measures"], return_type="tuple")
Unlike with the
dict
return type, duplicate fields will show up as duplicate entries in the tuple, e.g.,fields=["objective", "objective"]
will result in two objective arrays being returned.When
fields=None
(the default case), the fields in the tuple will be ordered according to thefield_list
.return_type="pandas"
: AnArchiveDataFrame
with the following columns:For fields that are scalars, a single column with the field name. For example,
objective
would have a single column calledobjective
.For fields that are 1D arrays, multiple columns with the name suffixed by its index. To illustrate, for a
measures
field of length 10, the dataframe would contain 10 columns with namesmeasures_0
,measures_1
, …,measures_9
. The output format for fields with >1D data is currently not defined.
In short, the dataframe might look like this by default:
solution_0
…
objective
measures_0
…
…
…
Like the other return types, the columns returned can be adjusted with the
fields
parameter.
- grid_to_int_index(grid_indices)[source]¶
Converts a batch of grid indices into a batch of integer indices.
Refer to
index_of()
for more info.- Parameters
grid_indices (array-like) – (batch_size,
measure_dim
) array of indices in the archive grid.- Returns
(batch_size,) array of integer indices.
- Return type
- Raises
ValueError –
grid_indices
is not of shape (batch_size,measure_dim
)
- index_of(measures)[source]¶
Returns archive indices for the given batch of measures.
First, values are clipped to the bounds of the measure space. Then, the values are mapped to indices in the grid, e.g., cell 5 along dimension 0 and cell 3 along dimension 1. We convert these grid indices to integer indices with
numpy.ravel_multi_index()
and return the result.To convert between integer and grid indices, we also provide utility methods
grid_to_int_index()
andint_to_grid_index()
.As an example, grid indices can be used to access
boundaries
of a measure value’s cell. For example, the following retrieves the lower and upper bounds of the cell along dimension 0:# Access only element 0 since this method operates in batch. idx = archive.int_to_grid_index(archive.index_of(...))[0] lower = archive.boundaries[0][idx[0]] upper = archive.boundaries[0][idx[0] + 1]
- Parameters
measures (array-like) – (batch_size,
measure_dim
) array of coordinates in measure space.- Returns
(batch_size,) array of integer indices representing the flattened grid coordinates.
- Return type
- Raises
ValueError –
measures
is not of shape (batch_size,measure_dim
).ValueError –
measures
has non-finite values (inf or NaN).
- index_of_single(measures)[source]¶
Returns the index of the measures for one solution.
See
index_of()
.- Parameters
measures (array-like) – (
measure_dim
,) array of measures for a single solution.- Returns
Integer index of the measures in the archive’s storage arrays.
- Return type
int or numpy.integer
- Raises
ValueError –
measures
is not of shape (measure_dim
,).ValueError –
measures
has non-finite values (inf or NaN).
- int_to_grid_index(int_indices)[source]¶
Converts a batch of indices into indices in the archive’s grid.
Refer to
index_of()
for more info.- Parameters
int_indices (array-like) – (batch_size,) array of integer indices such as those output by
index_of()
.- Returns
(batch_size,
measure_dim
) array of indices in the archive grid.- Return type
- Raises
ValueError –
int_indices
is not of shape (batch_size,).
- retessellate(new_dims)[source]¶
Updates the resolution of this archive to the given dimensions.
Upon resizing the archive, this method re-inserts the solutions that are currently contained in the archive. Note that if the new grid resolution is smaller than the old grid resolution, some solutions may be dropped, as solutions originally from different cells may now land in the same cell, and only the highest-objective elite in each cell is retained.
Also note that the current implementation does not support archive thresholds from CMA-MAE, i.e., the learning rate must be 1. The thresholds within each cell should correspond to how well the measure space within that cell has been explored, and thereby should correspond to the measure space volume within that cell. It is an open research problem as to how the new thresholds should be determined after retessellating.
- Parameters
new_dims (array-like of int) – Number of cells in each dimension of the measure space, e.g.,
[20, 30, 40]
indicates there should be 3 dimensions with 20, 30, and 40 cells. The format is identical to thedims
argument in__init__
.- Raises
ValueError – Attempted to retessellate an archive with learning rate not equal to 1.
ValueError – The measure space dimensionality in
new_dims
does not match the current measure space dimensionality.
- retrieve(measures)[source]¶
Queries the archive for elites with the given batch of measures.
This method operates in batch. It takes in a batch of measures and outputs the batched data for the elites:
occupied, elites = archive.retrieve(...) occupied # Shape: (batch_size,) elites["solution"] # Shape: (batch_size, solution_dim) elites["objective"] # Shape: (batch_size, objective_dim) elites["measures"] # Shape: (batch_size, measure_dim) ...
occupied
indicates whether an elite was found for each measure, i.e., whether the archive was occupied at each queried measure. Ifoccupied[i]
is True, thenelites["solution"][i]
,elites["objective"][i]
,elites["measures"][i]
, and other fields will contain the data of the elite for the inputmeasures[i]
. Ifoccupied[i]
is False, then those fields will instead have arbitrary values, e.g.,elites["solution"][i]
may be set to all NaN.- Parameters
measures (array-like) – (batch_size,
measure_dim
) array of measure space points at which to retrieve solutions.- Returns
2-element tuple of (boolean
occupied
array, dict of elite data). See above for description.- Return type
- Raises
ValueError –
measures
is not of shape (batch_size,measure_dim
).ValueError –
measures
has non-finite values (inf or NaN).
- retrieve_single(measures)[source]¶
Queries the archive for an elite with the given measures.
While
retrieve()
takes in a batch of measures, this method takes in the measures for only one solution and returns a single bool and a dict with single entries:occupied, elite = archive.retrieve_single(...) occupied # Bool elite["solution"] # Shape: (solution_dim,) elite["objective"] # Shape: (objective_dim,) elite["measures"] # Shape: (measure_dim,) ...
- Parameters
measures (array-like) – (
measure_dim
,) array of measures.- Returns
2-element tuple of (boolean, dict of data for one elite)
- Return type
- Raises
ValueError –
measures
is not of shape (measure_dim
,).ValueError –
measures
has non-finite values (inf or NaN).
- sample_elites(n)[source]¶
Randomly samples elites from the archive.
Currently, this sampling is done uniformly at random. Furthermore, each sample is done independently, so elites may be repeated in the sample. Additional sampling methods may be supported in the future.
Example
elites = archive.sample_elites(16) elites["solution"] # Shape: (16, solution_dim) elites["objective"] elites["measures"] ...
- Parameters
n (int) – Number of elites to sample.
- Returns
Holds a batch of elites randomly selected from the archive.
- Return type
- Raises
IndexError – The archive is empty.
- property best_elite¶
The elite with the highest objective in the archive.
None if there are no elites in the archive.
Note
If the archive is non-elitist (this occurs when using the archive with a learning rate which is not 1.0, as in CMA-MAE), then this best elite may no longer exist in the archive because it was replaced with an elite with a lower objective value. This can happen because in non-elitist archives, new solutions only need to exceed the threshold of the cell they are being inserted into, not the objective of the elite currently in the cell. See #314 for more info.
Note
The best elite will contain a “threshold” key. This threshold is the threshold of the best elite’s cell after the best elite was inserted into the archive.
- Type
- property boundaries¶
The boundaries of the cells in each dimension.
Entry
i
in this list is an array that contains the boundaries of the cells in dimensioni
. The array containsself.dims[i] + 1
entries laid out like this:Archive cells: | 0 | 1 | ... | self.dims[i] | boundaries[i]: 0 1 2 self.dims[i] - 1 self.dims[i]
Thus,
boundaries[i][j]
andboundaries[i][j + 1]
are the lower and upper bounds of cellj
in dimensioni
. To access the lower bounds of all the cells in dimensioni
, useboundaries[i][:-1]
, and to access all the upper bounds, useboundaries[i][1:]
.- Type
list of numpy.ndarray
- property dims¶
Number of cells in each dimension.
- Type
(measure_dim,) numpy.ndarray
- property epsilon¶
Epsilon for computing archive indices. Refer to the documentation for this class.
- Type
dtypes[“measures”]
- property interval_size¶
The size of each dim (upper_bounds - lower_bounds).
- Type
(measure_dim,) numpy.ndarray
- property lower_bounds¶
Lower bound of each dimension.
- Type
(measure_dim,) numpy.ndarray
- property objective_dim¶
Dimensionality of the objective space.
The empty tuple
()
indicates a scalar objective.- Type
int or empty tuple
- property qd_score_offset¶
The offset which is subtracted from objective values when computing the QD score.
- Type
- property stats¶
Statistics about the archive.
See
ArchiveStats
for more info.- Type
- property upper_bounds¶
Upper bound of each dimension.
- Type
(measure_dim,) numpy.ndarray