ribs.archives.GridArchive

class ribs.archives.GridArchive(*, solution_dim, dims, ranges, learning_rate=None, threshold_min=-inf, epsilon=1e-06, qd_score_offset=0.0, seed=None, dtype=<class 'numpy.float64'>, extra_fields=None)[source]

An archive that divides each dimension into uniformly-sized cells.

This archive is the container described in Mouret 2015. It can be visualized as an n-dimensional grid in the measure space that is divided into a certain number of cells in each dimension. Each cell contains an elite, i.e. a solution that maximizes the objective function for the measures in that cell.

Note

The idea of archive thresholds was introduced in Fontaine 2022. For more info on thresholds, including the learning_rate and threshold_min parameters, refer to our tutorial Upgrading CMA-ME to CMA-MAE on the Sphere Benchmark.

Parameters
  • solution_dim (int) – Dimension of the solution space.

  • dims (array-like of int) – Number of cells in each dimension of the measure space, e.g. [20, 30, 40] indicates there should be 3 dimensions with 20, 30, and 40 cells. (The number of dimensions is implicitly defined in the length of this argument).

  • ranges (array-like of (float, float)) – Upper and lower bound of each dimension of the measure space, e.g. [(-1, 1), (-2, 2)] indicates the first dimension should have bounds \([-1,1]\) (inclusive), and the second dimension should have bounds \([-2,2]\) (inclusive). ranges should be the same length as dims.

  • epsilon (float) – Due to floating point precision errors, we add a small epsilon when computing the archive indices in the index_of() method – refer to the implementation here. Pass this parameter to configure that epsilon.

  • learning_rate (float) – The learning rate for threshold updates. Defaults to 1.0.

  • threshold_min (float) – The initial threshold value for all the cells.

  • qd_score_offset (float) – Archives often contain negative objective values, and if the QD score were to be computed with these negative objectives, the algorithm would be penalized for adding new cells with negative objectives. Thus, a standard practice is to normalize all the objectives so that they are non-negative by introducing an offset. This QD score offset will be subtracted from all objectives in the archive, e.g., if your objectives go as low as -300, pass in -300 so that each objective will be transformed as objective - (-300).

  • seed (int) – Value to seed the random number generator. Set to None to avoid a fixed seed.

  • dtype (str or data-type or dict) – Data type of the solutions, objectives, and measures. We only support "f" / np.float32 and "d" / np.float64. Alternatively, this can be a dict specifying separate dtypes, of the form {"solution": <dtype>, "objective": <dtype>, "measures": <dtype>}.

  • extra_fields (dict) – Description of extra fields of data that is stored next to elite data like solutions and objectives. The description is a dict mapping from a field name (str) to a tuple of (shape, dtype). For instance, {"foo": ((), np.float32), "bar": ((10,), np.float32)} will create a “foo” field that contains scalar values and a “bar” field that contains 10D values. Note that field names must be valid Python identifiers, and names already used in the archive are not allowed.

Raises

ValueErrordims and ranges are not the same length.

Methods

__iter__()

Creates an iterator over the elites in the archive.

__len__()

Number of elites in the archive.

add(solution, objective, measures, **fields)

Inserts a batch of solutions into the archive.

add_single(solution, objective, measures, ...)

Inserts a single solution into the archive.

as_pandas([include_solutions, include_metadata])

DEPRECATED.

clear()

Removes all elites from the archive.

cqd_score(iterations, target_points, ...[, ...])

Computes the CQD score of the archive.

data([fields, return_type])

Retrieves data for all elites in the archive.

grid_to_int_index(grid_indices)

Converts a batch of grid indices into a batch of integer indices.

index_of(measures)

Returns archive indices for the given batch of measures.

index_of_single(measures)

Returns the index of the measures for one solution.

int_to_grid_index(int_indices)

Converts a batch of indices into indices in the archive's grid.

retrieve(measures)

Retrieves the elites with measures in the same cells as the measures specified.

retrieve_single(measures)

Retrieves the elite with measures in the same cell as the measures specified.

sample_elites(n)

Randomly samples elites from the archive.

Attributes

best_elite

The elite with the highest objective in the archive.

boundaries

The boundaries of the cells in each dimension.

cells

Total number of cells in the archive.

dims

Number of cells in each dimension.

dtype

DEPRECATED.

dtypes

Mapping from field name to dtype for all fields in the archive.

empty

Whether the archive is empty.

epsilon

Epsilon for computing archive indices.

field_list

List of data fields in the archive.

interval_size

The size of each dim (upper_bounds - lower_bounds).

learning_rate

The learning rate for threshold updates.

lower_bounds

Lower bound of each dimension.

measure_dim

Dimensionality of the measure space.

qd_score_offset

The offset which is subtracted from objective values when computing the QD score.

solution_dim

Dimensionality of the solutions in the archive.

stats

Statistics about the archive.

threshold_min

The initial threshold value for all the cells.

upper_bounds

Upper bound of each dimension.

__iter__()

Creates an iterator over the elites in the archive.

Example

for elite in archive:
    elite["solution"]
    elite["objective"]
    ...
__len__()

Number of elites in the archive.

add(solution, objective, measures, **fields)

Inserts a batch of solutions into the archive.

Each solution is only inserted if it has a higher objective than the threshold of the corresponding cell. For the default values of learning_rate and threshold_min, this threshold is simply the objective value of the elite previously in the cell. If multiple solutions in the batch end up in the same cell, we only insert the solution with the highest objective. If multiple solutions end up in the same cell and tie for the highest objective, we insert the solution that appears first in the batch.

For the default values of learning_rate and threshold_min, the threshold for each cell is updated by taking the maximum objective value among all the solutions that landed in the cell, resulting in the same behavior as in the vanilla MAP-Elites archive. However, for other settings, the threshold is updated with the batch update rule described in the appendix of Fontaine 2022.

Note

The indices of all arguments should “correspond” to each other, i.e. solution[i], objective[i], measures[i], and should be the solution parameters, objective, and measures for solution i.

Parameters
  • solution (array-like) – (batch_size, solution_dim) array of solution parameters.

  • objective (array-like) – (batch_size,) array with objective function evaluations of the solutions.

  • measures (array-like) – (batch_size, measure_dim) array with measure space coordinates of all the solutions.

  • fields (keyword arguments) – Additional data for each solution. Each argument should be an array with batch_size as the first dimension.

Returns

Information describing the result of the add operation. The dict contains the following keys:

  • "status" (numpy.ndarray of int): An array of integers that represent the “status” obtained when attempting to insert each solution in the batch. Each item has the following possible values:

    • 0: The solution was not added to the archive.

    • 1: The solution improved the objective value of a cell which was already in the archive.

    • 2: The solution discovered a new cell in the archive.

    All statuses (and values, below) are computed with respect to the current archive. For example, if two solutions both introduce the same new archive cell, then both will be marked with 2.

    The alternative is to depend on the order of the solutions in the batch – for example, if we have two solutions a and b which introduce the same new cell in the archive, a could be inserted first with status 2, and b could be inserted second with status 1 because it improves upon a. However, our implementation does not do this.

    To convert statuses to a more semantic format, cast all statuses to AddStatus e.g. with [AddStatus(s) for s in add_info["status"]].

  • "value" (numpy.ndarray of dtypes [“objective”]): An array with values for each solution in the batch. With the default values of learning_rate = 1.0 and threshold_min = -np.inf, the meaning of each value depends on the corresponding status and is identical to that in CMA-ME (Fontaine 2020):

    • 0 (not added): The value is the “negative improvement,” i.e. the objective of the solution passed in minus the objective of the elite still in the archive (this value is negative because the solution did not have a high enough objective to be added to the archive).

    • 1 (improve existing cell): The value is the “improvement,” i.e. the objective of the solution passed in minus the objective of the elite previously in the archive.

    • 2 (new cell): The value is just the objective of the solution.

    In contrast, for other values of learning_rate and threshold_min, each value is equivalent to the objective value of the solution minus the threshold of its corresponding cell in the archive.

Return type

dict

Raises
  • ValueError – The array arguments do not match their specified shapes.

  • ValueErrorobjective or measures has non-finite values (inf or NaN).

add_single(solution, objective, measures, **fields)

Inserts a single solution into the archive.

The solution is only inserted if it has a higher objective than the threshold of the corresponding cell. For the default values of learning_rate and threshold_min, this threshold is simply the objective value of the elite previously in the cell. The threshold is also updated if the solution was inserted.

Note

To make it more amenable to modifications, this method’s implementation is designed to be readable at the cost of performance, e.g., none of its operations are modified. If you need performance, we recommend using add().

Parameters
  • solution (array-like) – Parameters of the solution.

  • objective (float) – Objective function evaluation of the solution.

  • measures (array-like) – Coordinates in measure space of the solution.

  • fields (keyword arguments) – Additional data for the solution.

Returns

Information describing the result of the add operation. The dict contains status and value keys; refer to add() for the meaning of status and value.

Return type

dict

Raises
  • ValueError – The array arguments do not match their specified shapes.

  • ValueErrorobjective is non-finite (inf or NaN) or measures has non-finite values.

as_pandas(include_solutions=True, include_metadata=False)

DEPRECATED.

clear()

Removes all elites from the archive.

After this method is called, the archive will be empty.

cqd_score(iterations, target_points, penalties, obj_min, obj_max, dist_max=None, dist_ord=None)

Computes the CQD score of the archive.

The Continuous Quality Diversity (CQD) score was introduced in Kent 2022.

Note

This method by default assumes that the archive has an upper_bounds and lower_bounds property which delineate the bounds of the measure space, as is the case in GridArchive, CVTArchive, and SlidingBoundariesArchive. If this is not the case, dist_max must be passed in, and target_points must be an array of custom points.

Parameters
  • iterations (int) – Number of times to compute the CQD score. We return the mean CQD score across these iterations.

  • target_points (int or array-like) – Number of target points to generate, or an (iterations, n, measure_dim) array which lists n target points to list on each iteration. When an int is passed, the points are sampled uniformly within the bounds of the measure space.

  • penalties (int or array-like) – Number of penalty values over which to compute the score (the values are distributed evenly over the range [0,1]). Alternatively, this may be a 1D array which explicitly lists the penalty values. Known as \(\theta\) in Kent 2022.

  • obj_min (float) – Minimum objective value, used when normalizing the objectives.

  • obj_max (float) – Maximum objective value, used when normalizing the objectives.

  • dist_max (float) – Maximum distance between points in measure space. Defaults to the distance between the extremes of the measure space bounds (the type of distance is computed with the order specified by dist_ord). Known as \(\delta_{max}\) in Kent 2022.

  • dist_ord – Order of the norm to use for calculating measure space distance; this is passed to numpy.linalg.norm() as the ord argument. See numpy.linalg.norm() for possible values. The default is to use Euclidean distance (L2 norm).

Returns

The mean CQD score obtained with iterations rounds of calculations.

Raises
  • RuntimeError – The archive does not have the bounds properties mentioned above, and dist_max is not specified or the target points are not provided.

  • ValueError – target_points or penalties is an array with the wrong shape.

data(fields=None, return_type='dict')

Retrieves data for all elites in the archive.

Parameters
  • fields (str or array-like of str) – List of fields to include. By default, all fields will be included, with an additional “index” as the last field (“index” can also be placed anywhere in this list). This can also be a single str indicating a field name.

  • return_type (str) – Type of data to return. See below. Ignored if fields is a str.

Returns

The data for all entries in the archive. If fields was a single str, this will just be an array holding data for the given field. Otherwise, this data can take the following forms, depending on the return_type argument:

  • return_type="dict": Dict mapping from the field name to the field data at the given indices. An example is:

    {
      "solution": [[1.0, 1.0, ...], ...],
      "objective": [1.5, ...],
      "measures": [[1.0, 2.0], ...],
      "threshold": [0.8, ...],
      "index": [4, ...],
    }
    

    Observe that we also return the indices as an index entry in the dict. The keys in this dict can be modified with the fields arg; duplicate fields will be ignored since the dict stores unique keys.

  • return_type="tuple": Tuple of arrays matching the field order given in fields. For instance, if fields was ["objective", "measures"], we would receive a tuple of (objective_arr, measures_arr). In this case, the results from retrieve could be unpacked as:

    objective, measures = archive.data(["objective", "measures"],
                                       return_type="tuple")
    

    Unlike with the dict return type, duplicate fields will show up as duplicate entries in the tuple, e.g., fields=["objective", "objective"] will result in two objective arrays being returned.

    By default, (i.e., when fields=None), the fields in the tuple will be ordered according to the field_list along with index as the last field.

  • return_type="pandas": A ArchiveDataFrame with the following columns:

    • For fields that are scalars, a single column with the field name. For example, objective would have a single column called objective.

    • For fields that are 1D arrays, multiple columns with the name suffixed by its index. For instance, if we have a measures field of length 10, we create 10 columns with names measures_0, measures_1, …, measures_9. We do not currently support fields with >1D data.

    • 1 column of integers (np.int32) for the index, named index.

    In short, the dataframe might look like this by default:

    solution_0

    objective

    measures_0

    threshold

    index

    Like the other return types, the columns can be adjusted with the fields parameter.

All data returned by this method will be a copy, i.e., the data will not update as the archive changes.

grid_to_int_index(grid_indices)[source]

Converts a batch of grid indices into a batch of integer indices.

Refer to index_of() for more info.

Parameters

grid_indices (array-like) – (batch_size, measure_dim) array of indices in the archive grid.

Returns

(batch_size,) array of integer indices.

Return type

numpy.ndarray

Raises

ValueErrorgrid_indices is not of shape (batch_size, measure_dim)

index_of(measures)[source]

Returns archive indices for the given batch of measures.

First, values are clipped to the bounds of the measure space. Then, the values are mapped to cells; e.g. cell 5 along dimension 0 and cell 3 along dimension 1.

At this point, we have “grid indices” – indices of each measure in each dimension. Since indices returned by this method must be single integers (as opposed to a tuple of grid indices), we convert these grid indices into integer indices with numpy.ravel_multi_index() and return the result.

It may be useful to have the original grid indices. Thus, we provide the grid_to_int_index() and int_to_grid_index() methods for converting between grid and integer indices.

As an example, the grid indices can be used to access boundaries of a measure value’s cell. For example, the following retrieves the lower and upper bounds of the cell along dimension 0:

# Access only element 0 since this method operates in batch.
idx = archive.int_to_grid_index(archive.index_of(...))[0]

lower = archive.boundaries[0][idx[0]]
upper = archive.boundaries[0][idx[0] + 1]

See boundaries for more info.

Parameters

measures (array-like) – (batch_size, measure_dim) array of coordinates in measure space.

Returns

(batch_size,) array of integer indices representing the flattened grid coordinates.

Return type

numpy.ndarray

Raises
index_of_single(measures)

Returns the index of the measures for one solution.

While index_of() takes in a batch of measures, this method takes in the measures for only one solution. If index_of() is implemented correctly, this method should work immediately (i.e. “out of the box”).

Parameters

measures (array-like) – (measure_dim,) array of measures for a single solution.

Returns

Integer index of the measures in the archive’s storage arrays.

Return type

int or numpy.integer

Raises
int_to_grid_index(int_indices)[source]

Converts a batch of indices into indices in the archive’s grid.

Refer to index_of() for more info.

Parameters

int_indices (array-like) – (batch_size,) array of integer indices such as those output by index_of().

Returns

(batch_size, measure_dim) array of indices in the archive grid.

Return type

numpy.ndarray

Raises

ValueErrorint_indices is not of shape (batch_size,).

retrieve(measures)

Retrieves the elites with measures in the same cells as the measures specified.

This method operates in batch, i.e., it takes in a batch of measures and outputs the batched data for the elites:

occupied, elites = archive.retrieve(...)
elites["solution"]  # Shape: (batch_size, solution_dim)
elites["objective"]
elites["measures"]
elites["threshold"]
elites["index"]

If the cell associated with elites["measures"][i] has an elite in it, then occupied[i] will be True. Furthermore, elites["solution"][i], elites["objective"][i], elites["measures"][i], elites["threshold"][i], and elites["index"][i] will be set to the properties of the elite. Note that elites["measures"][i] may not be equal to the measures[i] passed as an argument, since the measures only need to be in the same archive cell.

If the cell associated with measures[i] does not have any elite in it, then occupied[i] will be set to False. Furthermore, the corresponding outputs will be set to empty values – namely:

  • NaN for floating-point fields

  • -1 for the “index” field

  • 0 for integer fields

  • None for object fields

If you need to retrieve a single elite associated with some measures, consider using retrieve_single().

Parameters

measures (array-like) – (batch_size, measure_dim) array of coordinates in measure space.

Returns

2-element tuple of (occupied array, dict). The occupied array indicates whether each of the cells indicated by the coordinates in measures has an elite, while the dict contains the data of those elites. The dict maps from field name to the corresponding array.

Return type

tuple

Raises
retrieve_single(measures)

Retrieves the elite with measures in the same cell as the measures specified.

While retrieve() takes in a batch of measures, this method takes in the measures for only one solution and returns a single bool and a dict with single entries.

Parameters

measures (array-like) – (measure_dim,) array of measures.

Returns

If there is an elite with measures in the same cell as the measures specified, then this method returns a True value and a dict where all the fields hold the info of the elite. Otherwise, this method returns a False value and a dict filled with the same “empty” values described in retrieve().

Return type

tuple

Raises
sample_elites(n)

Randomly samples elites from the archive.

Currently, this sampling is done uniformly at random. Furthermore, each sample is done independently, so elites may be repeated in the sample. Additional sampling methods may be supported in the future.

Example

elites = archive.sample_elites(16)
elites["solution"]  # Shape: (16, solution_dim)
elites["objective"]
...
Parameters

n (int) – Number of elites to sample.

Returns

Holds a batch of elites randomly selected from the archive.

Return type

dict

Raises

IndexError – The archive is empty.

property best_elite

The elite with the highest objective in the archive.

None if there are no elites in the archive.

Note

If the archive is non-elitist (this occurs when using the archive with a learning rate which is not 1.0, as in CMA-MAE), then this best elite may no longer exist in the archive because it was replaced with an elite with a lower objective value. This can happen because in non-elitist archives, new solutions only need to exceed the threshold of the cell they are being inserted into, not the objective of the elite currently in the cell. See #314 for more info.

Note

The best elite will contain a “threshold” key. This threshold is the threshold of the best elite’s cell after the best elite was inserted into the archive.

Type

dict

property boundaries

The boundaries of the cells in each dimension.

Entry i in this list is an array that contains the boundaries of the cells in dimension i. The array contains self.dims[i] + 1 entries laid out like this:

Archive cells:  | 0 | 1 |   ...   |    self.dims[i]    |
boundaries[i]:  0   1   2   self.dims[i] - 1     self.dims[i]

Thus, boundaries[i][j] and boundaries[i][j + 1] are the lower and upper bounds of cell j in dimension i. To access the lower bounds of all the cells in dimension i, use boundaries[i][:-1], and to access all the upper bounds, use boundaries[i][1:].

Type

list of numpy.ndarray

property cells

Total number of cells in the archive.

Type

int

property dims

Number of cells in each dimension.

Type

(measure_dim,) numpy.ndarray

property dtype

DEPRECATED.

property dtypes

Mapping from field name to dtype for all fields in the archive.

Type

dict

property empty

Whether the archive is empty.

Type

bool

property epsilon

Epsilon for computing archive indices. Refer to the documentation for this class.

Type

dtypes[“measures”]

property field_list

List of data fields in the archive.

Type

list

property interval_size

The size of each dim (upper_bounds - lower_bounds).

Type

(measure_dim,) numpy.ndarray

property learning_rate

The learning rate for threshold updates.

Type

float

property lower_bounds

Lower bound of each dimension.

Type

(measure_dim,) numpy.ndarray

property measure_dim

Dimensionality of the measure space.

Type

int

property qd_score_offset

The offset which is subtracted from objective values when computing the QD score.

Type

float

property solution_dim

Dimensionality of the solutions in the archive.

Type

int

property stats

Statistics about the archive.

See ArchiveStats for more info.

Type

ArchiveStats

property threshold_min

The initial threshold value for all the cells.

Type

float

property upper_bounds

Upper bound of each dimension.

Type

(measure_dim,) numpy.ndarray