ribs.archives.CategoricalArchive

class ribs.archives.CategoricalArchive(*, solution_dim: int | integer | tuple[int | integer, ...], categories: Collection[Collection[Hashable]], learning_rate: float | floating | None = None, threshold_min: float | floating = -inf, qd_score_offset: float | floating = 0.0, seed: int | integer | None = None, solution_dtype: numpy.typing.DTypeLike = None, objective_dtype: numpy.typing.DTypeLike = None, measures_dtype: numpy.typing.DTypeLike = None, dtype: numpy.typing.DTypeLike = None, extra_fields: dict[str, tuple[int | integer | tuple[int | integer, ...], DTypeLike]] | None = None)[source]

An archive where each dimension is divided into categories.

This archive is similar to a GridArchive, except that each measure is a categorical variable. Just like GridArchive, it can be visualized as an n-dimensional grid in the measure space that is divided into cells along each dimension. Each cell contains an elite, i.e., a solution that maximizes the objective function and has measures that lie within that cell. This archive also implements the idea of soft archives that have thresholds, as introduced in Fontaine 2023.

By default, this archive stores the following data fields: solution, objective, measures, threshold, and index. The threshold is the value that a solution’s objective value must exceed to be inserted into a cell, while the integer index uniquely identifies each cell.

Parameters:
solution_dim: int | integer | tuple[int | integer, ...]

Dimensionality of the solution space. Scalar or multi-dimensional solution shapes are allowed by passing an empty tuple or tuple of integers, respectively.

categories: Collection[Collection[Hashable]]

The name of each category for each dimension of the measure space. The length of this list is the dimensionality of the measure space. An example is [["A", "B", "C"], ["One", "Two", "Three", "Four"]], which defines a 2D measure space where the first dimension has categories ["A", "B", "C"] and the second has categories ["One", "Two", "Three", "Four"]. While any object can be used for the category name, strings are expected to be the typical use case.

learning_rate: float | floating | None = None

The learning rate for threshold updates. Defaults to 1.0.

threshold_min: float | floating = -inf

The initial threshold value for all the cells.

qd_score_offset: float | floating = 0.0

Archives often contain negative objective values, and if the QD score were to be computed with these negative objectives, the algorithm would be penalized for adding new cells with negative objectives. Thus, a standard practice is to normalize all the objectives so that they are non-negative by introducing an offset. This QD score offset will be subtracted from all objectives in the archive, e.g., if your objectives go as low as -300, pass in -300 so that each objective will be transformed as objective - (-300).

seed: int | integer | None = None

Value to seed the random number generator. Set to None to avoid a fixed seed.

solution_dtype: numpy.typing.DTypeLike = None

Data type of the solutions. Defaults to float64.

objective_dtype: numpy.typing.DTypeLike = None

Data type of the objectives. Defaults to float64.

measures_dtype: numpy.typing.DTypeLike = None

Data type of the measures. Defaults to object.

dtype: numpy.typing.DTypeLike = None

Shortcut for providing data type of the solutions and objectives. Defaults to float64 (numpy’s default floating point type). This parameter sets the two dtypes simultaneously and does not set the dtype for the measures. To set individual dtypes, pass solution_dtype, objective_dtype, and measures_dtype. Note that dtype cannot be used at the same time as those parameters.

extra_fields: dict[str, tuple[int | integer | tuple[int | integer, ...], DTypeLike]] | None = None

Description of extra fields of data that are stored next to elite data like solutions and objectives. The description is a dict mapping from a field name (str) to a tuple of (shape, dtype). For instance, {"foo": ((), np.float32), "bar": ((10,), np.float32)} will create a “foo” field that contains scalar values and a “bar” field that contains 10D values. Note that field names must be valid Python identifiers, and names already used in the archive are not allowed.

Raises:
  • ValueError – Invalid values for learning_rate and threshold_min.

  • ValueError – Invalid names in extra_fields.

Methods

__iter__()

Creates an iterator over the elites in the archive.

__len__()

Number of elites in the archive.

add(solution, objective, measures, **fields)

Inserts a batch of solutions into the archive.

add_single(solution, objective, measures, ...)

Inserts a single solution into the archive.

clear()

Removes all elites in the archive.

data()

Returns data of the elites in the archive.

grid_to_int_index(grid_indices)

Converts a batch of grid indices into a batch of integer indices.

index_of(measures)

Returns archive indices for the given batch of measures.

index_of_single(measures)

Returns the index of the measures for one solution.

int_to_grid_index(int_indices)

Converts a batch of indices into indices in the archive's grid.

retrieve(measures)

Queries the archive for elites with the given batch of measures.

retrieve_single(measures)

Queries the archive for an elite with the given measures.

sample_elites(n[, replace])

Randomly samples elites from the archive.

Attributes

best_elite

The elite with the highest objective in the archive.

categories

The categories in each dimension of the measure space.

cells

Total number of cells in the archive.

dims

(measure_dim,) array listing the number of cells in each dimension.

dtypes

Mapping from field name to dtype for all fields in the archive.

empty

Whether the archive is empty.

field_list

List of data fields in the archive.

learning_rate

The learning rate for threshold updates.

measure_dim

Dimensionality of the measure space.

objective_dim

Dimensionality of the objective space.

qd_score_offset

Subtracted from objective values when computing the QD score.

solution_dim

Dimensionality of the solution space.

stats

Statistics about the archive.

threshold_min

The initial threshold value for all the cells.

add(solution: numpy.typing.ArrayLike, objective: numpy.typing.ArrayLike, measures: numpy.typing.ArrayLike, **fields: numpy.typing.ArrayLike) dict[str, ndarray][source]

Inserts a batch of solutions into the archive.

Each solution is only inserted if it has a higher objective than the threshold of the corresponding cell. For the default values of learning_rate and threshold_min, this threshold is simply the objective value of the elite previously in the cell. If multiple solutions in the batch end up in the same cell, we only insert the solution with the highest objective. If multiple solutions end up in the same cell and tie for the highest objective, we insert the solution that appears first in the batch.

For the default values of learning_rate and threshold_min, the threshold for each cell is updated by taking the maximum objective value among all the solutions that landed in the cell, resulting in the same behavior as in the vanilla MAP-Elites archive. However, for other settings, the threshold is updated with the batch update rule described in the appendix of Fontaine 2023.

Note

The indices of all arguments should “correspond” to each other, i.e., solution[i], objective[i], and measures[i] should be the solution parameters, objective, and measures for solution i.

Parameters:
solution: numpy.typing.ArrayLike

(batch_size, solution_dim) array of solution parameters.

objective: numpy.typing.ArrayLike

(batch_size,) array with objective function evaluations of the solutions.

measures: numpy.typing.ArrayLike

(batch_size, measure_dim) array with measure space coordinates of all the solutions.

**fields: numpy.typing.ArrayLike

Additional data for each solution. Each argument should be an array with batch_size as the first dimension.

Returns:

Information describing the result of the add operation. The dict contains the following keys:

  • "status" (numpy.ndarray of numpy.int32): An array of integers that represent the “status” obtained when attempting to insert each solution in the batch. Each item has the following possible values:

    • 0: The solution was not added to the archive.

    • 1: The solution improved the objective value of a cell which was already in the archive.

    • 2: The solution discovered a new cell in the archive.

    All statuses (and values, below) are computed with respect to the current archive. For example, if two solutions both introduce the same new archive cell, then both will be marked with 2.

    The alternative is to depend on the order of the solutions in the batch – for example, if we have two solutions a and b that introduce the same new cell in the archive, a could be inserted first with status 2, and b could be inserted second with status 1 because it improves upon a. However, our implementation does not do this.

    To convert statuses to a more semantic format, cast all statuses to AddStatus, e.g., with [AddStatus(s) for s in add_info["status"]].

  • "value" (numpy.ndarray of dtypes [“objective”]): An array with values for each solution in the batch. With the default values of learning_rate = 1.0 and threshold_min = -np.inf, the meaning of each value depends on the corresponding status and is identical to that in CMA-ME (Fontaine 2020):

    • 0 (not added): The value is the “negative improvement,” i.e., the objective of the solution passed in minus the objective of the elite still in the archive (this value is negative because the solution did not have a high enough objective to be added to the archive).

    • 1 (improve existing cell): The value is the “improvement,” i.e., the objective of the solution passed in minus the objective of the elite previously in the archive.

    • 2 (new cell): The value is just the objective of the solution.

    In contrast, for other values of learning_rate and threshold_min, each value is equivalent to the objective value of the solution minus the threshold of its corresponding cell in the archive.

Raises:
  • ValueError – The array arguments do not match their specified shapes.

  • ValueErrorobjective or measures has non-finite values (inf or NaN).

add_single(solution: numpy.typing.ArrayLike, objective: numpy.typing.ArrayLike, measures: numpy.typing.ArrayLike, **fields: numpy.typing.ArrayLike) dict[str, Any][source]

Inserts a single solution into the archive.

The solution is only inserted if it has a higher objective than the threshold of the corresponding cell. For the default values of learning_rate and threshold_min, this threshold is simply the objective value of the elite previously in the cell. The threshold is also updated if the solution was inserted.

Note

This method is provided as an easier-to-understand implementation that has less performance due to inserting only one solution at a time. For better performance, see add().

Parameters:
solution: numpy.typing.ArrayLike

Parameters of the solution.

objective: numpy.typing.ArrayLike

Objective function evaluation of the solution.

measures: numpy.typing.ArrayLike

Coordinates in measure space of the solution.

**fields: numpy.typing.ArrayLike

Additional data for the solution.

Returns:

Information describing the result of the add operation. The dict contains status and value keys; refer to add() for the meaning of status and value.

Raises:
  • ValueError – The array arguments do not match their specified shapes.

  • ValueErrorobjective is non-finite (inf or NaN) or measures has non-finite values.

clear() None[source]

Removes all elites in the archive.

data(fields: str, return_type: 'dict' | 'tuple' | 'pandas' = 'dict') ndarray[source]
data(fields: None | Collection[str] = None, return_type: 'dict' = 'dict') dict[str, ndarray]
data(fields: None | Collection[str] = None, return_type: 'tuple' = 'tuple') tuple[ndarray]
data(fields: None | Collection[str] = None, return_type: 'pandas' = 'pandas') ArchiveDataFrame

Returns data of the elites in the archive.

Parameters:
fields: str
fields: None | Collection[str] = None

List of fields to include, such as "solution", "objective", "measures", and other fields in the archive. This can also be a single str indicating a field name.

return_type: 'dict' | 'tuple' | 'pandas' = 'dict'
return_type: 'dict' = 'dict'
return_type: 'tuple' = 'tuple'
return_type: 'pandas' = 'pandas'

Data to return; see below. Ignored if fields is a str.

Returns:

The data for all elites in the archive. All data returned by this method will be a copy, i.e., the data will not update as the archive changes. If fields was a single str, the returned data will just be an array holding data for the given field, such as:

measures = archive.data("measures")

Otherwise, the returned data can take the following forms, depending on the return_type argument:

  • return_type="dict": Dict mapping from the field name to the field data at the given indices. An example is:

    {
      "solution": [[1.0, 1.0, ...], ...],
      "objective": [1.5, ...],
      "measures": [[1.0, 2.0], ...],
      ...
    }
    

    The keys in this dict can be modified with the fields arg; duplicate fields will be ignored since the dict stores unique keys.

  • return_type="tuple": Tuple of arrays matching the field order in fields. For instance, if fields is ["objective", "measures"], this method would return a tuple of (objective_arr, measures_arr) that could be unpacked as:

    objective, measures = archive.data(["objective", "measures"],
                                       return_type="tuple")
    

    Unlike with the dict return type, duplicate fields will show up as duplicate entries in the tuple, e.g., fields=["objective", "objective"] will result in two objective arrays being returned.

    When fields=None (the default case), the fields in the tuple will be ordered according to the field_list.

  • return_type="pandas": An ArchiveDataFrame with the following columns:

    • For fields that are scalars, a single column with the field name. For example, objective would have a single column called objective.

    • For fields that are 1D arrays, multiple columns with the name suffixed by its index. To illustrate, for a measures field of length 10, the dataframe would contain 10 columns with names measures_0, measures_1, …, measures_9. The output format for fields with >1D data is currently not defined.

    In short, the dataframe might look like this by default:

    solution_0

    objective

    measures_0

    Like the other return types, the columns returned can be adjusted with the fields parameter.

Raises:
  • ValueError – Invalid field name provided.

  • ValueError – Invalid return_type provided.

  • ValueError – Passed return_type="pandas" when one of the fields has >1D data.

grid_to_int_index(grid_indices: numpy.typing.ArrayLike) ndarray

Converts a batch of grid indices into a batch of integer indices.

Refer to index_of() for more info.

Parameters:
grid_indices: numpy.typing.ArrayLike

(batch_size, measure_dim) array of indices in the archive grid.

Returns:

(batch_size,) array of integer indices.

Raises:

ValueErrorgrid_indices is not of shape (batch_size, measure_dim).

index_of(measures: numpy.typing.ArrayLike) ndarray[source]

Returns archive indices for the given batch of measures.

This is by done by mapping from the category name to the cell indices, and then converting to integer indices with grid_to_int_index().

Parameters:
measures: numpy.typing.ArrayLike

(batch_size, measure_dim) array of coordinates/categories in measure space.

Returns:

(batch_size,) array of integer indices representing the flattened grid coordinates.

Raises:
index_of_single(measures: numpy.typing.ArrayLike) int | integer[source]

Returns the index of the measures for one solution.

See index_of().

Parameters:
measures: numpy.typing.ArrayLike

(measure_dim,) array of measures for a single solution.

Returns:

Integer index of the measures in the archive’s storage arrays.

Raises:
int_to_grid_index(int_indices: numpy.typing.ArrayLike) ndarray

Converts a batch of indices into indices in the archive’s grid.

Refer to index_of() for more info.

Parameters:
int_indices: numpy.typing.ArrayLike

(batch_size,) array of integer indices such as those output by index_of().

Returns:

(batch_size, measure_dim) array of indices in the archive grid.

Raises:

ValueErrorint_indices is not of shape (batch_size,).

retrieve(measures: numpy.typing.ArrayLike) tuple[ndarray, dict[str, ndarray]][source]

Queries the archive for elites with the given batch of measures.

This method operates in batch. It takes in a batch of measures and outputs the batched data for the elites:

occupied, elites = archive.retrieve(...)
occupied  # Shape: (batch_size,)
elites["solution"]  # Shape: (batch_size, solution_dim)
elites["objective"]  # Shape: (batch_size, objective_dim)
elites["measures"]  # Shape: (batch_size, measure_dim)
...

occupied indicates whether an elite was found for each measure, i.e., whether the archive was occupied at each queried measure. If occupied[i] is True, then elites["solution"][i], elites["objective"][i], elites["measures"][i], and other fields will contain the data of the elite for the input measures[i]. If occupied[i] is False, then those fields will instead have arbitrary values, e.g., elites["solution"][i] may be set to all NaN.

Parameters:
measures: numpy.typing.ArrayLike

(batch_size, measure_dim) array of measure space points at which to retrieve solutions.

Returns:

2-element tuple of (boolean occupied array, dict of elite data). See above for description.

Raises:
retrieve_single(measures: numpy.typing.ArrayLike) tuple[bool, dict[str, Any]][source]

Queries the archive for an elite with the given measures.

While retrieve() takes in a batch of measures, this method takes in the measures for only one solution and returns a single bool and a dict with single entries:

occupied, elite = archive.retrieve_single(...)
occupied  # Bool
elite["solution"]  # Shape: (solution_dim,)
elite["objective"]  # Shape: (objective_dim,)
elite["measures"]  # Shape: (measure_dim,)
...
Parameters:
measures: numpy.typing.ArrayLike

(measure_dim,) array of measures.

Returns:

2-element tuple of (boolean, dict of data for one elite)

Raises:
sample_elites(n: int | integer, replace: bool = True) dict[str, ndarray][source]

Randomly samples elites from the archive.

Currently, this sampling is done uniformly at random, either with or without replacement. Additional sampling methods may be supported in the future.

Example

elites = archive.sample_elites(16)
elites["solution"]  # Shape: (16, solution_dim)
elites["objective"]
elites["measures"]
...
Parameters:
n: int | integer

Number of elites to sample.

replace: bool = True

Whether to replace the elites when sampling. If True, the elites will be replaced and thus will be sampled independently.

Returns:

A batch of elites randomly selected from the archive.

Raises:
  • IndexError – The archive is empty.

  • ValueErrorn was greater than the number of elites in the archive when replace=False.

property best_elite : dict[str, Any]

The elite with the highest objective in the archive.

None if there are no elites in the archive.

Note

If the archive is non-elitist (this occurs when using the archive with a learning rate which is not 1.0, as in CMA-MAE), then this best elite may no longer exist in the archive because it was replaced with an elite with a lower objective value. This can happen because in non-elitist archives, new solutions only need to exceed the threshold of the cell they are being inserted into, not the objective of the elite currently in the cell. See #314 for more info.

Note

The best elite will contain a “threshold” key. This threshold is the threshold of the best elite’s cell after the best elite was inserted into the archive.

property categories : list[list[Hashable]]

The categories in each dimension of the measure space.

property cells : int | integer

Total number of cells in the archive.

property dims : ndarray

(measure_dim,) array listing the number of cells in each dimension.

property dtypes : dict[str, dtype]

Mapping from field name to dtype for all fields in the archive.

property empty : bool

Whether the archive is empty.

property field_list : list[str]

List of data fields in the archive.

property learning_rate : float

The learning rate for threshold updates.

property measure_dim : int | integer

Dimensionality of the measure space.

property objective_dim : tuple[()] | int | integer

Dimensionality of the objective space.

The empty tuple () indicates a scalar objective.

property qd_score_offset : float

Subtracted from objective values when computing the QD score.

property solution_dim : int | integer | tuple[int | integer, ...]

Dimensionality of the solution space.

property stats : ArchiveStats

Statistics about the archive.

See ArchiveStats for more info.

property threshold_min : float

The initial threshold value for all the cells.