ribs.archives.CVTArchive

class ribs.archives.CVTArchive(*, solution_dim, cells, ranges, learning_rate=1.0, threshold_min=-inf, qd_score_offset=0.0, seed=None, dtype=<class 'numpy.float64'>, samples=100000, custom_centroids=None, k_means_kwargs=None, use_kd_tree=True, ckdtree_kwargs=None)[source]

An archive that divides the entire measure space into a fixed number of cells.

This archive originates in Vassiliades 2018. It uses Centroidal Voronoi Tessellation (CVT) to divide an n-dimensional measure space into k cells. The CVT is created by sampling points uniformly from the n-dimensional measure space and using k-means clustering to identify k centroids. When items are inserted into the archive, we find their cell by locating the closest centroid in measure space (using Euclidean distance). For k-means clustering, we use sklearn.cluster.k_means().

By default, finding the closest centroid is done in roughly O(log(number of cells)) time using scipy.spatial.cKDTree. To switch to brute force, which takes O(number of cells) time, pass use_kd_tree=False.
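For concreteness, here is a minimal construction sketch; the solution dimension, cell count, bounds, and seed below are illustrative assumptions rather than recommended settings:

from ribs.archives import CVTArchive

# 10-D solutions; a 2-D measure space divided into 512 CVT cells.
archive = CVTArchive(
    solution_dim=10,
    cells=512,
    ranges=[(-1, 1), (-2, 2)],
    seed=42,            # also seeds k_means() so the centroids are reproducible
    use_kd_tree=True,   # the default; switch to False for brute-force lookups
)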

To compare the performance of using the k-D tree vs brute force, we ran benchmarks where we inserted 1k batches of 100 solutions into a 2D archive with varying numbers of cells. We took the minimum over 5 runs for each data point, as recommended in the docs for timeit.Timer.repeat(). Note the logarithmic scales. This plot was generated on a reasonably modern laptop.

[Plot: Runtime to insert 100k entries into CVTArchive]

Across almost all numbers of cells, using the k-D tree is faster than using brute force. Thus, we recommend always using the k-D tree. See benchmarks/cvt_add.py in the project repo for more information about how this plot was generated.

Finally, if running multiple experiments, it may be beneficial to use the same centroids across each experiment. Doing so can keep experiments consistent and reduce execution time. To do this, either (1) construct custom centroids and pass them in via the custom_centroids argument, or (2) access the centroids created in the first archive via the centroids property and pass them into custom_centroids when constructing archives for subsequent experiments.
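For example (a hedged sketch; the constructor arguments are illustrative), the centroids from one archive can be reused when constructing another:

from ribs.archives import CVTArchive

archive_kwargs = dict(solution_dim=10, cells=512, ranges=[(-1, 1), (-2, 2)])

# First experiment: the CVT is generated from samples via k-means.
archive_1 = CVTArchive(**archive_kwargs)

# Subsequent experiments: reuse the same centroids and skip k-means entirely.
archive_2 = CVTArchive(**archive_kwargs, custom_centroids=archive_1.centroids)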

Note

The idea of archive thresholds was introduced in Fontaine 2022. Refer to our CMA-MAE tutorial for more info on thresholds, including the learning_rate and threshold_min parameters.
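As a hedged illustration of these parameters (the specific values below are assumptions in the spirit of CMA-MAE, not library defaults):

from ribs.archives import CVTArchive

# Non-default threshold settings; with learning_rate < 1.0 the archive becomes
# non-elitist, and cells start at threshold_min rather than at -inf.
archive = CVTArchive(
    solution_dim=10,
    cells=512,
    ranges=[(-1, 1), (-1, 1)],
    learning_rate=0.01,
    threshold_min=0.0,
)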

Note

For more information on our choice of k-D tree implementation, see #38.

Parameters
  • solution_dim (int) – Dimension of the solution space.

  • cells (int) – The number of cells to use in the archive, equivalent to the number of centroids/areas in the CVT.

  • ranges (array-like of (float, float)) – Upper and lower bound of each dimension of the measure space, e.g. [(-1, 1), (-2, 2)] indicates the first dimension should have bounds \([-1,1]\) (inclusive), and the second dimension should have bounds \([-2,2]\) (inclusive). The length of ranges determines the dimensionality of the measure space.

  • learning_rate (float) – The learning rate for threshold updates.

  • threshold_min (float) – The initial threshold value for all the cells.

  • qd_score_offset (float) – Archives often contain negative objective values, and if the QD score were to be computed with these negative objectives, the algorithm would be penalized for adding new cells with negative objectives. Thus, a standard practice is to normalize all the objectives so that they are non-negative by introducing an offset. This QD score offset will be subtracted from all objectives in the archive, e.g., if your objectives go as low as -300, pass in -300 so that each objective will be transformed as objective - (-300).

  • seed (int) – Value to seed the random number generator as well as k_means(). Set to None to avoid a fixed seed.

  • dtype (str or data-type) – Data type of the solutions, objectives, and measures. We only support "f" / np.float32 and "d" / np.float64.

  • samples (int or array-like) – If it is an int, this specifies the number of samples to generate when creating the CVT. Otherwise, this must be a (num_samples, measure_dim) array where samples[i] is a sample to use when creating the CVT. It can be useful to pass in custom samples when there are restrictions on what samples in the measure space are (physically) possible; see the sketch after this parameter list.

  • custom_centroids (array-like) – If passed in, this (cells, measure_dim) array will be used as the centroids of the CVT instead of generating new ones. In this case, samples will be ignored, and archive.samples will be None. This can be useful when one wishes to use the same CVT across experiments for fair comparison.

  • k_means_kwargs (dict) – kwargs for k_means(). By default, we pass in n_init=1, init="random", algorithm="lloyd", and random_state=seed.

  • use_kd_tree (bool) – If True, use a k-D tree for finding the closest centroid when inserting into the archive. If False, brute force will be used instead.

  • ckdtree_kwargs (dict) – kwargs for cKDTree. By default, we do not pass in any kwargs.
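As mentioned for the samples parameter, custom samples can encode restrictions on the measure space. A hedged sketch, where the unit-disk feasibility condition is purely hypothetical:

import numpy as np
from ribs.archives import CVTArchive

# Suppose only measures inside the unit disk are physically realizable, so the
# CVT is built from samples drawn in that region.
rng = np.random.default_rng(0)
candidates = rng.uniform(-1, 1, size=(200_000, 2))
feasible = candidates[np.linalg.norm(candidates, axis=1) <= 1.0]

archive = CVTArchive(
    solution_dim=10,
    cells=256,
    ranges=[(-1, 1), (-1, 1)],
    samples=feasible,   # (num_samples, measure_dim) array of custom samples
)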

Raises

ValueError – The samples array or the custom_centroids array has the wrong shape.

Methods

  • __iter__() – Creates an iterator over the Elites in the archive.

  • __len__() – Number of elites in the archive.

  • add(solution_batch, objective_batch, ...[, ...]) – Inserts a batch of solutions into the archive.

  • add_single(solution, objective, measures[, ...]) – Inserts a single solution into the archive.

  • as_pandas([include_solutions, include_metadata]) – Converts the archive into an ArchiveDataFrame (a child class of pandas.DataFrame).

  • clear() – Removes all elites from the archive.

  • cqd_score(iterations, target_points, ...[, ...]) – Computes the CQD score of the archive.

  • index_of(measures_batch) – Finds the indices of the centroids closest to the given coordinates in measure space.

  • index_of_single(measures) – Returns the index of the measures for one solution.

  • retrieve(measures_batch) – Retrieves the elites with measures in the same cells as the measures specified.

  • retrieve_single(measures) – Retrieves the elite with measures in the same cell as the measures specified.

  • sample_elites(n) – Randomly samples elites from the archive.

Attributes

  • best_elite – The elite with the highest objective in the archive.

  • cells – Total number of cells in the archive.

  • centroids – The centroids used in the CVT.

  • dtype – The dtype of the solutions, objective, and measures.

  • empty – Whether the archive is empty.

  • learning_rate – The learning rate for threshold updates.

  • lower_bounds – Lower bound of each dimension.

  • measure_dim – Dimensionality of the measure space.

  • qd_score_offset – The offset which is subtracted from objective values when computing the QD score.

  • samples – The samples used in creating the CVT.

  • solution_dim – Dimensionality of the solutions in the archive.

  • stats – Statistics about the archive.

  • threshold_min – The initial threshold value for all the cells.

  • upper_bounds – Upper bound of each dimension.

__iter__()

Creates an iterator over the Elites in the archive.

Example

for elite in archive:
    elite.solution
    elite.objective
    ...

__len__()

Number of elites in the archive.

add(solution_batch, objective_batch, measures_batch, metadata_batch=None)

Inserts a batch of solutions into the archive.

Each solution is only inserted if it has a higher objective than the threshold of the corresponding cell. For the default values of learning_rate and threshold_min, this threshold is simply the objective value of the elite previously in the cell. If multiple solutions in the batch end up in the same cell, we only insert the solution with the highest objective. If multiple solutions end up in the same cell and tie for the highest objective, we insert the solution that appears first in the batch.

For the default values of learning_rate and threshold_min, the threshold for each cell is updated by taking the maximum objective value among all the solutions that landed in the cell, resulting in the same behavior as in the vanilla MAP-Elites archive. However, for other settings, the threshold is updated with the batch update rule described in the appendix of Fontaine 2022.
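A hedged sketch of a batch insertion; the archive configuration and the random batch are assumptions for illustration only:

import numpy as np
from ribs.archives import CVTArchive

archive = CVTArchive(solution_dim=3, cells=100, ranges=[(-1, 1), (-1, 1)])

rng = np.random.default_rng(0)
solution_batch = rng.uniform(-1, 1, size=(4, 3))   # 4 solutions, 3 parameters each
objective_batch = rng.uniform(0, 10, size=4)       # one objective per solution
measures_batch = rng.uniform(-1, 1, size=(4, 2))   # 2-D measures per solution

status_batch, value_batch = archive.add(solution_batch, objective_batch,
                                        measures_batch)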

Note

The indices of all arguments should “correspond” to each other, i.e. solution_batch[i], objective_batch[i], measures_batch[i], and metadata_batch[i] should be the solution parameters, objective, measures, and metadata for solution i.

Parameters
  • solution_batch (array-like) – (batch_size, solution_dim) array of solution parameters.

  • objective_batch (array-like) – (batch_size,) array with objective function evaluations of the solutions.

  • measures_batch (array-like) – (batch_size, measure_dim) array with measure space coordinates of all the solutions.

  • metadata_batch (array-like) –

    (batch_size,) array of Python objects representing metadata for the solution. For instance, this could be a dict with several properties.

    Warning

    Due to how NumPy’s asarray() automatically converts array-like objects to arrays, passing array-like objects as metadata may lead to unexpected behavior. However, the metadata may be a dict or other object which contains arrays, i.e. metadata_batch could be an array of dicts which contain arrays.

Returns

2-element tuple of (status_batch, value_batch) which describes the results of the additions. These outputs are particularly useful for algorithms such as CMA-ME.

  • status_batch (numpy.ndarray of int): An array of integers which represent the “status” obtained when attempting to insert each solution in the batch. Each item has the following possible values:

    • 0: The solution was not added to the archive.

    • 1: The solution improved the objective value of a cell which was already in the archive.

    • 2: The solution discovered a new cell in the archive.

    All statuses (and values, below) are computed with respect to the current archive. For example, if two solutions both introduce the same new archive cell, then both will be marked with 2.

    The alternative is to depend on the order of the solutions in the batch – for example, if we have two solutions a and b which introduce the same new cell in the archive, a could be inserted first with status 2, and b could be inserted second with status 1 because it improves upon a. However, our implementation does not do this.

    To convert statuses to a more semantic format, cast all statuses to AddStatus, e.g. with [AddStatus(s) for s in status_batch]; see the sketch after the return type below.

  • value_batch (numpy.ndarray of dtype): An array with values for each solution in the batch. With the default values of learning_rate = 1.0 and threshold_min = -np.inf, the meaning of each value depends on the corresponding status and is identical to that in CMA-ME (Fontaine 2020):

    • 0 (not added): The value is the “negative improvement,” i.e. the objective of the solution passed in minus the objective of the elite still in the archive (this value is negative because the solution did not have a high enough objective to be added to the archive).

    • 1 (improve existing cell): The value is the “improvement,” i.e. the objective of the solution passed in minus the objective of the elite previously in the archive.

    • 2 (new cell): The value is just the objective of the solution.

    In contrast, for other values of learning_rate and threshold_min, each value is equivalent to the objective value of the solution minus the threshold of its corresponding cell in the archive.

Return type

tuple
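Continuing from the status_batch and value_batch returned in the sketch above, the statuses can be interpreted as follows (a hedged example):

from ribs.archives import AddStatus

statuses = [AddStatus(s) for s in status_batch]   # semantic view of the raw ints
num_new = int((status_batch == 2).sum())          # solutions that discovered new cells
improvements = value_batch[status_batch == 1]     # improvements over existing elites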

Raises
  • ValueError – The array arguments do not match their specified shapes.

  • ValueError – objective_batch or measures_batch has non-finite values (inf or NaN).

add_single(solution, objective, measures, metadata=None)

Inserts a single solution into the archive.

The solution is only inserted if it has a higher objective than the threshold of the corresponding cell. For the default values of learning_rate and threshold_min, this threshold is simply the objective value of the elite previously in the cell. The threshold is also updated if the solution was inserted.
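A hedged sketch, reusing the 3-parameter, 2-measure archive constructed in the add() example above; the values are illustrative:

status, value = archive.add_single(
    solution=[0.1, -0.4, 0.9],   # solution_dim parameters
    objective=3.5,
    measures=[0.2, -0.7],        # measure_dim coordinates
)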

Note

To make it more amenable to modifications, this method’s implementation is designed to be readable at the cost of performance, e.g., none of its operations are vectorized. If you need performance, we recommend using add().

Parameters
  • solution (array-like) – Parameters of the solution.

  • objective (float) – Objective function evaluation of the solution.

  • measures (array-like) – Coordinates in measure space of the solution.

  • metadata (object) –

    Python object representing metadata for the solution. For instance, this could be a dict with several properties.

    Warning

    Due to how NumPy’s asarray() automatically converts array-like objects to arrays, passing array-like objects as metadata may lead to unexpected behavior. However, the metadata may be a dict or other object which contains arrays.

Raises
  • ValueError – The array arguments do not match their specified shapes.

  • ValueError – objective is non-finite (inf or NaN) or measures has non-finite values.

Returns

2-element tuple of (status, value) describing the result of the add operation. Refer to add() for the meaning of the status and value.

Return type

tuple

as_pandas(include_solutions=True, include_metadata=False)

Converts the archive into an ArchiveDataFrame (a child class of pandas.DataFrame).

The implementation of this method in ArchiveBase creates a dataframe consisting of:

  • 1 column of integers (np.int32) for the index, named index. See index_of() for more info.

  • measure_dim columns for the measures, named measure_0, measure_1, ...

  • 1 column for the objectives, named objective

  • solution_dim columns for the solution parameters, named solution_0, solution_1, ...

  • 1 column for the metadata objects, named metadata

In short, the dataframe looks like this:

index | measure_0 | ... | objective | solution_0 | ... | metadata

Compared to pandas.DataFrame, the ArchiveDataFrame adds methods and attributes which make it easier to manipulate archive data. For more information, refer to the ArchiveDataFrame documentation.
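A hedged usage sketch, assuming an existing archive with at least a 2-D measure space; the column names follow the layout described above:

df = archive.as_pandas()                  # ArchiveDataFrame with solution columns
top5 = df.nlargest(5, "objective")        # ordinary pandas operations still apply
measures = df[["measure_0", "measure_1"]].to_numpy()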

Parameters
  • include_solutions (bool) – Whether to include solution columns.

  • include_metadata (bool) – Whether to include the metadata column. Note that methods like to_csv() may not properly save the dataframe since the metadata objects may not be representable in a CSV.

Returns

See above.

Return type

ArchiveDataFrame

clear()

Removes all elites from the archive.

After this method is called, the archive will be empty.

cqd_score(iterations, target_points, penalties, obj_min, obj_max, dist_max=None, dist_ord=None)

Computes the CQD score of the archive.

The Continuous Quality Diversity (CQD) score was introduced in Kent 2022.

Note

This method by default assumes that the archive has an upper_bounds and lower_bounds property which delineate the bounds of the measure space, as is the case in GridArchive, CVTArchive, and SlidingBoundariesArchive. If this is not the case, dist_max must be passed in, and target_points must be an array of custom points.
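A hedged call sketch; the iteration counts and the objective range are assumptions about the problem at hand:

score = archive.cqd_score(
    iterations=5,        # average the score over 5 sets of target points
    target_points=1000,  # sampled uniformly within the measure space bounds
    penalties=11,        # penalty values spaced evenly over [0, 1]
    obj_min=0.0,         # assumed objective bounds used for normalization
    obj_max=100.0,
)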

Parameters
  • iterations (int) – Number of times to compute the CQD score. We return the mean CQD score across these iterations.

  • target_points (int or array-like) – Number of target points to generate, or an (iterations, n, measure_dim) array which lists n target points to use on each iteration. When an int is passed, the points are sampled uniformly within the bounds of the measure space.

  • penalties (int or array-like) – Number of penalty values over which to compute the score (the values are distributed evenly over the range [0,1]). Alternatively, this may be a 1D array which explicitly lists the penalty values. Known as \(\theta\) in Kent 2022.

  • obj_min (float) – Minimum objective value, used when normalizing the objectives.

  • obj_max (float) – Maximum objective value, used when normalizing the objectives.

  • dist_max (float) – Maximum distance between points in measure space. Defaults to the distance between the extremes of the measure space bounds (the type of distance is computed with the order specified by dist_ord). Known as \(\delta_{max}\) in Kent 2022.

  • dist_ord – Order of the norm to use for calculating measure space distance; this is passed to numpy.linalg.norm() as the ord argument. See numpy.linalg.norm() for possible values. The default is to use Euclidean distance (L2 norm).

Returns

The mean CQD score obtained with iterations rounds of calculations.

Raises
  • RuntimeError – The archive does not have the bounds properties mentioned above, and dist_max is not specified or the target points are not provided.

  • ValueError – target_points or penalties is an array with the wrong shape.

index_of(measures_batch)[source]

Finds the indices of the centroids closest to the given coordinates in measure space.

If index_batch is the batch of indices returned by this method, then archive.centroids[index_batch[i]] holds the coordinates of the centroid closest to measures_batch[i]. See centroids for more info.

The centroid indices are located using either the k-D tree or brute force, depending on the value of use_kd_tree in the constructor.
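A hedged sketch, assuming an archive with a 2-D measure space:

index_batch = archive.index_of([[0.0, 0.0],
                                [0.5, -0.8]])
closest_centroids = archive.centroids[index_batch]   # coordinates of the matched centroids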

Parameters

measures_batch (array-like) – (batch_size, measure_dim) array of coordinates in measure space.

Returns

(batch_size,) array of centroid indices corresponding to each measure space coordinate.

Return type

numpy.ndarray

Raises
  • ValueError – measures_batch is not of shape (batch_size, measure_dim).

  • ValueError – measures_batch has non-finite values (inf or NaN).

index_of_single(measures)

Returns the index of the measures for one solution.

While index_of() takes in a batch of measures, this method takes in the measures for only one solution. If index_of() is implemented correctly, this method should work immediately (i.e. “out of the box”).

Parameters

measures (array-like) – (measure_dim,) array of measures for a single solution.

Returns

Integer index of the measures in the archive’s storage arrays.

Return type

int or numpy.integer

Raises
  • ValueError – measures is not of shape (measure_dim,).

  • ValueError – measures has non-finite values (inf or NaN).

retrieve(measures_batch)

Retrieves the elites with measures in the same cells as the measures specified.

This method operates in batch, i.e. it takes in a batch of measures and outputs an EliteBatch. Since EliteBatch is a namedtuple, it can be unpacked:

solution_batch, objective_batch, measures_batch, \
    index_batch, metadata_batch = archive.retrieve(...)

Or the fields may be accessed by name:

elite_batch = archive.retrieve(...)
elite_batch.solution_batch
elite_batch.objective_batch
elite_batch.measures_batch
elite_batch.index_batch
elite_batch.metadata_batch

If the cell associated with measures_batch[i] has an elite in it, then elite_batch.solution_batch[i], elite_batch.objective_batch[i], elite_batch.measures_batch[i], elite_batch.index_batch[i], and elite_batch.metadata_batch[i] will be set to the properties of the elite. Note that elite_batch.measures_batch[i] may not be equal to measures_batch[i] since the measures only need to be in the same archive cell.

If the cell associated with measures_batch[i] does not have any elite in it, then the corresponding outputs are set to empty values – namely:

  • elite_batch.solution_batch[i] will be an array of NaN

  • elite_batch.objective_batch[i] will be NaN

  • elite_batch.measures_batch[i] will be an array of NaN

  • elite_batch.index_batch[i] will be -1

  • elite_batch.metadata_batch[i] will be None

If you need to retrieve a single elite associated with some measures, consider using retrieve_single().
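For instance, occupied lookups can be separated from empty ones via index_batch (a hedged sketch, assuming an archive with a 2-D measure space):

elite_batch = archive.retrieve([[0.0, 0.0],
                                [0.9, 0.9]])
occupied = elite_batch.index_batch != -1                  # False where the cell was empty
found_objectives = elite_batch.objective_batch[occupied]  # objectives of retrieved elites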

Parameters

measures_batch (array-like) – (batch_size, measure_dim) array of coordinates in measure space.

Returns

See above.

Return type

EliteBatch

Raises
  • ValueError – measures_batch is not of shape (batch_size, measure_dim).

  • ValueError – measures_batch has non-finite values (inf or NaN).

retrieve_single(measures)

Retrieves the elite with measures in the same cell as the measures specified.

While retrieve() takes in a batch of measures, this method takes in the measures for only one solution and returns a single Elite.

Parameters

measures (array-like) – (measure_dim,) array of measures.

Returns

If there is an elite with measures in the same cell as the measures specified, then this method returns an Elite where all the fields hold the info of that elite. Otherwise, this method returns an Elite filled with the same “empty” values described in retrieve().

Raises
  • ValueError – measures is not of shape (measure_dim,).

  • ValueError – measures has non-finite values (inf or NaN).

sample_elites(n)

Randomly samples elites from the archive.

Currently, this sampling is done uniformly at random. Furthermore, each sample is done independently, so elites may be repeated in the sample. Additional sampling methods may be supported in the future.

Since EliteBatch is a namedtuple, the result can be unpacked (here we show how to ignore some of the fields):

solution_batch, objective_batch, measures_batch, *_ = \
    archive.sample_elites(32)

Or the fields may be accessed by name:

elite_batch = archive.sample_elites(16)
elite_batch.solution_batch
elite_batch.objective_batch
...

Parameters

n (int) – Number of elites to sample.

Returns

A batch of elites randomly selected from the archive.

Return type

EliteBatch

Raises

IndexError – The archive is empty.

property best_elite

The elite with the highest objective in the archive.

None if there are no elites in the archive.

Note

If the archive is non-elitist (this occurs when using the archive with a learning rate which is not 1.0, as in CMA-MAE), then this best elite may no longer exist in the archive because it was replaced with an elite with a lower objective value. This can happen because in non-elitist archives, new solutions only need to exceed the threshold of the cell they are being inserted into, not the objective of the elite currently in the cell. See #314 for more info.

Type

Elite

property cells

Total number of cells in the archive.

Type

int

property centroids

The centroids used in the CVT.

Type

(n_centroids, measure_dim) numpy.ndarray

property dtype

The dtype of the solutions, objective, and measures.

Type

data-type

property empty

Whether the archive is empty.

Type

bool

property learning_rate

The learning rate for threshold updates.

Type

float

property lower_bounds

Lower bound of each dimension.

Type

(measure_dim,) numpy.ndarray

property measure_dim

Dimensionality of the measure space.

Type

int

property qd_score_offset

The offset which is subtracted from objective values when computing the QD score.

Type

float

property samples

The samples used in creating the CVT.

Will be None if custom centroids were passed in to the archive.

Type

(num_samples, measure_dim) numpy.ndarray

property solution_dim

Dimensionality of the solutions in the archive.

Type

int

property stats

Statistics about the archive.

See ArchiveStats for more info.

Type

ArchiveStats

property threshold_min

The initial threshold value for all the cells.

Type

float

property upper_bounds

Upper bound of each dimension.

Type

(measure_dim,) numpy.ndarray