ribs.archives.CVTArchive

class ribs.archives.CVTArchive(bins, ranges, seed=None, dtype=<class 'numpy.float64'>, samples=100000, custom_centroids=None, k_means_kwargs=None, use_kd_tree=False, ckdtree_kwargs=None)[source]

An archive that divides the entire behavior space into a fixed number of bins.

This archive originates in Vassiliades 2018. It uses Centroidal Voronoi Tesselation (CVT) to divide an n-dimensional behavior space into k bins. The CVT is created by sampling points uniformly from the n-dimensional behavior space and using k-means clustering to identify k centroids. When items are inserted into the archive, we identify their bin by identifying the closest centroid in behavior space (using Euclidean distance). For k-means clustering, we use sklearn.cluster.k_means().

Finding the closest centroid is done in O(bins) time (i.e. brute force) by default. If use_kd_tree is True, it can be done in roughly O(log bins) time using scipy.spatial.cKDTree. However, using the k-D tree lowers performance for small numbers of bins. The following plot compares the runtime of brute force vs k-D tree when inserting 100k samples into a 2D archive with varying numbers of bins (we took the minimum over 5 runs for each data point, as recommended in the docs for timeit.Timer.repeat(). Note the logarithmic scales. This plot was generated on a reasonably modern laptop.

Runtime to insert 100k entries into CVTArchive

Archives with at least 5k bins seem to have faster insertion when using a k-D tree than when using brute force, so we recommend setting use_kd_tree if the CVTArchive has at least 5k bins. See benchmarks/cvt_add.py in the project repo for more information about how this plot was generated.

Finally, if running multiple experiments, it may be beneficial to use the same centroids across each experiment. Doing so can keep experiments consistent and reduce execution time. To do this, either 1) construct custom centroids and pass them in via the custom_centroids argument, or 2) access the centroids created in the first archive with centroids and pass them into custom_centroids when constructing archives for subsequent experiments.

Parameters
  • bins (int) – The number of bins to use in the archive, equivalent to the number of centroids/areas in the CVT.

  • ranges (array-like of (float, float)) – Upper and lower bound of each dimension of the behavior space, e.g. [(-1, 1), (-2, 2)] indicates the first dimension should have bounds \([-1,1]\) (inclusive), and the second dimension should have bounds \([-2,2]\) (inclusive). ranges should be the same length as dims.

  • seed (int) – Value to seed the random number generator as well as k_means(). Set to None to avoid a fixed seed.

  • dtype (str or data-type) – Data type of the solutions, objective values, and behavior values. We only support "f" / np.float32 and "d" / np.float64.

  • samples (int or array-like) – If it is an int, this specifies the number of samples to generate when creating the CVT. Otherwise, this must be a (num_samples, behavior_dim) array where samples[i] is a sample to use when creating the CVT. It can be useful to pass in custom samples when there are restrictions on what samples in the behavior space are (physically) possible.

  • custom_centroids (array-like) – If passed in, this (bins, behavior_dim) array will be used as the centroids of the CVT instead of generating new ones. In this case, samples will be ignored, and archive.samples will be None. This can be useful when one wishes to use the same CVT across experiments for fair comparison.

  • k_means_kwargs (dict) – kwargs for k_means(). By default, we pass in n_init=1, init=”random”, algorithm=”full”, and random_state=seed.

  • use_kd_tree (bool) – If True, use a k-D tree for finding the closest centroid when inserting into the archive. This may result in a speedup for larger dimensions.

  • ckdtree_kwargs (dict) – kwargs for cKDTree. By default, we do not pass in any kwargs.

Raises

ValueError – The samples array or the custom_centroids array has the wrong shape.

Methods

__iter__()

Creates an iterator over the Elite’s in the archive.

__len__()

Number of elites in the archive.

add(solution, objective_value, behavior_values)

Attempts to insert a new solution into the archive.

as_pandas([include_solutions, include_metadata])

Converts the archive into an ArchiveDataFrame (a child class of pandas.DataFrame).

clear()

Removes all elites from the archive.

elite_with_behavior(behavior_values)

Gets the elite with behavior vals in the same bin as those specified.

get_index(behavior_values)

Finds the index of the centroid closest to the behavior values.

get_random_elite()

Selects an elite uniformly at random from one of the archive’s bins.

initialize(solution_dim)

Initializes the archive storage space and centroids.

Attributes

behavior_dim

Dimensionality of the behavior space.

bins

Total number of bins in the archive.

centroids

The centroids used in the CVT.

dtype

The dtype of the solutions, objective values, and behavior values.

empty

Whether the archive is empty.

initialized

Whether the archive has been initialized by a call to initialize()

lower_bounds

Lower bound of each dimension.

samples

The samples used in creating the CVT.

solution_dim

Dimensionality of the solutions in the archive.

stats

Statistics about the archive.

upper_bounds

Upper bound of each dimension.

__iter__()

Creates an iterator over the Elite’s in the archive.

Example

for elite in archive:
    elite.sol
    elite.obj
    ...
__len__()

Number of elites in the archive.

add(solution, objective_value, behavior_values, metadata=None)

Attempts to insert a new solution into the archive.

The solution is only inserted if it has a higher objective_value than the elite previously in the corresponding bin.

Parameters
  • solution (array-like) – Parameters of the solution.

  • objective_value (float) – Objective function evaluation of the solution.

  • behavior_values (array-like) – Coordinates in behavior space of the solution.

  • metadata (object) – Any Python object representing metadata for the solution. For instance, this could be a dict with several properties.

Returns

2-element tuple describing the result of the add operation. These outputs are particularly useful for algorithms such as CMA-ME.

status (AddStatus): See AddStatus.

value (dtype): The meaning of this value depends on the value of status:

  • NOT_ADDED -> the “negative improvement,” i.e. objective value of solution passed in minus objective value of the solution still in the archive (this value is negative because the solution did not have a high enough objective value to be added to the archive)

  • IMPROVE_EXISTING -> the “improvement,” i.e. objective value of solution passed in minus objective value of solution previously in the archive

  • NEW -> the objective value passed in

Return type

tuple

as_pandas(include_solutions=True, include_metadata=False)

Converts the archive into an ArchiveDataFrame (a child class of pandas.DataFrame).

The implementation of this method in ArchiveBase creates a dataframe consisting of:

  • len(self._storage_dims) columns for the index, named index_0, index_1, ... In GridArchive and SlidingBoundariesArchive, there are behavior_dim columns. In CVTArchive, there is just one column. See get_index() for more info.

  • behavior_dim columns for the behavior characteristics, named behavior_0, behavior_1, ...

  • 1 column for the objective values, named objective

  • solution_dim columns for the solution vectors, named solution_0, solution_1, ...

  • 1 column for the metadata objects, named metadata

In short, the dataframe looks like this:

index_0

behavior_0

objective

solution_0

metadata

Compared to pandas.DataFrame, the ArchiveDataFrame adds methods and attributes which make it easier to manipulate archive data. For more information, refer to the ArchiveDataFrame documentation.

Parameters
  • include_solutions (bool) – Whether to include solution columns.

  • include_metadata (bool) – Whether to include the metadata column. Note that methods like to_csv() may not properly save the dataframe since the metadata objects may not be representable in a CSV.

Returns

See above.

Return type

ArchiveDataFrame

clear()

Removes all elites from the archive.

After this method is called, the archive will be empty.

elite_with_behavior(behavior_values)

Gets the elite with behavior vals in the same bin as those specified.

Since Elite is a namedtuple, the result can be unpacked (here we show how to ignore some of the fields):

sol, obj, beh, *_ = archive.elite_with_behavior(...)

Or the fields may be accessed by name:

elite = archive.elite_with_behavior(...)
elite.sol
elite.obj
...
Parameters

behavior_values (array-like) – Coordinates in behavior space.

Returns

  • If there is an elite with behavior values in the same bin as those specified, this Elite holds the info for that elite. In that case, beh (the behavior values) may not be exactly the same as the behavior values specified since the elite is only guaranteed to be in the same archive bin.

  • If no such elite exists, then all fields of the Elite are set to None. This way, tuple unpacking (e.g. sol, obj, beh, idx, meta = archive.elite_with_behavior(...)) still works.

Return type

Elite

get_index(behavior_values)[source]

Finds the index of the centroid closest to the behavior values.

If idx is the index returned by this method for some behavior values beh, then archive.centroids[idx] holds the coordinates of the centroid closest to beh. See centroids for more info.

The centroid index is located using either the k-D tree or brute force, depending on the value of use_kd_tree in the constructor.

Parameters

behavior_values (numpy.ndarray) – (behavior_dim,) array of coordinates in behavior space.

Returns

Centroid index.

Return type

int

get_random_elite()

Selects an elite uniformly at random from one of the archive’s bins.

Since Elite is a namedtuple, the result can be unpacked (here we show how to ignore some of the fields):

sol, obj, beh, *_ = archive.get_random_elite()

Or the fields may be accessed by name:

elite = archive.get_random_elite()
elite.sol
elite.obj
...
Returns

A randomly selected elite from the archive.

Return type

Elite

Raises

IndexError – The archive is empty.

initialize(solution_dim)[source]

Initializes the archive storage space and centroids.

This method may take a while to run. In addition to allocating storage space, it runs k_means() to create an approximate CVT, and it constructs a cKDTree object containing the centroids found by k-means. k-means is not run if custom_centroids was passed in during construction.

Parameters

solution_dim (int) – The dimension of the solution space.

Raises
  • RuntimeError – The archive is already initialized.

  • RuntimeError – The number of centroids returned by k-means clustering was fewer than the number of bins specified during construction. This is most likely caused by having too few samples and too many bins.

property behavior_dim

Dimensionality of the behavior space.

Type

int

property bins

Total number of bins in the archive.

Type

int

property centroids

The centroids used in the CVT.

None until initialize() is called.

Type

(n_centroids, behavior_dim) numpy.ndarray

property dtype

The dtype of the solutions, objective values, and behavior values.

Type

data-type

property empty

Whether the archive is empty.

Type

bool

property initialized

Whether the archive has been initialized by a call to initialize()

property lower_bounds

Lower bound of each dimension.

Type

(behavior_dim,) numpy.ndarray

property samples

The samples used in creating the CVT.

May be None until initialize() is called.

Type

(num_samples, behavior_dim) numpy.ndarray

property solution_dim

Dimensionality of the solutions in the archive.

Type

int

property stats

Statistics about the archive.

See ArchiveStats for more info.

Type

ArchiveStats

property upper_bounds

Upper bound of each dimension.

Type

(behavior_dim,) numpy.ndarray