ribs.archives.ArchiveBase
- class ribs.archives.ArchiveBase(*, solution_dim, cells, measure_dim, learning_rate=1.0, threshold_min=-inf, qd_score_offset=0.0, seed=None, dtype=<class 'numpy.float64'>)
Base class for archives.
This class assumes all archives use a fixed-size container with cells that hold (1) information about whether the cell is occupied (bool), (2) a solution (1D array), (3) the objective function evaluation of the solution (float), (4) the measure space coordinates of the solution (1D array), (5) any additional metadata associated with the solution (object), and (6) a threshold which determines how high an objective value must be for a solution to be inserted into the cell (float). In this class, the container is implemented with separate numpy arrays that share common dimensions. Using the solution_dim, cells, and measure_dim arguments in __init__, these arrays are as follows:
Name              Shape
_occupied_arr     (cells,)
_solution_arr     (cells, solution_dim)
_objective_arr    (cells,)
_measures_arr     (cells, measure_dim)
_metadata_arr     (cells,)
_threshold_arr    (cells,)
All of these arrays are accessed via a common integer index. If we have index i, we access its solution at _solution_arr[i], its measure values at _measures_arr[i], etc.
Thus, child classes typically override the following methods:
__init__: Child classes must invoke this class’s __init__ with the appropriate arguments.
index_of(): Returns integer indices into the arrays above when given a batch of measures. Usually, each index has a meaning, e.g. in CVTArchive it is the index of a centroid. Documentation for this method should describe the meaning of the index.
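The shared-index layout described above can be illustrated with a small standalone sketch. This is not the pyribs implementation: ToyArchive, its lower and upper bounds, and the plain Python lists standing in for numpy arrays are all assumptions made for illustration. It shows a 1D grid whose index_of() returns the grid cell of each measure, and how one integer index addresses every per-cell container.

```python
import math


class ToyArchive:
    """Hypothetical 1D grid archive; a sketch, not the pyribs class."""

    def __init__(self, solution_dim, cells, lower, upper):
        self.cells = cells
        self.lower, self.upper = lower, upper
        # Parallel containers, all addressed by the same integer index.
        self.occupied = [False] * cells
        self.solutions = [[math.nan] * solution_dim for _ in range(cells)]
        self.objectives = [math.nan] * cells
        self.measures = [[math.nan] for _ in range(cells)]

    def index_of(self, measures_batch):
        """Maps each 1D measure to the index of its grid cell."""
        width = (self.upper - self.lower) / self.cells
        indices = []
        for (m,) in measures_batch:
            i = int((m - self.lower) / width)
            # Clip so out-of-bounds measures land in the edge cells.
            indices.append(min(max(i, 0), self.cells - 1))
        return indices


archive = ToyArchive(solution_dim=2, cells=4, lower=0.0, upper=1.0)
(i,) = archive.index_of([[0.6]])

# The same index i is used for every container.
archive.occupied[i] = True
archive.solutions[i] = [1.0, 2.0]
archive.objectives[i] = 3.5
archive.measures[i] = [0.6]
```

Here the index means "position of the grid cell along the single measure axis", which is the kind of meaning the index_of() documentation of a child class is expected to state.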
Note
Attributes beginning with an underscore are only intended to be accessed by child classes (i.e. they are “protected” attributes).
Note
The idea of archive thresholds was introduced in Fontaine 2022. Refer to our CMA-MAE tutorial for more info on thresholds, including the learning_rate and threshold_min parameters.
- Parameters
solution_dim (int) – Dimension of the solution space.
cells (int) – Number of cells in the archive. This is used to create the numpy arrays described above for storing archive info.
measure_dim (int) – The dimension of the measure space.
learning_rate (float) – The learning rate for threshold updates.
threshold_min (float) – The initial threshold value for all the cells.
qd_score_offset (float) – Archives often contain negative objective values, and if the QD score were to be computed with these negative objectives, the algorithm would be penalized for adding new cells with negative objectives. Thus, a standard practice is to normalize all the objectives so that they are non-negative by introducing an offset. This QD score offset will be subtracted from all objectives in the archive, e.g., if your objectives go as low as -300, pass in -300 so that each objective will be transformed as objective - (-300).
seed (int) – Value to seed the random number generator. Set to None to avoid a fixed seed.
dtype (str or data-type) – Data type of the solutions, objectives, and measures. We only support "f" / np.float32 and "d" / np.float64.
- Variables
_solution_dim (int) – See solution_dim arg.
_rng (numpy.random.Generator) – Random number generator, used in particular for generating random elites.
_cells (int) – See cells arg.
_measure_dim (int) – See measure_dim arg.
_occupied_arr (numpy.ndarray) – Bool array storing whether each cell in the archive is occupied.
_solution_arr (numpy.ndarray) – Float array storing the solutions themselves.
_objective_arr (numpy.ndarray) – Float array storing the objective value of each solution.
_measures_arr (numpy.ndarray) – Float array storing the measure space coordinates of each solution.
_metadata_arr (numpy.ndarray) – Object array storing the metadata associated with each solution.
_threshold_arr (numpy.ndarray) – Float array storing the threshold for insertion into each cell.
_occupied_indices (numpy.ndarray) – A (cells,) array of integer (np.int32) indices that are occupied in the archive. This could be a list, but for efficiency, we make it a fixed-size array, where only the first _num_occupied entries are valid.
_num_occupied (int) – Number of elites currently in the archive. This is used to index into _occupied_indices.
Methods
__iter__() – Creates an iterator over the Elite objects in the archive.
__len__() – Number of elites in the archive.
add(solution_batch, objective_batch, ...[, ...]) – Inserts a batch of solutions into the archive.
add_single(solution, objective, measures[, ...]) – Inserts a single solution into the archive.
as_pandas([include_solutions, include_metadata]) – Converts the archive into an ArchiveDataFrame (a child class of pandas.DataFrame).
clear() – Removes all elites from the archive.
cqd_score(iterations, target_points, ...[, ...]) – Computes the CQD score of the archive.
index_of(measures_batch) – Returns archive indices for the given batch of measures.
index_of_single(measures) – Returns the index of the measures for one solution.
retrieve(measures_batch) – Retrieves the elites with measures in the same cells as the measures specified.
retrieve_single(measures) – Retrieves the elite with measures in the same cell as the measures specified.
sample_elites(n) – Randomly samples elites from the archive.
Attributes
best_elite – The elite with the highest objective in the archive.
cells – Total number of cells in the archive.
dtype – The dtype of the solutions, objective, and measures.
empty – Whether the archive is empty.
learning_rate – The learning rate for threshold updates.
measure_dim – Dimensionality of the measure space.
qd_score_offset – The offset which is subtracted from objective values when computing the QD score.
solution_dim – Dimensionality of the solutions in the archive.
stats – Statistics about the archive.
threshold_min – The initial threshold value for all the cells.
- __iter__()
Creates an iterator over the Elite objects in the archive.
Example
    for elite in archive:
        elite.sol
        elite.obj
        ...
- add(solution_batch, objective_batch, measures_batch, metadata_batch=None)
Inserts a batch of solutions into the archive.
Each solution is only inserted if it has a higher objective than the threshold of the corresponding cell. For the default values of learning_rate and threshold_min, this threshold is simply the objective value of the elite previously in the cell. If multiple solutions in the batch end up in the same cell, we only insert the solution with the highest objective. If multiple solutions end up in the same cell and tie for the highest objective, we insert the solution that appears first in the batch.
For the default values of learning_rate and threshold_min, the threshold for each cell is updated by taking the maximum objective value among all the solutions that landed in the cell, resulting in the same behavior as in the vanilla MAP-Elites archive. However, for other settings, the threshold is updated with the batch update rule described in the appendix of Fontaine 2022.
Note
The indices of all arguments should “correspond” to each other, i.e. solution_batch[i], objective_batch[i], measures_batch[i], and metadata_batch[i] should be the solution parameters, objective, measures, and metadata for solution i.
- Parameters
solution_batch (array-like) – (batch_size, solution_dim) array of solution parameters.
objective_batch (array-like) – (batch_size,) array with objective function evaluations of the solutions.
measures_batch (array-like) – (batch_size, measure_dim) array with measure space coordinates of all the solutions.
metadata_batch (array-like) – (batch_size,) array of Python objects representing metadata for each solution. For instance, each entry could be a dict with several properties.
Warning
Due to how NumPy’s asarray() automatically converts array-like objects to arrays, passing array-like objects as metadata may lead to unexpected behavior. However, the metadata may be a dict or other object which contains arrays, i.e. metadata_batch could be an array of dicts which contain arrays.
- Returns
2-element tuple of (status_batch, value_batch) which describes the results of the additions. These outputs are particularly useful for algorithms such as CMA-ME.
status_batch (numpy.ndarray of int): An array of integers which represent the “status” obtained when attempting to insert each solution in the batch. Each item has the following possible values:
0: The solution was not added to the archive.
1: The solution improved the objective value of a cell which was already in the archive.
2: The solution discovered a new cell in the archive.
All statuses (and values, below) are computed with respect to the current archive. For example, if two solutions both introduce the same new archive cell, then both will be marked with 2.
The alternative is to depend on the order of the solutions in the batch; for example, if we have two solutions a and b which introduce the same new cell in the archive, a could be inserted first with status 2, and b could be inserted second with status 1 because it improves upon a. However, our implementation does not do this.
To convert statuses to a more semantic format, cast all statuses to AddStatus, e.g. with [AddStatus(s) for s in status_batch].
value_batch (dtype): An array with values for each solution in the batch. With the default values of learning_rate = 1.0 and threshold_min = -np.inf, the meaning of each value depends on the corresponding status and is identical to that in CMA-ME (Fontaine 2020):
0 (not added): The value is the “negative improvement,” i.e. the objective of the solution passed in minus the objective of the elite still in the archive (this value is negative because the solution did not have a high enough objective to be added to the archive).
1 (improve existing cell): The value is the “improvement,” i.e. the objective of the solution passed in minus the objective of the elite previously in the archive.
2 (new cell): The value is just the objective of the solution.
In contrast, for other values of learning_rate and threshold_min, each value is equivalent to the objective value of the solution minus the threshold of its corresponding cell in the archive.
- Return type
- Raises
ValueError – The array arguments do not match their specified shapes.
ValueError – objective_batch or measures_batch has non-finite values (inf or NaN).
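The status and value semantics for the default learning_rate and threshold_min can be sketched in a few lines of plain Python. This is a simplified illustration rather than the library's code: add_batch_default is a hypothetical helper, the archive is reduced to a dict from cell index to elite objective, and the cell indices are assumed to have been computed already (e.g. by index_of()).

```python
def add_batch_default(archive, index_batch, objective_batch):
    """Toy batch add for the default learning_rate and threshold_min.

    archive maps a cell index to the objective of the elite stored there.
    Returns (status_batch, value_batch) and updates archive in place.
    """
    status_batch, value_batch = [], []
    # Statuses and values are computed against the archive as it was
    # before the batch, so order within the batch does not matter here.
    for idx, obj in zip(index_batch, objective_batch):
        if idx not in archive:
            status_batch.append(2)  # New cell: value is the objective.
            value_batch.append(obj)
        else:
            status_batch.append(1 if obj > archive[idx] else 0)
            value_batch.append(obj - archive[idx])  # (Negative) improvement.

    # Only the highest objective per cell is inserted; the strict ">"
    # keeps the first solution in the batch when objectives tie.
    best = {}
    for idx, obj, status in zip(index_batch, objective_batch, status_batch):
        if status != 0 and (idx not in best or obj > best[idx]):
            best[idx] = obj
    archive.update(best)
    return status_batch, value_batch


archive = {0: 5.0}
statuses, values = add_batch_default(archive, [0, 0, 1, 1],
                                     [7.0, 4.0, 1.0, 3.0])
```

In this run the last two solutions both land in the previously empty cell 1, so both receive status 2 even though only the one with objective 3.0 is kept.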
- add_single(solution, objective, measures, metadata=None)
Inserts a single solution into the archive.
The solution is only inserted if it has a higher objective than the threshold of the corresponding cell. For the default values of learning_rate and threshold_min, this threshold is simply the objective value of the elite previously in the cell. The threshold is also updated if the solution was inserted.
Note
To make it more amenable to modifications, this method’s implementation is designed to be readable at the cost of performance, e.g., none of its operations are vectorized. If you need performance, we recommend using add().
- Parameters
solution (array-like) – Parameters of the solution.
objective (float) – Objective function evaluation of the solution.
measures (array-like) – Coordinates in measure space of the solution.
metadata (object) – Python object representing metadata for the solution. For instance, this could be a dict with several properties.
Warning
Due to how NumPy’s asarray() automatically converts array-like objects to arrays, passing array-like objects as metadata may lead to unexpected behavior. However, the metadata may be a dict or other object which contains arrays.
- Raises
ValueError – The array arguments do not match their specified shapes.
ValueError – objective is non-finite (inf or NaN) or measures has non-finite values.
- Returns
2-element tuple of (status, value) describing the result of the add operation. Refer to add() for the meaning of the status and value.
- Return type
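A single insertion with thresholds might look roughly like the sketch below. Several things here are assumptions rather than statements about the library: new_cell and add_single_sketch are hypothetical helpers, the update rule threshold = (1 - learning_rate) * threshold + learning_rate * objective is my reading of the single-solution rule in Fontaine 2022, and a threshold of -inf is treated as a special case (the threshold jumps to the objective, and the value is measured from 0).

```python
import math


def new_cell(threshold_min=-math.inf):
    """Hypothetical per-cell state; every cell starts at threshold_min."""
    return {"occupied": False, "objective": None, "threshold": threshold_min}


def add_single_sketch(cell, objective, learning_rate=1.0):
    """Returns (status, value) and updates the cell if the solution is added."""
    threshold = cell["threshold"]
    # Assumption: with a -inf threshold, the value is just the objective.
    baseline = 0.0 if threshold == -math.inf else threshold
    value = objective - baseline
    if objective <= threshold:
        return 0, value  # Not added; the cell is left untouched.

    status = 1 if cell["occupied"] else 2
    if threshold == -math.inf:
        cell["threshold"] = objective
    else:
        # Assumed rule: move the threshold toward the new objective.
        cell["threshold"] = ((1.0 - learning_rate) * threshold +
                             learning_rate * objective)
    cell["occupied"] = True
    cell["objective"] = objective
    return status, value


# Default settings: the threshold tracks the elite's objective.
default_cell = new_cell()
default_results = [add_single_sketch(default_cell, o) for o in (3.0, 5.0, 4.0)]

# Non-default settings: the threshold lags behind the objectives.
slow_cell = new_cell(threshold_min=0.0)
first = add_single_sketch(slow_cell, 10.0, learning_rate=0.1)
second = add_single_sketch(slow_cell, 5.0, learning_rate=0.1)
```

In the second scenario the solution with objective 5.0 replaces the one with objective 10.0, because it only has to beat the cell's threshold, not the stored elite.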
- as_pandas(include_solutions=True, include_metadata=False)
Converts the archive into an ArchiveDataFrame (a child class of pandas.DataFrame).
The implementation of this method in ArchiveBase creates a dataframe consisting of:
1 column of integers (np.int32) for the index, named index. See index_of() for more info.
measure_dim columns for the measures, named measure_0, measure_1, ...
1 column for the objectives, named objective
solution_dim columns for the solution parameters, named solution_0, solution_1, ...
1 column for the metadata objects, named metadata
In short, the dataframe looks like this:
index    measure_0    …    objective    solution_0    …    metadata
…
Compared to pandas.DataFrame, the ArchiveDataFrame adds methods and attributes which make it easier to manipulate archive data. For more information, refer to the ArchiveDataFrame documentation.
- Parameters
- Returns
See above.
- Return type
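As a rough illustration of the column layout listed above, the sketch below builds the expected column names. archive_columns is a hypothetical helper, not part of the library, and it assumes that include_solutions and include_metadata simply switch the corresponding columns on or off.

```python
def archive_columns(measure_dim, solution_dim,
                    include_solutions=True, include_metadata=False):
    """Lists the column names in the order described above."""
    columns = ["index"]
    columns += [f"measure_{i}" for i in range(measure_dim)]
    columns.append("objective")
    if include_solutions:
        columns += [f"solution_{i}" for i in range(solution_dim)]
    if include_metadata:
        columns.append("metadata")
    return columns


default_columns = archive_columns(measure_dim=2, solution_dim=3)
compact_columns = archive_columns(2, 3, include_solutions=False,
                                  include_metadata=True)
```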
- clear()
Removes all elites from the archive.
After this method is called, the archive will be empty.
- cqd_score(iterations, target_points, penalties, obj_min, obj_max, dist_max=None, dist_ord=None)
Computes the CQD score of the archive.
The Continuous Quality Diversity (CQD) score was introduced in Kent 2022.
Note
This method by default assumes that the archive has upper_bounds and lower_bounds properties which delineate the bounds of the measure space, as is the case in GridArchive, CVTArchive, and SlidingBoundariesArchive. If this is not the case, dist_max must be passed in, and target_points must be an array of custom points.
- Parameters
iterations (int) – Number of times to compute the CQD score. We return the mean CQD score across these iterations.
target_points (int or array-like) – Number of target points to generate, or an (iterations, n, measure_dim) array which lists n target points to use on each iteration. When an int is passed, the points are sampled uniformly within the bounds of the measure space.
penalties (int or array-like) – Number of penalty values over which to compute the score (the values are distributed evenly over the range [0,1]). Alternatively, this may be a 1D array which explicitly lists the penalty values. Known as \(\theta\) in Kent 2022.
obj_min (float) – Minimum objective value, used when normalizing the objectives.
obj_max (float) – Maximum objective value, used when normalizing the objectives.
dist_max (float) – Maximum distance between points in measure space. Defaults to the distance between the extremes of the measure space bounds (this distance is computed with the norm order specified by dist_ord). Known as \(\delta_{max}\) in Kent 2022.
dist_ord – Order of the norm to use for calculating measure space distance; this is passed to numpy.linalg.norm() as the ord argument. See numpy.linalg.norm() for possible values. The default is to use Euclidean distance (L2 norm).
- Returns
The mean CQD score obtained with iterations rounds of calculations.
- Raises
RuntimeError – The archive does not have the bounds properties mentioned above, and dist_max is not specified or the target points are not provided.
ValueError – target_points or penalties is an array with the wrong shape.
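The default inputs described in the parameters above (evenly spaced penalties, uniformly sampled target points, and a dist_max derived from the bounds) can be sketched as follows. cqd_inputs is a hypothetical helper that covers only one iteration and only the Euclidean case; including both endpoints of [0, 1] in the penalties is an assumption, and the CQD score itself is not computed here.

```python
import math
import random


def cqd_inputs(lower_bounds, upper_bounds, n_targets, n_penalties, seed=None):
    """Builds default penalties, target points, and dist_max for one iteration."""
    rng = random.Random(seed)

    # Penalties spread evenly over [0, 1] (endpoints assumed included).
    if n_penalties == 1:
        penalties = [0.0]
    else:
        penalties = [k / (n_penalties - 1) for k in range(n_penalties)]

    # Target points sampled uniformly within the measure space bounds.
    targets = [[rng.uniform(lo, hi)
                for lo, hi in zip(lower_bounds, upper_bounds)]
               for _ in range(n_targets)]

    # Euclidean distance between the extremes of the bounds.
    dist_max = math.sqrt(sum((hi - lo)**2
                             for lo, hi in zip(lower_bounds, upper_bounds)))
    return penalties, targets, dist_max


penalties, targets, dist_max = cqd_inputs([0.0, 0.0], [3.0, 4.0],
                                          n_targets=10, n_penalties=5, seed=0)
```

For an archive without bounds properties there is nothing to derive these defaults from, which is why dist_max and explicit target points must then be supplied by the caller.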
- abstract index_of(measures_batch)
Returns archive indices for the given batch of measures.
If you need to retrieve the index of the measures for a single solution, consider using index_of_single().
- Parameters
measures_batch (array-like) – (batch_size, measure_dim) array of coordinates in measure space.
- Returns
(batch_size,) array with the indices of the batch of measures in the archive’s storage arrays.
- Return type
- index_of_single(measures)
Returns the index of the measures for one solution.
While index_of() takes in a batch of measures, this method takes in the measures for only one solution. If index_of() is implemented correctly, this method should work immediately (i.e. “out of the box”).
- Parameters
measures (array-like) – (measure_dim,) array of measures for a single solution.
- Returns
Integer index of the measures in the archive’s storage arrays.
- Return type
int or numpy.integer
- Raises
ValueError – measures is not of shape (measure_dim,).
ValueError – measures has non-finite values (inf or NaN).
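One plausible way a single-solution method can work “out of the box” on top of a batched one is to validate the input, wrap it in a batch of size one, and unwrap the result. The sketch below assumes exactly that; index_of_single_sketch and toy_index_of are hypothetical stand-ins, not the library's functions.

```python
import math


def index_of_single_sketch(index_of, measures, measure_dim):
    """Validates one set of measures and delegates to a batched index_of."""
    measures = list(measures)
    if len(measures) != measure_dim:
        raise ValueError(f"Expected shape ({measure_dim},)")
    if not all(math.isfinite(m) for m in measures):
        raise ValueError("measures has non-finite values (inf or NaN)")
    # Wrap in a batch of size 1, then unwrap the single index.
    return index_of([measures])[0]


def toy_index_of(measures_batch):
    """Stand-in batched method: the cell is the integer part of measure 0."""
    return [int(m[0]) for m in measures_batch]


single_index = index_of_single_sketch(toy_index_of, [2.7, 0.1], measure_dim=2)
```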
- retrieve(measures_batch)
Retrieves the elites with measures in the same cells as the measures specified.
This method operates in batch, i.e. it takes in a batch of measures and outputs an EliteBatch. Since EliteBatch is a namedtuple, it can be unpacked:
    solution_batch, objective_batch, measures_batch, \
        index_batch, metadata_batch = archive.retrieve(...)
Or the fields may be accessed by name:
    elite_batch = archive.retrieve(...)
    elite_batch.solution_batch
    elite_batch.objective_batch
    elite_batch.measures_batch
    elite_batch.index_batch
    elite_batch.metadata_batch
If the cell associated with measures_batch[i] has an elite in it, then elite_batch.solution_batch[i], elite_batch.objective_batch[i], elite_batch.measures_batch[i], elite_batch.index_batch[i], and elite_batch.metadata_batch[i] will be set to the properties of the elite. Note that elite_batch.measures_batch[i] may not be equal to measures_batch[i] since the measures only need to be in the same archive cell.
If the cell associated with measures_batch[i] does not have any elite in it, then the corresponding outputs are set to empty values, namely:
elite_batch.solution_batch[i] will be an array of NaN
elite_batch.objective_batch[i] will be NaN
elite_batch.measures_batch[i] will be an array of NaN
elite_batch.index_batch[i] will be -1
elite_batch.metadata_batch[i] will be None
If you need to retrieve a single elite associated with some measures, consider using retrieve_single().
- Parameters
measures_batch (array-like) – (batch_size, measure_dim) array of coordinates in measure space.
- Returns
See above.
- Return type
- Raises
ValueError – measures_batch is not of shape (batch_size, measure_dim).
ValueError – measures_batch has non-finite values (inf or NaN).
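The “empty values” convention for unoccupied cells can be sketched with plain Python containers. retrieve_sketch and the store dict are hypothetical; in particular, the result is a plain dict of lists here, whereas the method described above returns an EliteBatch namedtuple of arrays.

```python
import math


def retrieve_sketch(store, index_batch, solution_dim, measure_dim):
    """store maps a cell index to (solution, objective, measures, metadata)."""
    out = {"solution_batch": [], "objective_batch": [], "measures_batch": [],
           "index_batch": [], "metadata_batch": []}
    for idx in index_batch:
        if idx in store:
            solution, objective, measures, metadata = store[idx]
            found_index = idx
        else:
            # Empty values for a cell that holds no elite.
            solution = [math.nan] * solution_dim
            objective = math.nan
            measures = [math.nan] * measure_dim
            metadata = None
            found_index = -1
        out["solution_batch"].append(solution)
        out["objective_batch"].append(objective)
        out["measures_batch"].append(measures)
        out["index_batch"].append(found_index)
        out["metadata_batch"].append(metadata)
    return out


store = {3: ([1.0, 2.0], 0.5, [0.7], {"note": "kept"})}
batch = retrieve_sketch(store, [3, 0], solution_dim=2, measure_dim=1)
```

Checking index_batch[i] == -1 is a convenient way to tell empty results apart, since NaN never compares equal to itself.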
- retrieve_single(measures)
Retrieves the elite with measures in the same cell as the measures specified.
While retrieve() takes in a batch of measures, this method takes in the measures for only one solution and returns a single Elite.
- Parameters
measures (array-like) – (measure_dim,) array of measures.
- Returns
If there is an elite with measures in the same cell as the measures specified, then this method returns an Elite where all the fields hold the info of that elite. Otherwise, this method returns an Elite filled with the same “empty” values described in retrieve().
- Raises
ValueError – measures is not of shape (measure_dim,).
ValueError – measures has non-finite values (inf or NaN).
- sample_elites(n)
Randomly samples elites from the archive.
Currently, this sampling is done uniformly at random. Furthermore, each sample is done independently, so elites may be repeated in the sample. Additional sampling methods may be supported in the future.
Since EliteBatch is a namedtuple, the result can be unpacked (here we show how to ignore some of the fields):
    solution_batch, objective_batch, measures_batch, *_ = \
        archive.sample_elites(32)
Or the fields may be accessed by name:
    elite = archive.sample_elites(16)
    elite.solution_batch
    elite.objective_batch
    ...
- Parameters
n (int) – Number of elites to sample.
- Returns
A batch of elites randomly selected from the archive.
- Return type
- Raises
IndexError – The archive is empty.
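Uniform sampling with replacement over a fixed-size index array, in the spirit of _occupied_indices and _num_occupied described earlier, could look like the following. sample_indices_sketch is a hypothetical helper, and Python's random module stands in for the numpy random generator.

```python
import random


def sample_indices_sketch(occupied_indices, num_occupied, n, rng):
    """Draws n cell indices uniformly and independently (with replacement)."""
    if num_occupied == 0:
        raise IndexError("No elements in archive.")
    # Only the first num_occupied entries of the fixed-size array are valid.
    return [occupied_indices[rng.randrange(num_occupied)] for _ in range(n)]


# Fixed-size array for a 5-cell archive with 3 occupied cells; the
# trailing zeros are unused padding.
occupied_indices = [7, 2, 5, 0, 0]
sampled = sample_indices_sketch(occupied_indices, 3, 20, random.Random(42))
```

Because each draw is independent, asking for more samples than there are elites is fine; some elites simply appear more than once.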
- property best_elite
The elite with the highest objective in the archive.
None if there are no elites in the archive.
Note
If the archive is non-elitist (this occurs when using the archive with a learning rate which is not 1.0, as in CMA-MAE), then this best elite may no longer exist in the archive because it was replaced with an elite with a lower objective value. This can happen because in non-elitist archives, new solutions only need to exceed the threshold of the cell they are being inserted into, not the objective of the elite currently in the cell. See #314 for more info.
- Type
- property dtype
The dtype of the solutions, objective, and measures.
- Type
data-type
- property qd_score_offset
The offset which is subtracted from objective values when computing the QD score.
- Type
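The effect of the offset is easiest to see with numbers. The sketch below assumes the QD score is the sum of offset-adjusted objectives over all elites; qd_score_sketch is a hypothetical helper and the objectives are made-up values.

```python
def qd_score_sketch(objectives, qd_score_offset=0.0):
    """Sums the objectives after subtracting the offset from each one."""
    return sum(obj - qd_score_offset for obj in objectives)


objectives = [-300.0, -100.0, 50.0]
raw_score = qd_score_sketch(objectives)
offset_score = qd_score_sketch(objectives, qd_score_offset=-300.0)
```

Without the offset, the two elites with negative objectives pull the score down; with an offset of -300, every elite contributes a non-negative amount, so filling a new cell never lowers the score.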
- property stats
Statistics about the archive.
See ArchiveStats for more info.
- Type