ribs.archives.ProximityArchive¶
- class ribs.archives.ProximityArchive(*, solution_dim, measure_dim, k_neighbors, novelty_threshold, local_competition=False, initial_capacity=128, qd_score_offset=0.0, seed=None, dtype=<class 'numpy.float64'>, extra_fields=None, ckdtree_kwargs=None)[source]¶
An archive that adds new solutions based on novelty, where novelty is defined via proximity to other solutions in measure space.
This archive originates in Novelty Search and is described in Lehman 2011. Solutions are added to the archive if their novelty exceeds a certain threshold. Novelty \(\rho\) is defined as the average (Euclidean) distance in measure space to the \(k\)-nearest neighbors of the solution in the archive:
\[\rho(x) = \frac{1}{k}\sum_{i=1}^{k}\text{dist}(x, \mu_i)\]Where \(x\) is the measure value of some solution, and \(\mu_{1..k}\) are the measure values of the \(k\)-nearest neighbors in measure space.
This archive also supports the local competition behavior from Novelty Search with Local Competition, described in Lehman 2011b.
Note
When used for diversity optimization, this archive does not require any objectives, and
objective=None
can be passed intoadd()
. For consistency with the rest of pyribs,objective=None
will result in a default objective value of 0, which will also cause stats like QD score and best objective to be 0. Alternatively, it is possible to associate objectives with the solutions by passingobjective
toadd()
just like in other archives.Note
The other statistics will also behave slightly differently from other archives:
If this archive has any solutions in it, the coverage (
archive.stats.coverage
) will always be reported as 1. This is because the archive is unbounded, so there is no predefined number of cells to fill. We suggest usingarchive.stats.num_elites
instead for a more meaningful coverage metric.Since the number of cells in the archive is equivalent to the number of elites in the archive, the normalized QD score (
archive.stats.norm_qd_score
) will always equal the mean objective (archive.stats.obj_mean
).
- Parameters
solution_dim (int) – Dimension of the solution space.
measure_dim (int) – Dimension of the measure space.
k_neighbors (int) – The maximum number of nearest neighbors to use for computing novelty (maximum here is indicated for cases when there are fewer than
k_neighbors
solutions in the archive).novelty_threshold (float) – The level of novelty required to add a solution to the archive.
local_competition (bool) – Whether to turn on local competition behavior. If turned on, the archive will require objectives to be passed in during
add()
. Furthermore, theadd_info
returned byadd()
will include local competition information. Finally, solutions can be replaced in the archive. Specifically, if a candidate solution’s novelty is below the novelty threshold, its objective will be compared to that of its nearest neighbor. If the candidate’s objective is higher, it will replace the nearest neighbor.initial_capacity (int) – Since this archive is unstructured, it does not have a fixed size, and it will grow as solutions are added. In the implementation, we store solutions in fixed-size arrays, and every time the capacity of these arrays is reached, we double their sizes (similar to the vector in C++). This parameter determines the initial capacity of the archive’s arrays. It may be useful when it is known in advance how large the archive will grow.
qd_score_offset (float) – Archives often contain negative objective values, and if the QD score were to be computed with these negative objectives, the algorithm would be penalized for adding new cells with negative objectives. Thus, a standard practice is to normalize all the objectives so that they are non-negative by introducing an offset. This QD score offset will be subtracted from all objectives in the archive, e.g., if your objectives go as low as -300, pass in -300 so that each objective will be transformed as
objective - (-300)
.seed (int) – Value to seed the random number generator. Set to None to avoid a fixed seed.
dtype (str or data-type or dict) – Data type of the solutions, objectives, and measures. We only support
"f"
/np.float32
and"d"
/np.float64
. Alternatively, this can be a dict specifying separate dtypes, of the form{"solution": <dtype>, "objective": <dtype>, "measures": <dtype>}
.extra_fields (dict) – Description of extra fields of data that is stored next to elite data like solutions and objectives. The description is a dict mapping from a field name (str) to a tuple of
(shape, dtype)
. For instance,{"foo": ((), np.float32), "bar": ((10,), np.float32)}
will create a “foo” field that contains scalar values and a “bar” field that contains 10D values. Note that field names must be valid Python identifiers, and names already used in the archive are not allowed.ckdtree_kwargs (dict) – When computing nearest neighbors, we construct a
cKDTree
. This parameter will pass additional kwargs when constructing the tree. By default, we do not pass in any kwargs.
- Raises
ValueError –
initial_capacity
must be at least 1.
Methods
__iter__
()Creates an iterator over the elites in the archive.
__len__
()Number of elites in the archive.
add
(solution, objective, measures, **fields)Inserts a batch of solutions into the archive.
add_single
(solution, objective, measures, ...)Inserts a single solution into the archive.
as_pandas
([include_solutions, include_metadata])DEPRECATED.
clear
()Removes all elites from the archive.
compute_novelty
(measures[, local_competition])Computes the novelty and local competition of the given measures.
cqd_score
(iterations, target_points, ...[, ...])Computes the CQD score of the archive.
data
([fields, return_type])Retrieves data for all elites in the archive.
index_of
(measures)Returns the index of the closest solution to the given measures.
index_of_single
(measures)Returns the index of the measures for one solution.
retrieve
(measures)Retrieves the elites with measures in the same cells as the measures specified.
retrieve_single
(measures)Retrieves the elite with measures in the same cell as the measures specified.
Randomly samples elites from the archive.
Attributes
The elite with the highest objective in the archive.
The number of solutions that can currently be stored in this archive.
Total number of cells in the archive.
DEPRECATED.
Mapping from field name to dtype for all fields in the archive.
Whether the archive is empty.
List of data fields in the archive.
The number of nearest neighbors to use for determining novelty.
The learning rate for threshold updates.
Whether local competition behavior is turned on.
The lower bounds of the measures in the archive.
Dimensionality of the measure space.
The degree of novelty required add a solution to the archive.
The offset which is subtracted from objective values when computing the QD score.
Dimensionality of the solutions in the archive.
Statistics about the archive.
The initial threshold value for all the cells.
The upper bounds of the measures in the archive.
- __iter__()¶
Creates an iterator over the elites in the archive.
Example
for elite in archive: elite["solution"] elite["objective"] ...
- __len__()¶
Number of elites in the archive.
- add(solution, objective, measures, **fields)[source]¶
Inserts a batch of solutions into the archive.
Solutions are inserted if they have a high enough novelty score as discussed in the documentation for this class. The novelty is determined by comparing to solutions currently in the archive.
If
local_competition
is turned on, solutions can also replace existing solutions in the archive. Namely, if the solution was not novel enough to be added, it will be compared to its nearest neighbor, and if it exceeds the objective value of its nearest neighbor, it will replace the nearest neighbor. If there are conflicts where multiple solutions may replace a single solution, the highest-performing is chosen.Note
The indices of all arguments should “correspond” to each other, i.e.
solution[i]
,objective[i]
,measures[i]
, and should be the solution parameters, objective, and measures for solutioni
.- Parameters
solution (array-like) – (batch_size,
solution_dim
) array of solution parameters.objective (None or array-like) – A value of None will cause the objective values to default to 0. However, if the user wishes to associate an objective with each solution, this can be a (batch_size,) array with objective function evaluations of the solutions. If
local_competition
is turned on, this argument must be provided.measures (array-like) – (batch_size,
measure_dim
) array with measure space coordinates of all the solutions.fields (keyword arguments) – Additional data for each solution. Each argument should be an array with batch_size as the first dimension.
- Returns
Information describing the result of the add operation. The dict contains the following keys:
"status"
(numpy.ndarray
ofint
): An array of integers that represent the “status” obtained when attempting to insert each solution in the batch. Each item has the following possible values:0
: The solution was not added to the archive.1
: The solution replaced an existing solution in the archive due to having a higher objective (only applies iflocal_competition
is turned on).2
: The solution was added to the archive due to being sufficiently novel.
To convert statuses to a more semantic format, cast all statuses to
AddStatus
e.g. with[AddStatus(s) for s in add_info["status"]]
."novelty"
(numpy.ndarray
ofdtypes
[“measures”]): The computed novelty of the solutions passed in. If there were no solutions to compute novelty with respect to (i.e., the archive was empty), the novelty is set to thenovelty_threshold
."local_competition"
(numpy.ndarray
ofint
): Only available iflocal_competition
is turned on. Indicates, for each solution, how many of the nearest neighbors had lower objective values. Maximum value isk_neighbors
. If there were no solutions to compute novelty with respect to, (i.e., the archive was empty), the local competition is set to 0."value"
(numpy.ndarray
ofdtypes
[“objective”]): Only available iflocal_competition
is turned on. The meaning of each value depends on the correspondingstatus
and is inspired by the values in CMA-ME (Fontaine 2020):0
(not added): The value is the “negative improvement,” i.e. the objective of the solution passed in minus the objective of the nearest neighbor (this value is negative because the solution did not have a high enough objective to be added to the archive).1
(replace/improve existing solution): The value is the “improvement,” i.e. the objective of the solution passed in minus the objective of the elite that was replaced.2
(new solution): The value is just the objective of the solution.
- Return type
- Raises
ValueError – The array arguments do not match their specified shapes.
ValueError –
objective
ormeasures
has non-finite values (inf or NaN).ValueError – ``local_competition` is turned on but objective was not passed in.
- add_single(solution, objective, measures, **fields)[source]¶
Inserts a single solution into the archive.
- Parameters
solution (array-like) – Parameters of the solution.
objective (None or float) – Set to None to get the default value of 0; otherwise, a valid objective value is also acceptable.
measures (array-like) – Coordinates in measure space of the solution.
fields (keyword arguments) – Additional data for the solution.
- Returns
Information describing the result of the add operation. The dict contains
status
andnovelty
keys; refer toadd()
for the meaning of status and novelty.- Return type
- Raises
ValueError – The array arguments do not match their specified shapes.
ValueError –
objective
is non-finite (inf or NaN) ormeasures
has non-finite values.ValueError – ``local_competition` is turned on but objective was not passed in.
- as_pandas(include_solutions=True, include_metadata=False)¶
DEPRECATED.
- clear()¶
Removes all elites from the archive.
After this method is called, the archive will be
empty
.
- compute_novelty(measures, local_competition=None)[source]¶
Computes the novelty and local competition of the given measures.
- Parameters
measures (array-like) – (batch_size,
measure_dim
) array of coordinates in measure space.local_competition (None or array-like) – This can be None to indicate not to compute local competition. Otherwise, it can be a (batch_size,) array of objective values to use as references for computing objective values.
- Returns
Either one value or a tuple of two values:
numpy.ndarray: (batch_size,) array holding the novelty score of each measure. If the archive is empty, the novelty is set to the
novelty_threshold
.numpy.ndarray: If
local_competition
is passed in, a (batch_size,) array holding the local competition of each solution will also be returned. If the archive is empty, the local competition will be set to 0.
- Return type
- cqd_score(iterations, target_points, penalties, obj_min, obj_max, dist_max=None, dist_ord=None)[source]¶
Computes the CQD score of the archive.
Refer to the documentation in
ArchiveBase.cqd_score()
for more info. The key difference from the base implementation is that the implementation in ArchiveBase assumes the archive has a pre-defined measure space with lower and upper bounds. However, by nature of being unstructured, this archive has lower and upper bounds that change over time. Thus, it is required to directly pass intarget_points
anddist_max
.- Raises
ValueError – dist_max and target_points were not passed in.
- data(fields=None, return_type='dict')¶
Retrieves data for all elites in the archive.
- Parameters
fields (str or array-like of str) – List of fields to include. By default, all fields will be included, with an additional “index” as the last field (“index” can also be placed anywhere in this list). This can also be a single str indicating a field name.
return_type (str) – Type of data to return. See below. Ignored if
fields
is a str.
- Returns
The data for all entries in the archive. If
fields
was a single str, this will just be an array holding data for the given field. Otherwise, this data can take the following forms, depending on thereturn_type
argument:return_type="dict"
: Dict mapping from the field name to the field data at the given indices. An example is:{ "solution": [[1.0, 1.0, ...], ...], "objective": [1.5, ...], "measures": [[1.0, 2.0], ...], "threshold": [0.8, ...], "index": [4, ...], }
Observe that we also return the indices as an
index
entry in the dict. The keys in this dict can be modified with thefields
arg; duplicate fields will be ignored since the dict stores unique keys.return_type="tuple"
: Tuple of arrays matching the field order given infields
. For instance, iffields
was["objective", "measures"]
, we would receive a tuple of(objective_arr, measures_arr)
. In this case, the results fromretrieve
could be unpacked as:objective, measures = archive.data(["objective", "measures"], return_type="tuple")
Unlike with the
dict
return type, duplicate fields will show up as duplicate entries in the tuple, e.g.,fields=["objective", "objective"]
will result in two objective arrays being returned.By default, (i.e., when
fields=None
), the fields in the tuple will be ordered according to thefield_list
along withindex
as the last field.return_type="pandas"
: AArchiveDataFrame
with the following columns:For fields that are scalars, a single column with the field name. For example,
objective
would have a single column calledobjective
.For fields that are 1D arrays, multiple columns with the name suffixed by its index. For instance, if we have a
measures
field of length 10, we create 10 columns with namesmeasures_0
,measures_1
, …,measures_9
. We do not currently support fields with >1D data.1 column of integers (
np.int32
) for the index, namedindex
.
In short, the dataframe might look like this by default:
solution_0
…
objective
measures_0
…
threshold
index
…
…
Like the other return types, the columns can be adjusted with the
fields
parameter.
All data returned by this method will be a copy, i.e., the data will not update as the archive changes.
- index_of(measures)[source]¶
Returns the index of the closest solution to the given measures.
Unlike the structured archives like
GridArchive
, this archive does not have indexed cells where each measure “belongs.” Thus, this method instead returns the index of the solution with the closest measure to each solution passed in.This means that
retrieve()
will return the solution with the closest measure to each measure passed into that method.- Parameters
measures (array-like) – (batch_size,
measure_dim
) array of coordinates in measure space.- Returns
- (batch_size,) array of integer indices representing
the location of the solution in the archive.
- Return type
- Raises
RuntimeError – There were no entries in the archive.
ValueError –
measures
is not of shape (batch_size,measure_dim
).ValueError –
measures
has non-finite values (inf or NaN).
- index_of_single(measures)¶
Returns the index of the measures for one solution.
While
index_of()
takes in a batch of measures, this method takes in the measures for only one solution. Ifindex_of()
is implemented correctly, this method should work immediately (i.e. “out of the box”).- Parameters
measures (array-like) – (
measure_dim
,) array of measures for a single solution.- Returns
Integer index of the measures in the archive’s storage arrays.
- Return type
int or numpy.integer
- Raises
ValueError –
measures
is not of shape (measure_dim
,).ValueError –
measures
has non-finite values (inf or NaN).
- retrieve(measures)¶
Retrieves the elites with measures in the same cells as the measures specified.
This method operates in batch, i.e., it takes in a batch of measures and outputs the batched data for the elites:
occupied, elites = archive.retrieve(...) elites["solution"] # Shape: (batch_size, solution_dim) elites["objective"] elites["measures"] elites["threshold"] elites["index"]
If the cell associated with
elites["measures"][i]
has an elite in it, thenoccupied[i]
will be True. Furthermore,elites["solution"][i]
,elites["objective"][i]
,elites["measures"][i]
,elites["threshold"][i]
, andelites["index"][i]
will be set to the properties of the elite. Note thatelites["measures"][i]
may not be equal to themeasures[i]
passed as an argument, since the measures only need to be in the same archive cell.If the cell associated with
measures[i]
does not have any elite in it, thenoccupied[i]
will be set to False. Furthermore, the corresponding outputs will be set to empty values – namely:NaN for floating-point fields
-1 for the “index” field
0 for integer fields
None for object fields
If you need to retrieve a single elite associated with some measures, consider using
retrieve_single()
.- Parameters
measures (array-like) – (batch_size,
measure_dim
) array of coordinates in measure space.- Returns
2-element tuple of (occupied array, dict). The occupied array indicates whether each of the cells indicated by the coordinates in
measures
has an elite, while the dict contains the data of those elites. The dict maps from field name to the corresponding array.- Return type
- Raises
ValueError –
measures
is not of shape (batch_size,measure_dim
).ValueError –
measures
has non-finite values (inf or NaN).
- retrieve_single(measures)¶
Retrieves the elite with measures in the same cell as the measures specified.
While
retrieve()
takes in a batch of measures, this method takes in the measures for only one solution and returns a single bool and a dict with single entries.- Parameters
measures (array-like) – (
measure_dim
,) array of measures.- Returns
If there is an elite with measures in the same cell as the measures specified, then this method returns a True value and a dict where all the fields hold the info of the elite. Otherwise, this method returns a False value and a dict filled with the same “empty” values described in
retrieve()
.- Return type
- Raises
ValueError –
measures
is not of shape (measure_dim
,).ValueError –
measures
has non-finite values (inf or NaN).
- sample_elites(n)¶
Randomly samples elites from the archive.
Currently, this sampling is done uniformly at random. Furthermore, each sample is done independently, so elites may be repeated in the sample. Additional sampling methods may be supported in the future.
Example
elites = archive.sample_elites(16) elites["solution"] # Shape: (16, solution_dim) elites["objective"] ...
- Parameters
n (int) – Number of elites to sample.
- Returns
Holds a batch of elites randomly selected from the archive.
- Return type
- Raises
IndexError – The archive is empty.
- property best_elite¶
The elite with the highest objective in the archive.
None if there are no elites in the archive.
Note
If the archive is non-elitist (this occurs when using the archive with a learning rate which is not 1.0, as in CMA-MAE), then this best elite may no longer exist in the archive because it was replaced with an elite with a lower objective value. This can happen because in non-elitist archives, new solutions only need to exceed the threshold of the cell they are being inserted into, not the objective of the elite currently in the cell. See #314 for more info.
Note
The best elite will contain a “threshold” key. This threshold is the threshold of the best elite’s cell after the best elite was inserted into the archive.
- Type
- property capacity¶
The number of solutions that can currently be stored in this archive. The capacity doubles every time the archive fills up, similar to a C++ vector.
- Type
- property cells¶
Total number of cells in the archive. Since this archive is unstructured and grows over time, the number of cells is equal to the number of solutions currently in the archive.
- Type
- property dtype¶
DEPRECATED.
- property lower_bounds: numpy.ndarray¶
The lower bounds of the measures in the archive.
Since the archive can grow arbitrarily, this is calculated based on the minimum measure values of the solutions in the archive.
- Raises
RuntimeError – There are no solutions in the archive, so the
lower bounds do not exist. –
- property novelty_threshold¶
The degree of novelty required add a solution to the archive.
- Type
dtypes[“measures”]
- property qd_score_offset¶
The offset which is subtracted from objective values when computing the QD score.
- Type
- property stats¶
Statistics about the archive.
See
ArchiveStats
for more info.- Type
- property upper_bounds: numpy.ndarray¶
The upper bounds of the measures in the archive.
Since the archive can grow arbitrarily, this is calculated based on the maximum measure values of the solutions in the archive.
- Raises
RuntimeError – There are no solutions in the archive, so the
upper bounds do not exist. –