ribs.archives.ArrayStore
- class ribs.archives.ArrayStore(field_desc, capacity)
Maintains a set of arrays that share a common dimension.
The ArrayStore consists of several fields of data that are manipulated simultaneously via batch operations. Each field is a NumPy array with a shape of (capacity, ...) and can be of any type.
Since the arrays all share a common first dimension, they also share a common index. For instance, if we retrieve() the data at indices [0, 2, 1], we would get a dict that contains the objective and measures at indices 0, 2, and 1, e.g.:
    {
        "objective": [-1, 3, -5],
        "measures": [[0, 0], [2, 1], [3, 5]],
    }
The ArrayStore supports several further operations, in particular a flexible add() method that inserts data into the ArrayStore.
- Parameters
field_desc (dict) – Description of fields in the array store. The description is a dict mapping from a str to a tuple of (shape, dtype). For instance, {"objective": ((), np.float32), "measures": ((10,), np.float32)} will create an “objective” field with shape (capacity,) and a “measures” field with shape (capacity, 10). Note that field names must be valid Python identifiers.
capacity (int) – Total possible entries in the store.
- Variables
_props (dict) – Properties that are common to every ArrayStore.
    “capacity”: Maximum number of data entries in the store.
    “occupied”: Boolean array of size (capacity,) indicating whether each index has data associated with it.
    “n_occupied”: Number of data entries currently in the store.
    “occupied_list”: Array of size (capacity,) listing all occupied indices in the store. Only the first n_occupied elements will be valid.
    “updates”: Int array recording the number of calls to functions that modified the store.
_fields (dict) – Holds all the arrays with their data.
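The relationship between “occupied”, “occupied_list”, and “n_occupied” can be pictured with a small toy model. This is only a sketch using plain Python lists; the class name `OccupancySketch` and its methods are hypothetical and are not part of pyribs.

```python
class OccupancySketch:
    """Toy model of the occupancy bookkeeping described above (not pyribs code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.occupied = [False] * capacity   # one flag per index
        self.occupied_list = [0] * capacity  # only the first n_occupied slots are valid
        self.n_occupied = 0

    def mark(self, index):
        """Records that `index` now holds data; repeated marks change nothing."""
        if not self.occupied[index]:
            self.occupied[index] = True
            self.occupied_list[self.n_occupied] = index
            self.n_occupied += 1

    def valid_indices(self):
        """Returns the valid prefix of occupied_list."""
        return self.occupied_list[:self.n_occupied]


book = OccupancySketch(capacity=5)
for i in [3, 1, 3]:
    book.mark(i)
print(book.valid_indices())  # [3, 1]
```

Keeping a fixed-size list plus a counter means the set of occupied indices can be read without scanning all `capacity` flags.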
- Raises
ValueError – One of the fields in field_desc has a reserved name (currently, “index” is the only reserved name).
ValueError – One of the fields in field_desc has a name that is not a valid Python identifier.
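As a rough illustration of how field_desc and capacity determine the array shapes and of the two name checks listed under Raises, consider the sketch below. The helper `full_shapes` is hypothetical, not a pyribs function, and dtypes are written as strings only so that the example has no dependencies.

```python
RESERVED_NAMES = {"index"}  # per the text, "index" is currently the only reserved name


def full_shapes(field_desc, capacity):
    """Maps each field name to the shape of its backing array, (capacity, *shape)."""
    shapes = {}
    for name, (shape, _dtype) in field_desc.items():
        if name in RESERVED_NAMES:
            raise ValueError(f"'{name}' is a reserved field name")
        if not name.isidentifier():
            raise ValueError(f"'{name}' is not a valid Python identifier")
        if isinstance(shape, int):  # an int is shorthand for a 1D shape
            shape = (shape,)
        shapes[name] = (capacity, *shape)
    return shapes


# dtypes are strings here only to keep the sketch dependency-free.
desc = {"objective": ((), "float32"), "measures": ((10,), "float32")}
print(full_shapes(desc, capacity=100))
# {'objective': (100,), 'measures': (100, 10)}
```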
Methods
__iter__() – Iterates over entries in the store.
__len__() – Number of occupied indices in the store, i.e., number of indices that have a corresponding data entry.
add(indices, new_data, extra_args, transforms) – Adds new data to the store at the given indices.
as_raw_dict() – Returns the raw data in the ArrayStore as a one-level dictionary.
clear() – Removes all entries from the store.
data([fields, return_type]) – Retrieves data for all entries in the store.
from_raw_dict(d) – Loads an ArrayStore from a dict of raw info.
resize(capacity) – Resizes the store to the given capacity.
retrieve(indices[, fields, return_type]) – Collects data at the given indices.
Attributes
capacity – Maximum number of data entries in the store.
dtypes – Data types of fields in the store.
field_desc – Description of fields in the store.
field_list – List of fields in the store.
occupied – Boolean array of size (capacity,) indicating whether each index has a data entry.
occupied_list – int32 array listing all occupied indices in the store.
- __iter__()
Iterates over entries in the store.
Iterating over the store yields dicts mapping from the field names to the individual entries. For instance, if we had an “objective” field, one entry might look like {"index": 1, "objective": 6.0} (similar to retrieve(), the index is included in the output).
Example
    for entry in store:
        entry["index"]
        entry["objective"]
        ...
- __len__()
Number of occupied indices in the store, i.e., number of indices that have a corresponding data entry.
- add(indices, new_data, extra_args, transforms)
Adds new data to the store at the given indices.
The indices, new_data, and add_info are passed through transforms before being added to the store. The general idea is that these transforms will gradually modify the indices, new_data, and add_info. For instance, they can add new fields to new_data (new_data may not initially have all the same fields as the store). Alternatively, they can filter out duplicate indices, e.g., if multiple entries are being inserted at the same index, we can choose the one with the best objective. As another example, the transforms can add stats to the add_info or delete fields from the add_info.
The signature of a transform is as follows:
def transform(indices, new_data, add_info, extra_args, occupied, cur_data) -> (indices, new_data, add_info):
Transform parameters:
indices (array-like): Array of indices at which new_data should be inserted.
new_data (dict): New data for the given indices. Maps from field name to the array of new data for that field.
add_info (dict): Information to return to the user about the addition process. Example info includes whether each entry was ultimately inserted into the store, as well as general statistics. For the first transform, this will be an empty dict.
extra_args (dict): Additional arguments for the transform.
occupied (array-like): Whether the given indices are currently occupied. Same as that given by retrieve().
cur_data (dict): Data at the current indices in the store. Same as that given by retrieve().
Transform outputs:
indices (array-like): Modified indices. We do NOT assume that the final indices will be unique.
new_data (dict): Modified new_data. At the end of the transforms, it should have the same keys as the store. If indices is empty, new_data will be ignored.
add_info (dict): Modified add_info.
- Parameters
indices (array-like) – Initial list of indices for addition.
new_data (dict) – Initial data for addition.
extra_args (dict) – Dict containing additional arguments to pass to the transforms. The dict is passed directly (i.e., no unpacking like with kwargs).
transforms (list) – List of transforms on the data to be added.
- Returns
Final add_info from the transforms. new_data and indices are not returned; rather, the new_data is added into the store at indices.
- Return type
- Raises
ValueError – The final version of new_data does not have the same keys as the fields of this store.
ValueError – The final version of new_data has fields that have a different length than indices.
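To make the transform signature concrete, here is a minimal sketch of a transform that filters out duplicate indices by keeping the entry with the best objective. The function name `keep_best_per_index` and the `n_dropped` statistic are hypothetical, "best" is assumed to mean the highest value, and plain lists stand in for the arrays a real transform would likely receive.

```python
def keep_best_per_index(indices, new_data, add_info, extra_args, occupied, cur_data):
    """Hypothetical transform: among entries sharing an index, keep the best objective."""
    n_initial = len(indices)
    objective = new_data["objective"]

    best = {}  # index -> position of the best entry seen so far
    for pos, idx in enumerate(indices):
        if idx not in best or objective[pos] > objective[best[idx]]:
            best[idx] = pos
    keep = sorted(best.values())  # preserve the original ordering of survivors

    indices = [indices[p] for p in keep]
    new_data = {name: [values[p] for p in keep] for name, values in new_data.items()}
    add_info = {**add_info, "n_dropped": n_initial - len(keep)}
    return indices, new_data, add_info


indices, new_data, add_info = keep_best_per_index(
    indices=[2, 0, 2],
    new_data={"objective": [1.0, 4.0, 3.0], "measures": [[0, 0], [1, 1], [2, 2]]},
    add_info={},
    extra_args={},
    occupied=[False, False, False],
    cur_data={},
)
print(indices)                # [0, 2]
print(new_data["objective"])  # [4.0, 3.0]
print(add_info)               # {'n_dropped': 1}
```

A function like this would be one element of the transforms list passed to add(); later transforms would receive its outputs as their inputs.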
- as_raw_dict()
Returns the raw data in the ArrayStore as a one-level dictionary.
To collapse the dict, we prefix each key with props. or fields., so the result looks as follows:
    {
        "props.capacity": ...,
        "props.occupied": ...,
        ...
        "fields.objective": ...,
        ...
    }
- Returns
See description above.
- Return type
- data(fields=None, return_type='dict')
Retrieves data for all entries in the store.
Equivalent to calling retrieve() with occupied_list.
- Parameters
fields (str or array-like of str) – See retrieve().
return_type (str) – See retrieve().
- Returns
See data in retrieve(). occupied is not returned since all indices are known to be occupied in this method.
- static from_raw_dict(d)
Loads an ArrayStore from a dict of raw info.
- Parameters
d (dict) – Dict returned by as_raw_dict().
- Returns
The new ArrayStore created from d.
- Return type
- Raises
ValueError – The loaded props dict has the wrong keys.
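The key-prefixing scheme used by as_raw_dict(), and its inversion when loading, can be sketched as follows. The helpers `flatten` and `unflatten` are hypothetical stand-ins written for illustration; they are not the pyribs implementation and skip the validation of props keys mentioned above.

```python
def flatten(props, fields):
    """Collapses two dicts into one level by prefixing keys with 'props.' or 'fields.'."""
    raw = {f"props.{key}": value for key, value in props.items()}
    raw.update({f"fields.{key}": value for key, value in fields.items()})
    return raw


def unflatten(raw):
    """Splits a flattened dict back into (props, fields) using the key prefixes."""
    props, fields = {}, {}
    for key, value in raw.items():
        prefix, _, name = key.partition(".")
        if prefix == "props":
            props[name] = value
        elif prefix == "fields":
            fields[name] = value
        else:
            raise ValueError(f"Unexpected key: {key}")
    return props, fields


props = {"capacity": 3, "n_occupied": 1}
fields = {"objective": [0.0, 6.0, 0.0]}
raw = flatten(props, fields)
print(sorted(raw))  # ['fields.objective', 'props.capacity', 'props.n_occupied']
assert unflatten(raw) == (props, fields)
```

A one-level dict of this kind is convenient for serialization formats that store a flat mapping from names to arrays.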
- resize(capacity)
Resizes the store to the given capacity.
- Parameters
capacity (int) – New capacity.
- Raises
ValueError – The new capacity is less than or equal to the current capacity.
- retrieve(indices, fields=None, return_type='dict')
Collects data at the given indices.
- Parameters
indices (array-like) – List of indices at which to collect data.
fields (str or array-like of str) – List of fields to include. By default, all fields will be included, with an additional “index” as the last field (“index” can also be placed anywhere in this list). This can also be a single str indicating a field name.
return_type (str) – Type of data to return. See the data returned below. Ignored if fields is a str.
- Returns
2-element tuple consisting of:
occupied: Array indicating which indices, among those passed in, have an associated data entry. For instance, if indices is [0, 1, 2] and only index 2 has data, then occupied will be [False, False, True].
Note that if a given index is not marked as occupied, it can have any data value associated with it. For instance, if index 1 was not occupied, then the 6.0 returned in the dict example below should be ignored.
data: The data at the given indices. If fields was a single str, this will just be an array holding data for the given field. Otherwise, this data can take the following forms, depending on the return_type argument:
return_type="dict": Dict mapping from the field name to the field data at the given indices. For instance, if we have an objective field and request data at indices [4, 1, 0], we would get data that looks like {"objective": [1.5, 6.0, 2.3], "index": [4, 1, 0]}. Observe that we also return the indices as an index entry in the dict. The keys in this dict can be modified using the fields arg; duplicate keys will be ignored since the dict stores unique keys.
return_type="tuple": Tuple of arrays matching the order given in fields. For instance, if fields was ["objective", "measures"], we would receive a tuple of (objective_arr, measures_arr). In this case, the results from retrieve could be unpacked as:
    occupied, (objective, measures) = store.retrieve(
        ...,
        return_type="tuple",
    )
Unlike with the dict return type, duplicate fields will show up as duplicate entries in the tuple, e.g., fields=["objective", "objective"] will result in two objective arrays being returned.
By default (i.e., when fields=None), the fields in the tuple will be ordered according to the field_desc argument in the constructor, along with index as the last field.
return_type="pandas": A pandas.DataFrame with the following columns (by default):
    For fields that are scalars, a single column with the field name. For example, objective would have a single column called objective.
    For fields that are 1D arrays, multiple columns with the name suffixed by its index. For instance, if we have a measures field of length 10, we create 10 columns with names measures_0, measures_1, …, measures_9. We do not currently support fields with >1D data.
    1 column of integers (np.int32) for the index, named index.
In short, the dataframe might look like this:
    objective    measures_0    …    index
    …
Like the other return types, the columns can be adjusted with the fields parameter.
All data returned by this method will be a copy, i.e., the data will not update as the store changes.
- Return type
- Raises
ValueError – Invalid field name provided.
ValueError – Invalid return_type provided.
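The difference between the dict and tuple return types, including how duplicate fields are handled, can be illustrated with a toy stand-in. The function `retrieve_sketch` below is hypothetical and uses plain lists; it mirrors the behavior described above but is not the pyribs implementation, and the pandas return type is left out.

```python
def retrieve_sketch(store_fields, occupied, indices, fields=None, return_type="dict"):
    """Toy version of the documented return types, using lists instead of arrays."""
    if fields is None:
        fields = [*store_fields, "index"]  # all fields, with "index" last
    single = isinstance(fields, str)
    names = [fields] if single else list(fields)

    columns = []
    for name in names:
        if name == "index":
            columns.append(list(indices))
        elif name in store_fields:
            columns.append([store_fields[name][i] for i in indices])
        else:
            raise ValueError(f"Invalid field name: {name}")

    occ = [occupied[i] for i in indices]
    if single:
        return occ, columns[0]  # return_type is ignored for a single str
    if return_type == "dict":
        return occ, dict(zip(names, columns))  # duplicate names collapse to one key
    if return_type == "tuple":
        return occ, tuple(columns)  # duplicate names stay as duplicate entries
    raise ValueError(f"Invalid return_type: {return_type}")


store_fields = {"objective": [2.3, 6.0, 0.0, 0.0, 1.5]}
occupied = [True, False, False, False, True]

occ, data = retrieve_sketch(store_fields, occupied, [4, 1, 0])
print(occ)   # [True, False, True]
print(data)  # {'objective': [1.5, 6.0, 2.3], 'index': [4, 1, 0]}

_, pair = retrieve_sketch(store_fields, occupied, [4, 1, 0],
                          fields=["objective", "objective"], return_type="tuple")
print(len(pair))  # 2
```

Here index 1 is unoccupied, so the 6.0 in the output is a leftover value that a caller should ignore, as the note on occupied explains.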
- property dtypes
Data types of fields in the store.
Example
store.dtypes == { "objective": np.float32, "measures": np.float32, }
- Type
- property field_desc
Description of fields in the store.
Example
store.field_desc == { "objective": ((), np.float32), "measures": ((10,), np.float32), }
See the constructor field_desc parameter for more info. Unlike the field_desc in the constructor, which accepts ints for 1D field shapes (e.g., 5), this field_desc shows 1D field shapes as tuples of 1 entry (e.g., (5,)). Since dicts in Python are ordered, note that this dict will have the same order as in the constructor.
- Type
- property field_list
List of fields in the store.
Example
store.field_list == ["objective", "measures"]
- Type
- property occupied
Boolean array of size (capacity,) indicating whether each index has a data entry.
- Type
- property occupied_list
int32 array listing all occupied indices in the store.
- Type