ribs.archives.k_means_centroids

ribs.archives.k_means_centroids(*, centroids: Int, ranges: Collection[tuple[Float, Float]], samples: Int | ArrayLike = 100000, dtype: DTypeLike = numpy.float64, seed: Int | None = None, k_means_kwargs: dict | None = None) → tuple[np.ndarray, np.ndarray]

Generates archive centroids with k-means clustering.

Based on Vassiliades 2018, this function approximately generates a Centroidal Voronoi Tessellation (CVT) with uniformly-sized cells. It samples points uniformly across the measure space range given by ranges (the number of points is set by samples), and then groups the points into clusters using k-means clustering (the number of clusters is set by centroids). The set of cluster centroids output by k-means is used for the CVT.
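The procedure described above can be illustrated with a minimal NumPy-only sketch. This is not the library's implementation (which delegates clustering to sklearn.cluster.k_means()); the function name sketch_cvt_centroids, its parameters, and the fixed iteration count are assumptions made for illustration.

```python
import numpy as np


def sketch_cvt_centroids(num_centroids, ranges, num_samples=10_000, iters=20, seed=None):
    """Illustrative CVT approximation: uniform sampling, then Lloyd's k-means.

    Hypothetical helper for explanation only; not part of ribs.
    """
    rng = np.random.default_rng(seed)
    lower = np.array([lo for lo, _ in ranges], dtype=np.float64)
    upper = np.array([hi for _, hi in ranges], dtype=np.float64)

    # Step 1: sample points uniformly within the measure space bounds.
    samples = rng.uniform(lower, upper, size=(num_samples, len(ranges)))

    # Step 2: initialize the centroids from randomly chosen samples.
    chosen = rng.choice(num_samples, size=num_centroids, replace=False)
    centroids = samples[chosen]  # fancy indexing returns a copy

    for _ in range(iters):
        # Assign each sample to its nearest centroid.
        dists = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the samples assigned to it.
        for k in range(num_centroids):
            members = samples[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)

    return centroids, samples


centroids, samples = sketch_cvt_centroids(
    10, [(-1, 1), (-2, 2)], num_samples=2000, seed=0
)
print(centroids.shape, samples.shape)  # → (10, 2) (2000, 2)
```

Because every centroid is a mean of uniformly drawn samples, the centroids stay inside the bounds given by ranges and spread roughly evenly over the measure space.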

Parameters:
centroids: Int

Number of centroids to create during clustering.

ranges: Collection[tuple[Float, Float]]

Lower and upper bounds of each dimension of the measure space, e.g., [(-1, 1), (-2, 2)] indicates the first dimension should have bounds \([-1,1]\) (inclusive), and the second dimension should have bounds \([-2,2]\) (inclusive). ranges should contain one (lower, upper) pair per dimension of the measure space.

samples: Int | ArrayLike = 100000

If it is an int, this specifies the number of samples to generate before clustering them to create the CVT. These points will be sampled uniformly within the ranges specified above. Alternatively, this argument can be a (num_samples, measure_dim) array of measure space points to cluster. It can be useful to pass in custom samples when there are restrictions on what samples in the measure space are (physically) possible.
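As one way to build such a custom array, the sketch below uses rejection sampling to keep only the points that fall inside a feasible region. The unit-disc constraint and the variable names are hypothetical and chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw candidate points uniformly in the bounding box [-1, 1] x [-1, 1].
candidates = rng.uniform(-1.0, 1.0, size=(20_000, 2))

# Keep only the points inside the unit disc (the hypothetical feasible region).
feasible = candidates[np.linalg.norm(candidates, axis=1) <= 1.0]

# The result has shape (num_samples, measure_dim) with measure_dim = 2.
print(feasible.shape[1])  # → 2
```

An array like feasible could then be passed as samples=feasible, together with ranges=[(-1, 1), (-1, 1)], so that centroids are only placed where measures can actually occur.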

dtype: DTypeLike = numpy.float64

Data type of the centroids and samples.

seed: Int | None = None

Value to seed the random number generator and sklearn.cluster.k_means(). Pass None to avoid a fixed seed.

k_means_kwargs: dict | None = None

Keyword arguments for sklearn.cluster.k_means(). By default, we pass n_init=1, init="random", algorithm="lloyd", and random_state=seed. Note that these settings are geared towards quickly generating centroids that are “good enough.” To create centroids that are more uniformly distributed, it may be better to use settings like init="k-means++", though such settings will require more time to run.

Returns:

Two arrays. The first is a (centroids, measure_dim) array of centroids. The second is a (num_samples, measure_dim) array of the samples that were clustered to create the centroids.

Raises:
  • ValueError – samples was passed in as an array, and the array has the wrong shape.

  • RuntimeError – The number of centroids found during k-means clustering is not equal to the number of centroids passed in.