Learning a Repertoire of Robot Arm Configurations

In robotic manipulation, inverse kinematics involves figuring out how to configure the joints of an arm such that the end effector is at a certain position. For instance, in order to pick up a cup, a robot must move its gripper to the cup’s position, and in order to catch a ball, a robot must move its hand to where it predicts the ball will be.

In this tutorial, we will show how to use CMA-ME to find a repertoire of joint angles that move a robotic arm to a wide variety of positions. For simplicity, we will use a planar 12-DoF arm, i.e. an arm with 12 joints that only moves in 2D space.

This tutorial is based on the problem presented in Vassiliades 2018.


First, we install ribs.

%pip install ribs[all]

And here we import some utilities.

import time

import matplotlib.pyplot as plt
import numpy as np

Problem Description

Our arm consists of 12 joints and links of length 1. It looks something like the following, where the red dot is the base of the arm, the green dot is the position of the end effector, and each black dot is an intermediate joint (note that the base also counts as a joint). Since the total length of the arm is 12 units, the arm can only move to positions in a circle of radius 12 around its base.

Arm repertoire example

Each solution \(\theta\) consists of 12 joint angles for the arm, and each angle \(\theta_i\) is bounded by \([-\pi, \pi]\) (i.e. each joint has a full range of motion).

Since we want to move the arm to a wide variety of positions, our behavioral characteristics (BCs) are the final \((x,y)\) coordinates of the arm. To calculate these coordinates, we use these forward kinematics equations:

\[x = l_1 \cos(\theta_1) + l_2 \cos(\theta_1 + \theta_2) + ... + l_{12} \cos(\theta_1 + \theta_2 + ... \theta_{12})\]
\[y = l_1 \sin(\theta_1) + l_2 \sin(\theta_1 + \theta_2) + ... + l_{12} \sin(\theta_1 + \theta_2 + ... \theta_{12})\]

Where \(l_i\) is the length of the \(i\)-th link (in our case, \(l_i\) is always 1).

There are many possible objectives to use. For instance, in a real-world setting, we may choose to minimize the movement of certain joints, as they may be weaker or more prone to wear-and-tear. Here, we choose to maximize the negative standard deviation of the joint angles (i.e. minimize the standard deviation), such that the joint angles are as close to each other as possible. This will make the arm look like a smooth curve in its final position.

The following simulate function calculates the objectives and BCs for a batch of solutions.

def simulate(solutions, link_lengths):
    """Returns the objective values and BCs for a batch of solutions.
        solutions (np.ndarray): A (batch_size, dim) array where each row
            contains the joint angles for the arm. `dim` will always be 12
            in this tutorial.
        link_lengths (np.ndarray): A (dim,) array with the lengths of each
            arm link (this will always be an array of ones in the tutorial).
        objs (np.ndarray): (batch_size,) array with objective values.
        bcs (np.ndarray): (batch_size, 2) array with a BC in each row.
    n_dim = link_lengths.shape[0]
    objs = -np.std(solutions, axis=1)

    # theta_1, theta_1 + theta_2, ...
    cum_theta = np.cumsum(solutions, axis=1)
    # l_1 * cos(theta_1), l_2 * cos(theta_1 + theta_2), ...
    x_pos = link_lengths[None] * np.cos(cum_theta)
    # l_1 * sin(theta_1), l_2 * sin(theta_1 + theta_2), ...
    y_pos = link_lengths[None] * np.sin(cum_theta)

    bcs = np.concatenate(
            np.sum(x_pos, axis=1, keepdims=True),
            np.sum(y_pos, axis=1, keepdims=True),

    return objs, bcs

Quality Diversity Algorithm Setup

We will use CMA-ME, with the following pyribs components, to search for arm configurations:

  • CVTArchive: This archive uses a Centroidal Voronoi Tesselation (CVT) to divide the behavior space into evenly sized bins. It is typically used for high-dimensional behavior spaces where the curse of dimensionality prevents one from using GridArchive, but it works perfectly fine for lower dimensions too.

  • ImprovementEmitter: This emitter originates in Fontaine 2020. It uses CMA-ES to search for solutions that improve the archive.

  • Optimizer: Binds all the components together.

First, let’s create the archive.

from ribs.archives import CVTArchive

dof = 12  # Degrees of freedom for the arm.
link_lengths = np.ones(dof)  # 12 links, each with length 1.
max_pos = np.sum(link_lengths)
archive = CVTArchive(
    # Number of bins.
    # The x and y coordinates are bound by the maximum arm position.
    [(-max_pos, max_pos), (-max_pos, max_pos)],
    # The archive will use a k-D tree to search for the bin a solution
    # belongs to.

Now, the emitters. We will use 5 improvement emitters, each with batch size of 30. This means we will evaluate 150 solutions (5 x 30) on each iteration of the algorithm. As described earlier, we also bound each angle to be between \(-\pi\) and \(\pi\).

from ribs.emitters import ImprovementEmitter

emitters = [
        # Initial step size of 0.1 seems reasonable based on the bounds.
        bounds=[(-np.pi, np.pi)] * dof,
    ) for _ in range(5)

Finally, the optimizer combines everything together. This line may take a minute or two to run, as it initializes the archive, and CVTArchive uses k-means clustering (Lloyd’s algorithm) to generate the CVT.

from ribs.optimizers import Optimizer

opt = Optimizer(archive, emitters)


With our algorithm set up, we can now search for arm configurations. We also extract metrics throughout our training loop.

metrics = {
    "Archive Size": {
        "itrs": [0],
        "vals": [0],  # Starts at 0.
    "Max Objective": {
        "itrs": [],
        "vals": [],  # Does not start at 0.

start_time = time.time()
total_itrs = 700
for itr in range(1, total_itrs + 1):
    sols = opt.ask()
    objs, bcs = simulate(sols, link_lengths)
    opt.tell(objs, bcs)

    # Logging.
    if itr % 50 == 0:
        # We don't need the solutions when calculating metrics.
        archive_data = archive.as_pandas(include_solutions=False)
        metrics["Archive Size"]["itrs"].append(itr)
        metrics["Archive Size"]["vals"].append(len(archive_data))
        metrics["Max Objective"]["itrs"].append(itr)
        metrics["Max Objective"]["vals"].append(archive_data["objective"].max())
        print(f"Finished {itr} itrs after {time.time() - start_time:.2f} s")
Finished 50 itrs after 12.20 s
Finished 100 itrs after 14.20 s
Finished 150 itrs after 16.61 s
Finished 200 itrs after 18.41 s
Finished 250 itrs after 20.06 s
Finished 300 itrs after 21.90 s
Finished 350 itrs after 23.72 s
Finished 400 itrs after 25.72 s
Finished 450 itrs after 28.40 s
Finished 500 itrs after 30.54 s
Finished 550 itrs after 32.19 s
Finished 600 itrs after 33.99 s
Finished 650 itrs after 35.40 s
Finished 700 itrs after 36.75 s

We can now plot the metrics.

for metric in metrics:
    plt.plot(metrics[metric]["itrs"], metrics[metric]["vals"])
    print(f"Final {metric}: {metrics[metric]['vals'][-1]}")
Final Archive Size: 7922
Final Max Objective: -0.01370973786410085
../_images/arm_repertoire_16_1.png ../_images/arm_repertoire_16_2.png

Using the cvt_archive_heatmap from ribs.visualize, we can also plot a heatmap showing all the positions for which we found an arm configuration, as well as the objective value of each configuration (higher is better). As we can see, CMA-ME found a solution for most of the possible positions (the arm can reach anywhere within a circle of radius 12 around its base), but there are still a few gaps.

from ribs.visualize import cvt_archive_heatmap

cvt_archive_heatmap(archive, plot_centroids=False, lw=0.1)
plt.xlabel("Final x")
plt.ylabel("Final y")
plt.title("Final Positions of a 12-DoF Arm");


Finally, we can visualize some of the arm configurations that we found. The following function displays a plot of the arm like the one shown at the beginning of this tutorial.

def visualize(solution, link_lengths, objective, ax):
    """Plots an arm with the given angles and link lengths on ax.
        solution (np.ndarray): A (dim,) array with the joint angles of the arm.
        link_lengths (np.ndarray): The length of each link the arm.
        objective (float): The objective value of this solution.
        ax (plt.Axes): A matplotlib axis on which to display the arm.
    lim = 1.05 * np.sum(link_lengths)  # Add a bit of a border.
    ax.set_xlim(-lim, lim)
    ax.set_ylim(-lim, lim)
    ax.set_title(f"Objective: {objective}")

    # Plot each link / joint.
    pos = np.array([0, 0])  # Starting position of the next joint.
    cum_thetas = np.cumsum(solution)
    for link_length, cum_theta in zip(link_lengths, cum_thetas):
        # Calculate the end of this link.
        next_pos = pos + link_length * np.array(
            [np.cos(cum_theta), np.sin(cum_theta)])
        ax.plot([pos[0], next_pos[0]], [pos[1], next_pos[1]], "-ko", ms=3)
        pos = next_pos

    # Add points for the start and end positions.
    ax.plot(0, 0, "ro", ms=6)
    final_label = f"Final: ({pos[0]:.2f}, {pos[1]:.2f})"
    ax.plot(pos[0], pos[1], "go", ms=6, label=final_label)

We can retrieve several random solutions from the archive with get_random_elite(). As our objective was to find configurations where the joint angles had small standard deviation from each other, it makes sense that all of our arms look “smooth” when visualized, as the joint angles are similar to each other.

fig, ax = plt.subplots(2, 2, figsize=(8, 8))
ax = ax.ravel()
for i in range(len(ax)):
    solution, objective, behavior = archive.get_random_elite()
    visualize(solution, link_lengths, objective, ax[i])

We can also retrieve solutions that reached specific positions with elite_with_behavior. This method will check whether the archive bin with the specified behavior values contains a solution, and if so, it will return that solution. It looks like there is an arm configuration that can reach position (0,0), so let’s see what that looks like.

solution, objective, behavior = archive.elite_with_behavior([0, 0])
_, ax = plt.subplots()
visualize(solution, link_lengths, objective, ax)

As one might expect, the arm looks like a circle, as the joint angles in a circle are all equal. Since elite_with_behavior retrieves a solution in the same bin as the behavior values specified, the actual behavior values of the solution may not exactly match (0,0).


We discussed inverse kinematics, the problem of finding joint configurations that place an arm’s end effector at a specified position. Using CMA-ME with a CVT archive and five improvement emitters, we then created a repertoire of joint configurations that move a 12-DoF planar arm to a wide variety of positions. Finally, we visualized some of the solutions we retrieved. Some ideas for future experiments include:

  • Increasing the number of joints (increase dof under the Quality Diversity Algorithm Setup section)

  • Using different link lengths (set link_lengths in the same section as dof)

  • Using different types of emitters, like the RandomDirectionEmitter and IsoLineEmitter