{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Learning a Repertoire of Robot Arm Configurations\n", "\n", "_This tutorial is part of the series of pyribs tutorials! See [here](https://docs.pyribs.org/en/stable/tutorials.html) for the list of all tutorials and the order in which they should be read._\n", "\n", "In robotic manipulation, [inverse kinematics](https://en.wikipedia.org/wiki/Inverse_kinematics) involves figuring out how to configure the joints of an arm such that the end effector is at a certain position. For instance, in order to pick up a cup, a robot must move its gripper to the cup's position, and in order to catch a ball, a robot must move its hand to where it predicts the ball will be.\n", "\n", "In this tutorial, we will show how to use CMA-ME to find a repertoire of joint angles that move a robotic arm to a wide variety of positions. For simplicity, we will use a planar 12-DoF arm, i.e. an arm with 12 joints that only moves in 2D space.\n", "\n", "This tutorial is based on the benchmark introduced in [Vassiliades 2018](https://arxiv.org/abs/1804.03906)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "First, we install pyribs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install ribs[visualize] tqdm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here we import some utilities." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import time\n", "import sys\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from tqdm import trange, tqdm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem Description\n", "\n", "Our arm consists of 12 joints and links of length 1. 
It looks something like the following, where the red dot is the base of the arm, the green dot is the position of the end effector, and each black dot is an intermediate joint (note that the base also counts as a joint). Since the total length of the arm is 12 units, the arm can only reach positions within a circle of radius 12 around its base.\n", "\n", "![Arm repertoire example](_static/arm_repertoire_example.png)\n", "\n", "Each solution $\\theta$ consists of 12 joint angles for the arm, and each angle $\\theta_i$ lies in $[-\\pi, \\pi]$ (i.e. each joint has a full range of motion).\n", "\n", "Since we want to move the arm to a wide variety of positions, our measures are the final $(x,y)$ coordinates of the end effector. To calculate these coordinates, we use these [forward kinematics](https://en.wikipedia.org/wiki/Forward_kinematics) equations:\n", "\n", "$$x = l_1 \\cos(\\theta_1) + l_2 \\cos(\\theta_1 + \\theta_2) + \\dots + l_{12} \\cos(\\theta_1 + \\theta_2 + \\dots + \\theta_{12})$$\n", "\n", "$$y = l_1 \\sin(\\theta_1) + l_2 \\sin(\\theta_1 + \\theta_2) + \\dots + l_{12} \\sin(\\theta_1 + \\theta_2 + \\dots + \\theta_{12})$$\n", "\n", "where $l_i$ is the length of the $i$-th link (in our case, $l_i$ is always 1).\n", "\n", "There are many possible objectives to use. For instance, in a real-world setting, we may choose to minimize the movement of certain joints, as they may be weaker or more prone to wear-and-tear. Here, we choose to maximize the negative standard deviation of the joint angles (i.e. minimize the standard deviation), such that the joint angles are as close to each other as possible. This will make the arm look like a smooth curve in its final position.\n", "\n", "The following `simulate` function calculates the objectives and measures for a batch of solutions." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def simulate(solutions, link_lengths):\n", "    \"\"\"Returns the objective values and measures for a batch of solutions.\n", "\n", "    Args:\n", "        solutions (np.ndarray): A (batch_size, dim) array where each row\n", "            contains the joint angles for the arm. dim will always be 12\n", "            in this tutorial.\n", "        link_lengths (np.ndarray): A (dim,) array with the lengths of each\n", "            arm link (this will always be an array of ones in the tutorial).\n", "    Returns:\n", "        objs (np.ndarray): (batch_size,) array of objectives.\n", "        meas (np.ndarray): (batch_size, 2) array of measures.\n", "    \"\"\"\n", "    objs = -np.std(solutions, axis=1)\n", "\n", "    # theta_1, theta_1 + theta_2, ...\n", "    cum_theta = np.cumsum(solutions, axis=1)\n", "    # l_1 * cos(theta_1), l_2 * cos(theta_1 + theta_2), ...\n", "    x_pos = link_lengths[None] * np.cos(cum_theta)\n", "    # l_1 * sin(theta_1), l_2 * sin(theta_1 + theta_2), ...\n", "    y_pos = link_lengths[None] * np.sin(cum_theta)\n", "\n", "    meas = np.concatenate(\n", "        (\n", "            np.sum(x_pos, axis=1, keepdims=True),\n", "            np.sum(y_pos, axis=1, keepdims=True),\n", "        ),\n", "        axis=1,\n", "    )\n", "\n", "    return objs, meas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quality Diversity Algorithm Setup\n", "\n", "We will use CMA-ME, with the following pyribs components, to search for arm configurations:\n", "\n", "- [CVTArchive](https://docs.pyribs.org/en/stable/api/ribs.archives.CVTArchive.html): This archive uses a [Centroidal Voronoi Tessellation (CVT)](https://en.wikipedia.org/wiki/Centroidal_Voronoi_tessellation) to divide the measure space into evenly sized cells. 
It is typically used for high-dimensional measure spaces where the curse of dimensionality prevents one from using [GridArchive](https://docs.pyribs.org/en/stable/api/ribs.archives.GridArchive.html), but it works perfectly fine for lower dimensions too.\n", "- [EvolutionStrategyEmitter](https://docs.pyribs.org/en/stable/api/ribs.emitters.EvolutionStrategyEmitter.html): This emitter is used to create the improvement emitter, which originated in the work of [Fontaine et al., 2020](https://arxiv.org/abs/1912.02400). It uses CMA-ES to search for solutions that improve the archive.\n", "- [Scheduler](https://docs.pyribs.org/en/stable/api/ribs.schedulers.Scheduler.html): Binds all the components together and controls how the archive and emitters interact.\n", "\n", "First, let's create the archive. This cell may take a minute or two to run because initializing CVTArchive involves using [k-means clustering (Lloyd's algorithm)](https://scikit-learn.org/stable/modules/clustering.html#k-means) to generate the CVT." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from ribs.archives import CVTArchive\n", "\n", "dof = 12  # Degrees of freedom for the arm.\n", "link_lengths = np.ones(dof)  # 12 links, each with length 1.\n", "max_pos = np.sum(link_lengths)\n", "archive = CVTArchive(\n", "    solution_dim=dof,\n", "    cells=10000,\n", "    # The x and y coordinates are bound by the maximum arm position.\n", "    ranges=[(-max_pos, max_pos), (-max_pos, max_pos)],\n", "    # The archive will use a k-D tree to search for the cell a solution\n", "    # belongs to.\n", "    use_kd_tree=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, the emitters. We will use 5 instances of EvolutionStrategyEmitter with two-stage improvement ranking (\"2imp\"), each with a batch size of 30. This means we will evaluate 150 solutions (5 x 30) on each iteration of the algorithm. 
As described earlier, we also bound each angle to be between $-\\pi$ and $\\pi$." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from ribs.emitters import EvolutionStrategyEmitter\n", "\n", "emitters = [\n", "    EvolutionStrategyEmitter(\n", "        archive=archive,\n", "        x0=np.zeros(dof),\n", "        # Initial step size of 0.1 seems reasonable based on the bounds.\n", "        sigma0=0.1,\n", "        ranker=\"2imp\",\n", "        bounds=[(-np.pi, np.pi)] * dof,\n", "        batch_size=30,\n", "    ) for _ in range(5)  # Create 5 separate emitters.\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, the scheduler ties everything together." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from ribs.schedulers import Scheduler\n", "\n", "scheduler = Scheduler(archive, emitters)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training\n", "\n", "With our algorithm set up, we can now search for arm configurations. We also extract metrics throughout our training loop." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iterations: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 700/700 [00:05<00:00, 135.93it/s]\n" ] } ], "source": [ "metrics = {\n", "    \"Archive Size\": {\n", "        \"itrs\": [0],\n", "        \"vals\": [0],  # Starts at 0.\n", "    },\n", "    \"Max Objective\": {\n", "        \"itrs\": [],\n", "        \"vals\": [],  # Does not start at 0.\n", "    },\n", "}\n", "\n", "total_itrs = 700\n", "for itr in trange(1, total_itrs + 1, desc='Iterations', file=sys.stdout):\n", "    sols = scheduler.ask()\n", "    objs, meas = simulate(sols, link_lengths)\n", "    scheduler.tell(objs, meas)\n", "\n", "    # Logging.\n", "    if itr % 50 == 0:\n", "        metrics[\"Archive Size\"][\"itrs\"].append(itr)\n", "        metrics[\"Archive Size\"][\"vals\"].append(len(archive))\n", "        metrics[\"Max Objective\"][\"itrs\"].append(itr)\n", "        metrics[\"Max Objective\"][\"vals\"].append(archive.stats.obj_max)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now plot the metrics." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Final Archive Size: 7906\n", "Final Max Objective: -0.01241281801121334\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "