Exploring Deceptive Mazes with Novelty Search

Open in Colab

This tutorial is part of the series of pyribs tutorials! See here for the list of all tutorials and the order in which they should be read.

Consider a robot that needs to solve the maze below, introduced in Lehman 2011. To solve this problem, we need to train a robot policy that enables the robot to move from the starting position (red dot) to the goal position (green star).

Maze

At first glance, a reasonable approach seems to be to optimize an objective, i.e., to minimize the Euclidean distance between the robot’s current position and the goal position. However, this maze is deceptive, in that minimizing the Euclidean distance will cause the robot to get stuck behind a wall that is close to the goal, as shown below.

The maze with a robot stuck due to following Euclidean distance.

In fact, in order to solve the maze, the robot actually has to defy the objective! Namely, it has to increase its Euclidean distance from the goal position by going around the walls, as shown below.

The maze with a robot properly solving it.

This insight forms the basis for Novelty Search (NS) in Lehman 2011. In short, NS ignores the objective completely and instead seeks to discover a diverse set of solutions. For example, in the maze below, NS could search for individual robot policies that reach every possible \((x, y)\) position in the maze:

The maze with many robot policies scattered throughout it.

Ultimately, the result is two-fold: First, without considering the objective, NS solves the maze! Second, by seeking diversity, a single run of NS has created policies that go to every position in the maze, rather than just one position. In this tutorial, we will show how to achieve these results by providing an overview of NS and implementing it in pyribs.

Setup

%pip install ribs[visualize] "git+https://github.com/adaptive-intelligent-robotics/Kheperax" jax tqdm imageio
import sys

import jax
import jax.numpy as jnp
import matplotlib.pyplot as plt
import numpy as np
from jax.flatten_util import ravel_pytree
from tqdm import tqdm, trange

Maze Environment with Kheperax

We begin by setting up the maze environment in Kheperax. This environment contains a circular two-wheeled robot that navigates in a 2D maze for a single episode. The robot has three LiDAR sensors to estimate distance to walls, and two bumpers to detect contact with walls. Observations consist of the readings from the sensors and bumpers, and actions consist of the signals for the left and right wheels. The animation below (reproduced from the Kheperax repo) shows the robot navigating the maze with its sensors.

Kheperax robot with sensors

Kheperax accelerates simulation of the robot using JAX. As such, our first step is to set up JAX’s random keys. Note that JAX provides acceleration on the GPU, but for simplicity and since Kheperax is fairly small, this tutorial runs on CPU.

seed = 42
random_key = jax.random.PRNGKey(seed)

Next, we set up the Kheperax task. The robot’s policy is represented as an MLP that maps directly from observations to actions. The MLP has two hidden layers of size 8.

from kheperax.tasks.target import TargetKheperaxConfig, TargetKheperaxTask

# Define Task configuration
config_kheperax = TargetKheperaxConfig.get_default()
config_kheperax.episode_length = 250
config_kheperax.mlp_policy_hidden_layer_sizes = (8, 8)

# The default resolution is (1024, 1024), which takes a while to render.
config_kheperax.resolution = (256, 256)


random_key, subkey = jax.random.split(random_key)

# Create Kheperax Task.
(
    env,
    policy_network,
    scoring_fn,
) = TargetKheperaxTask.create_default_task(
    config_kheperax,
    random_key=subkey,
)

scoring_fn = jax.jit(scoring_fn)

The networks in Kheperax are represented as pytrees, while pyribs typically expects 1D parameters. To interface with pyribs, we flatten the pytrees into vectors. Here, we pass an initial batch of observations through the network in order to set up the _array_to_pytree_fn for un-flattening solution parameters, and also to retrieve the dimensionality of the solution space solution_dim.

random_key, subkey = jax.random.split(random_key)
init_batch = jnp.zeros(shape=(1, env.observation_size))
init_parameters = policy_network.init(subkey, init_batch)

flattened_parameters, _array_to_pytree_fn = ravel_pytree(init_parameters)
solution_dim = len(flattened_parameters)
print("solution_dim:", solution_dim)
solution_dim: 138

To enable reaching diverse positions in the maze, the measures are defined as the final position of the robot at the end of each episode. Below we retrieve the bounds of this measure space (measures are referred to as behavior descriptors in Kheperax). For this maze, Kheperax also provides an objective based on the distance to the target position. However, note that Novelty Search ignores the objective.

min_bd, max_bd = env.behavior_descriptor_limits
bounds = [(min_bd[0], max_bd[0]), (min_bd[1], max_bd[1])]
print("bounds:", bounds)
bounds: [(0.0, 1.0), (0.0, 1.0)]

Finally, we define a function that evaluates a batch of candidate policies in the maze.

def evaluate(
    params: np.ndarray, random_key: jax.Array
) -> tuple[np.ndarray, np.ndarray, dict]:
    """Evaluates a batch of robot policies in the maze environment.

    Args:
        params: (batch_size, solution_dim) array of policy parameters.
        random_key: Random key for JAX.

    Returns:
        Tuple of (objectives, measures, info) from evaluating the policies in the
        environment. The objective sums the negative distance to the target at each
        timestep. The measures are the final (x, y) location of the robot.
    """
    params: jax.Array = jnp.asarray(params)
    params_pytree = jax.vmap(_array_to_pytree_fn)(params)

    random_key, subkey = jax.random.split(random_key)
    objectives, measures, info, _ = scoring_fn(params_pytree, subkey)

    return np.asarray(objectives), np.asarray(measures), info

Novelty Search in pyribs

In previous tutorials, we discussed QD algorithms like CMA-ME that build on MAP-Elites and its grid archive. Historically, Novelty Search (NS), introduced in Lehman 2011, was one of the precursors to QD algorithms. Unlike QD algorithms, NS only searches for diversity in measure space, with no consideration for the objective. Below we describe the components of NS and how they are implemented in pyribs.

ProximityArchive

The archive in NS is unstructured, in that the solutions (i.e., policies) are not arranged in a grid like they are in MAP-Elites or CMA-ME. Instead, solutions are added to the archive if they are sufficiently novel. A solution’s novelty \(\rho\) is defined as its average distance in measure space to its \(k\)-nearest neighbors in the current archive:

\[ \rho(x) = \frac{1}{k}\sum_{i=1}^{k}\text{dist}(x, \mu_i) \]

Where \(x\) is the measure value of the solution, and \(\mu_{1..k}\) are the measure values of the solution’s \(k\)-nearest neighbors in measure space. Any distance function dist is permitted, but the most common (and the one we will use today) is Euclidean distance. If the novelty \(\rho\) of a solution exceeds some novelty threshold \(\rho_{min}\), then the solution is added to the archive.

In pyribs, this archive is available as the ProximityArchive. Below we instantiate a ProximityArchive for the maze problem. Note that since the archive is unstructured, we do not need to pass it the bounds of the measure space.

from ribs.archives import ProximityArchive

archive = ProximityArchive(
    solution_dim=solution_dim,
    measure_dim=2,
    k_neighbors=15,
    novelty_threshold=0.01,
)

In addition to the ProximityArchive, it is useful to define a passive result archive that stores solutions found during the search. As suggested in Pugh 2016, this archive is useful for recording metrics like coverage because it provides a fixed tessellation of the measure space. However, it does not play any role in Novelty Search itself. Its role is identical to the result archive in CMA-MAE.

from ribs.archives import GridArchive

result_archive = GridArchive(
    solution_dim=solution_dim,
    dims=(100, 100),
    ranges=bounds,
)

Emitters

The NS archive can be combined with any evolutionary algorithm to search for solutions with diverse measures. Essentially, novelty will guide the algorithm instead of a typical objective. In pyribs, this means that any emitter can be combined with the ProximityArchive to search for diverse solutions. In this tutorial, we select the EvolutionStrategyEmitter that is also used in CMA-ME. Below we instantiate 5 such emitters.

from ribs.emitters import EvolutionStrategyEmitter

# Notes:
# - flattened_parameters was created when we initialized `_array_to_pytree_fn`
# - ranker="nov" indicates that the emitter will rank solutions by their novelty
emitters = [
    EvolutionStrategyEmitter(
        archive,
        x0=np.asarray(flattened_parameters),
        sigma0=0.5,
        ranker="nov",
        selection_rule="mu",
        restart_rule="basic",
        batch_size=30,
    )
    for _ in range(5)
]

Scheduler

Finally, we create a scheduler that combines all the components above.

from ribs.schedulers import Scheduler

scheduler = Scheduler(archive, emitters, result_archive)

Running Novelty Search in the Deceptive Maze

In short, Novelty Search proceeds as follows. First, we call scheduler.ask, in which the emitters generate solutions/policies for the robot. Next, we roll out the solutions in the maze with the evaluate function. In the call to scheduler.tell, if the solutions are sufficiently novel, they are added to the ProximityArchive. The solutions are also added to the GridArchive, which enables keeping track of metrics. Finally, the emitters are updated based on the novelty of each solution, so that they are guided to produce more novel solutions in the future. The execution loop below should take 5-10 min to run.

total_itrs = 2500

for itr in trange(1, total_itrs + 1, file=sys.stdout, desc="Iterations"):
    solutions = scheduler.ask()

    random_key, subkey = jax.random.split(random_key)
    objectives, measures, info = evaluate(solutions, subkey)

    scheduler.tell(objectives, measures)

    # Logging.
    if itr % 100 == 0 or itr == total_itrs:
        tqdm.write(
            f"Iteration {itr:5d} | "
            f"Archive Size: {archive.stats.num_elites} "
            f"Coverage: {result_archive.stats.coverage:6.3%}"
        )
Iteration   100 | Archive Size: 3454 Coverage: 14.690%                                             
Iteration   200 | Archive Size: 5060 Coverage: 20.810%                                             
Iteration   300 | Archive Size: 5808 Coverage: 23.260%                                             
Iteration   400 | Archive Size: 6599 Coverage: 26.290%                                             
Iteration   500 | Archive Size: 7511 Coverage: 30.590%                                             
Iteration   600 | Archive Size: 8328 Coverage: 34.440%                                             
Iteration   700 | Archive Size: 9454 Coverage: 40.630%                                             
Iteration   800 | Archive Size: 11137 Coverage: 49.140%                                            
Iteration   900 | Archive Size: 12794 Coverage: 56.200%                                            
Iteration  1000 | Archive Size: 14277 Coverage: 61.600%                                            
Iteration  1100 | Archive Size: 15661 Coverage: 67.030%                                            
Iteration  1200 | Archive Size: 16937 Coverage: 71.150%                                            
Iteration  1300 | Archive Size: 17935 Coverage: 74.170%                                            
Iteration  1400 | Archive Size: 18754 Coverage: 76.130%                                            
Iteration  1500 | Archive Size: 19423 Coverage: 77.810%                                            
Iteration  1600 | Archive Size: 20046 Coverage: 79.520%                                            
Iteration  1700 | Archive Size: 20722 Coverage: 81.270%                                            
Iteration  1800 | Archive Size: 21553 Coverage: 83.720%                                            
Iteration  1900 | Archive Size: 22124 Coverage: 85.460%                                            
Iteration  2000 | Archive Size: 22713 Coverage: 86.960%                                            
Iteration  2100 | Archive Size: 23002 Coverage: 87.590%                                            
Iteration  2200 | Archive Size: 23243 Coverage: 88.140%                                            
Iteration  2300 | Archive Size: 23435 Coverage: 88.530%                                            
Iteration  2400 | Archive Size: 23564 Coverage: 88.810%                                            
Iteration  2500 | Archive Size: 23633 Coverage: 88.930%                                            
Iterations: 100%|██████████████████████████████████████████████| 2500/2500 [03:45<00:00, 11.09it/s]

Visualizing the Archive

For visualizing Novelty Search, two options are available. First, we can visualize the ProximityArchive that Novelty Search generates. The proximity_archive_plot creates a scatterplot showing all points in the archive in measure space. Notice that the bounds of the plot are not exactly [0, 1]. This is because the proximity_archive_plot adjusts the bounds dynamically based on the points in the plot. Had we plotted the archive in the middle of the run, we may have seen smaller bounds. Also note that there is a circular hole in the top left corner of the archive. This hole is present because the episode ends as soon as the robot touches the circle, so no robot can have measures that are within the circle.

from ribs.visualize import proximity_archive_plot

plt.figure(figsize=(6, 6))
proximity_archive_plot(
    archive,
    ms=2,
    aspect="equal",
    # Since NS does not consider the objective, we make all points have the
    # same color by setting a colormap that has a single color. We also remove
    # the colorbar.
    cmap=["lightgray"],
    cbar=None,
)
../_images/f5341485fa96b4c2d96028bc1f3ab35104c5350a9ee89e4333a828ad7492fe87.png

In addition to the main archive, we can plot the result archive with grid_archive_heatmap. This can be useful if we are comparing to other algorithms that use grid archives, such as MAP-Elites.

from ribs.visualize import grid_archive_heatmap

plt.figure(figsize=(6, 6))
grid_archive_heatmap(
    result_archive,
    cmap=["lightgray"],
    cbar=None,
    aspect="equal",
)
../_images/53671ba5fe683b6af170649f9d064ba021470437c78be2dc8ef0a7c58fa32512.png

Conclusion

In this tutorial, we demonstrated how to set up Novelty Search in pyribs and search for diverse policies in the classic deceptive maze environment. Since its introduction in 2011, the maze environment has become one of the most popular benchmarks in the QD community, with plenty of works running their algorithms in it and extending it to other layouts. Meanwhile, Novelty Search has become a foundational work in the field and has inspired a wide array of algorithms. We hope this tutorial serves as a helpful starting point for using pyribs to explore Novelty Search and its role in QD.

Citation

If you find this tutorial useful, please cite it as:

@article{pyribs_ns_maze,
  title   = {Exploring Deceptive Mazes with Novelty Search},
  author  = {Bryon Tjanaka},
  journal = {pyribs.org},
  year    = {2025},
  url     = {https://docs.pyribs.org/en/stable/tutorials/ns_maze.html}
}