Blog

2020

Social learning in goal navigation tasks

16 minute read

For the past few months I’ve been studying social learning in multi-agent reinforcement learning as part of the Spring 2020 OpenAI Scholars program. This is the last in a series of posts I’ve written during the program, and in this post I’ll discuss some experiments I’ve been conducting to study social learning by independent RL agents. The first post in this series has lots of context about why I’m interested in multi-agent reinforcement learning. Before continuing I wanted to express my tremendous gratitude to OpenAI for organizing the Scholars program, and to my mentor Natasha Jaques for her incredible support and encouragement.

Stale hidden states in PPO-LSTM

9 minute read

I’ve been using Proximal Policy Optimization (PPO, Schulman et al. 2017) to train agents to accomplish gridworld tasks. The neural net architectures I’ve been using include LSTM layers – this gives the agents the capacity to remember details from earlier in an episode when choosing actions later in the episode. This capacity is particularly important in partially observed environments, which are ubiquitous in multi-agent reinforcement learning (MARL).
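To make that memory mechanism concrete, here is a minimal sketch of a recurrent actor-critic in PyTorch. The class name, layer sizes, and observation encoding are illustrative assumptions, not the architecture used in the post.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Illustrative actor-critic with an LSTM core (names and sizes are arbitrary)."""

    def __init__(self, obs_dim, n_actions, hidden_size=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_size), nn.ReLU())
        self.lstm = nn.LSTMCell(hidden_size, hidden_size)
        self.policy_head = nn.Linear(hidden_size, n_actions)  # action logits
        self.value_head = nn.Linear(hidden_size, 1)           # state-value estimate

    def forward(self, obs, hidden):
        # hidden = (h, c) is carried across timesteps within an episode,
        # which is what lets the agent remember earlier observations.
        x = self.encoder(obs)
        h, c = self.lstm(x, hidden)
        return self.policy_head(h), self.value_head(h), (h, c)

# Acting for one step: thread the hidden state forward to the next step.
policy = RecurrentPolicy(obs_dim=16, n_actions=4)
obs = torch.zeros(1, 16)
hidden = (torch.zeros(1, 128), torch.zeros(1, 128))
logits, value, hidden = policy(obs, hidden)
action = torch.distributions.Categorical(logits=logits).sample()
```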

Goal cycle environments

4 minute read

I’ve been thinking about ways to construct challenging gridworld scenarios in which the behavior of expert agents might provide cues that ease learning for novice agents. I’ve focused on tasks in which navigation is central, since an agent’s movement can always, at least in principle, be visible to other agents. Tasks like this often resemble random maze navigation: an agent spawns in an environment with a random layout and receives a reward after navigating to a certain (perhaps randomly placed) goal tile. The episode ends either after the agent reaches the goal, or after a fixed time limit (with the agent respawning each time it reaches the goal).
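As a purely illustrative sketch of that episode structure, here is a toy random-goal gridworld. The class name, observation encoding, and reward values are assumptions of mine, not the environments from the post.

```python
import numpy as np

class RandomGoalGrid:
    """Toy random-goal navigation task (purely illustrative)."""

    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

    def __init__(self, size=8, max_steps=100):
        self.size, self.max_steps = size, max_steps

    def reset(self):
        self.t = 0
        self.agent = tuple(np.random.randint(self.size, size=2))
        self._place_goal()
        return self._obs()

    def _place_goal(self):
        self.goal = self.agent
        while self.goal == self.agent:
            self.goal = tuple(np.random.randint(self.size, size=2))

    def step(self, action):
        dr, dc = self.MOVES[action]
        self.agent = (min(max(self.agent[0] + dr, 0), self.size - 1),
                      min(max(self.agent[1] + dc, 0), self.size - 1))
        self.t += 1
        reward = 0.0
        if self.agent == self.goal:
            reward = 1.0
            # Respawn the agent and place a fresh goal; the episode only
            # ends at the fixed time limit.
            self.agent = tuple(np.random.randint(self.size, size=2))
            self._place_goal()
        done = self.t >= self.max_steps
        return self._obs(), reward, done, {}

    def _obs(self):
        # Fully observed toy encoding: agent position followed by goal position.
        return np.array([*self.agent, *self.goal], dtype=np.float32)
```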

Hyperparameter hell or: How I learned to stop worrying and love PPO

8 minute read

Multi-agent reinforcement learning (MARL) is pretty tricky. Beyond all the challenges of single-agent RL, interactions between learning agents introduce stochasticity and nonstationarity that can make tasks harder for each agent to master. At a high level, many interesting questions in MARL can be framed in ways that focus on the properties of the interactions between abstract agents and are ostensibly agnostic to the underlying RL algorithms, e.g. “In what circumstances do agents learn to communicate?”. For such questions, the structure of the environment, reward function, and so on usually matters more than the choice of underlying RL algorithm.

Prioritized Experience Replay in DRQN

11 minute read

Q-learning is a classic and well-studied reinforcement learning (RL) algorithm. Adding neural network Q-functions led to the milestone Deep Q-Network (DQN) algorithm that surpassed human performance on a suite of Atari games (Mnih et al. 2013). DQN is attractive due to its simplicity, but the DQN-based algorithms that are most successful tend to rely on many tweaks and improvements to achieve stability and good performance.
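For reference, here is a sketch of the one-step TD loss at the heart of DQN, written in PyTorch. The function name and batch fields are assumed names, and the target network shown is just one of the stabilizing tweaks mentioned above.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """One-step TD loss at the heart of DQN (sketch; field names are assumed)."""
    obs, actions, rewards, next_obs, dones = batch
    # Q(s, a) for the actions the agent actually took.
    q_values = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target from a slowly updated target network,
        # one of the stabilizing tweaks mentioned above.
        next_q = target_net(next_obs).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q
    return F.smooth_l1_loss(q_values, targets)
```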

DQN and DRQN in partially observable gridworlds

11 minute read

RL agents whose policies use only feedforward neural networks have a limited capacity to accomplish tasks in partially observable environments. For such tasks, an agent may need to account for past observations or previous actions to implement a successful strategy.
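A minimal sketch of how recurrence can add that capacity, in the spirit of DRQN: a Q-network whose LSTM core consumes a sequence of observations. The names and sizes below are illustrative, not the architecture from the post.

```python
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    """Q-network with an LSTM core, in the spirit of DRQN (sizes are illustrative)."""

    def __init__(self, obs_dim, n_actions, hidden_size=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_size), nn.ReLU())
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.q_head = nn.Linear(hidden_size, n_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim). The LSTM integrates the observation
        # history, so Q-values can depend on more than the current observation.
        x = self.encoder(obs_seq)
        out, hidden = self.lstm(x, hidden)
        return self.q_head(out), hidden

q_net = RecurrentQNet(obs_dim=16, n_actions=4)
q_values, hidden = q_net(torch.zeros(2, 5, 16))  # Q-values at every step of a 5-step history
```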

Multi-agent gridworlds

7 minute read

Gridworlds are popular environments for RL experiments. Agents in gridworlds can move between adjacent tiles in a rectangular grid, and are typically trained to pursue rewards by solving simple puzzles in the grid. MiniGrid is a popular and flexible gridworld implementation that has been used in more than 20 publications.
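For example, a random-action rollout in one of MiniGrid’s built-in tasks looks roughly like this (assuming the gym-minigrid package as of 2020):

```python
import gym
import gym_minigrid  # registers the MiniGrid-* environment IDs

# A random-action rollout in one of the built-in MiniGrid tasks.
env = gym.make('MiniGrid-Empty-8x8-v0')
obs = env.reset()  # dict with a partial 'image' view, agent 'direction', and 'mission' string
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
env.close()
```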

Why I’m excited about MARL

10 minute read

I’m excited to be participating in the 2020 cohort of the OpenAI Scholars program. With the mentorship of Natasha Jaques, I’ll be spending the next few months studying multi-agent reinforcement learning (MARL) and periodically writing blog posts to document my progress. In this first post, I’ll discuss the reasons I’m excited about MARL and my plan for the Scholars program.

2019

MineRL: Recurrent replay

4 minute read

I spent some time recently exploring reinforcement learning in the excellent MineRL minecraft environments. I haven’t played much Minecraft, and I haven’t personally accomplished the holy grail objective of mining a diamond. The prospect of building a bot that can learn to accomplish a task that I haven’t completed – one that is as human-accessible as this – is incredibly exciting!