About

I’m a machine learning researcher focusing on (multi-agent) reinforcement learning, LLMs, and AI safety. I work at Anthropic, where we build and align powerful AI systems; my focus there is using reinforcement learning to get language models to behave in ways that are safe and useful.

Before Anthropic I was a fellow at OpenAI, where I worked on open-endedness and reinforcement learning. I was also in the OpenAI Scholars program, and before that I worked as a machine learning engineer at Coinbase and as an algorithms research scientist at Fitbit. I have a B.S. in mathematics and physics from MIT.

News

  • April 2021: Started at Anthropic, working on building systems to train LLMs with RL
  • ICML 2021: Emergent Social Learning via Multi-Agent Reinforcement Learning was accepted with an invited talk.
  • November 2020: Learning Social Learning won a best paper award at the 2020 NeurIPS Cooperative AI Workshop!
  • September 2020: Fellowship on the OpenAI open-endedness team, working with Joel Lehman and Ken Stanley
  • February 2020: Scholar at OpenAI, studying social multi-agent reinforcement learning with Natasha Jaques

Research

Constitutional AI: Harmlessness from AI Feedback (arXiv)
(middle author)

RLHF is a fairly effective tool for fine-tuning language models to generate text consistent with the preferences expressed in a large dataset of human comparisons. But that data is costly to collect, and it is difficult to ensure the dataset precisely reflects the high-level intentions of human operators. This paper develops "RL from AI feedback" (RLAIF), a technique that uses LLMs to flexibly augment human preference datasets.
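
To make the idea concrete, here is a minimal, hypothetical sketch of AI-feedback preference labeling: an LLM judges which of two responses better satisfies a written principle, and the resulting labels can augment a human preference dataset. The `query_llm` helper, prompt format, and principle text are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of AI-feedback preference labeling (RLAIF-style).
# `query_llm` is an assumed stub: wire it to your preferred LLM API.

def query_llm(prompt: str) -> str:
    raise NotImplementedError("call an LLM API here and return its completion")

PRINCIPLE = "Choose the response that is more helpful, honest, and harmless."

def ai_preference_label(conversation: str, response_a: str, response_b: str) -> str:
    """Ask an LLM which of two candidate responses better satisfies the principle."""
    judgment_prompt = (
        f"{PRINCIPLE}\n\n"
        f"Conversation:\n{conversation}\n\n"
        f"Response (A): {response_a}\n"
        f"Response (B): {response_b}\n\n"
        "Answer with a single letter, A or B:"
    )
    answer = query_llm(judgment_prompt).strip().upper()
    return "A" if answer.startswith("A") else "B"

# Labels produced this way can stand in for (or supplement) human comparisons
# when training a preference model used for RL fine-tuning.
```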

Training a Helpful and Harmless Assistant (arXiv)
Yuntao Bai, Andy Jones, Kamal Ndousse, ...

A General Language Assistant as a Laboratory for Alignment (arXiv)
(middle author)

Evolution through Large Models (arXiv)
Joel Lehman, Jonathan Gordon, Shawn Jain, Kamal Ndousse, Cathy Yeh, Kenneth O. Stanley

Emergent Social Learning via Multi-Agent Reinforcement Learning (arXiv)
Kamal Ndousse, Douglas Eck, Sergey Levine, Natasha Jaques

We find that an auxiliary unsupervised prediction task helps model-free reinforcement learning (RL) agents learn social policies, which let them learn from experts present in a shared environment. These social learners outperform solitary learners on the same hard-exploration, sparse-reward task, and their social policies also support strong zero-shot transfer to new tasks when experts are present.
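
A minimal sketch of the general idea, assuming a PyTorch policy network: alongside the usual policy and value heads, an auxiliary head predicts the next observation, and its unsupervised prediction loss is added to the standard model-free RL objective. The layer sizes, auxiliary target, and loss weighting below are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class PolicyWithAuxPrediction(nn.Module):
    """Model-free policy/value network with an auxiliary next-observation head."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # action logits
        self.value_head = nn.Linear(hidden, 1)            # state-value estimate
        self.aux_head = nn.Linear(hidden, obs_dim)        # predicts next observation

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        return self.policy_head(h), self.value_head(h), self.aux_head(h)

def combined_loss(rl_loss: torch.Tensor, pred_next_obs: torch.Tensor,
                  next_obs: torch.Tensor, aux_coef: float = 0.1) -> torch.Tensor:
    # Total objective = standard model-free RL loss (e.g. PPO/A2C) plus an
    # unsupervised next-observation prediction loss, weighted by aux_coef.
    aux_loss = nn.functional.mse_loss(pred_next_obs, next_obs)
    return rl_loss + aux_coef * aux_loss
```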

Marlgrid (github)

Marlgrid is an open-source gridworld implementation built for multi-agent reinforcement learning (MARL). Its design is based on Minigrid.
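
As a toy illustration of the per-agent step convention such multi-agent gridworlds typically expose (lists of observations, actions, and rewards, one entry per agent), here is a self-contained sketch. The environment class below is a stand-in, not Marlgrid's actual API; consult the repository for the real interface.

```python
# Toy stand-in environment illustrating a list-per-agent step convention.
# This is NOT marlgrid's actual API.
import random

class ToyMultiAgentGrid:
    """Agents move along a 1-D strip; each is rewarded for reaching the right edge."""
    def __init__(self, n_agents: int = 2, length: int = 5, max_steps: int = 20):
        self.n_agents, self.length, self.max_steps = n_agents, length, max_steps

    def reset(self):
        self.positions = [0] * self.n_agents
        self.t = 0
        return [self._obs(i) for i in range(self.n_agents)]

    def _obs(self, i):
        return (self.positions[i], self.length)

    def step(self, actions):  # per-agent actions: 0 = stay, 1 = move right
        self.t += 1
        rewards = []
        for i, a in enumerate(actions):
            self.positions[i] = min(self.positions[i] + a, self.length - 1)
            rewards.append(1.0 if self.positions[i] == self.length - 1 else 0.0)
        done = self.t >= self.max_steps
        return [self._obs(i) for i in range(self.n_agents)], rewards, done, {}

env = ToyMultiAgentGrid(n_agents=2)
obs = env.reset()
done = False
while not done:
    actions = [random.randrange(2) for _ in obs]  # each agent acts independently
    obs, rewards, done, info = env.step(actions)
```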