An Algorithmic Theory of Metacognition in Minds and Machines
Abstract
We propose modifying an Actor-Critic so that the Actor and Critic interact several times within each environment step. The Actor constructs a new policy, samples a hypothetical action, queries the Critic, and repeats until satisfied or forced to act. This establishes a connection between Bayesian Optimization and Reinforcement Learning.
Summary
A simple modification to Actor-Critic that enables RL agents to detect and correct their own mistakes through metacognitive interaction.
Three Motivating Phenomena
Phenomenon 1: If you ask people to perform a task and then evaluate their own performance, roughly half the population is better at judging whether they answered correctly than they are at the task itself. How can this be possible?

Phenomenon 2: If you ask people to perform error-prone tasks, a response-locked error-related negativity (ERN) signal appears when they err. The brain is signaling that it made a mistake - but how does the brain know?

Phenomenon 3: Via lesions, pharmacology, TMS, or other interventions, you can dissociate people's task performance from their self-evaluation. This seems paradoxical: if I'm better (or worse) at knowing whether I erred, shouldn't I also be better (or worse) at the task?
The Idea
Modify an Actor-Critic so that the Actor and Critic interact several times within each environment step. The Actor constructs a new policy, samples a hypothetical action, queries the Critic, and repeats until satisfied or forced to act.
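To make the loop concrete, here is a minimal Python sketch of the modified action-selection step under some assumptions: a discrete action space, an actor(state, feedback) callable that returns action probabilities conditioned on the Critic's feedback so far, and a critic(state, action) callable that returns an estimated Q-value. The names max_queries and satisfaction_threshold are illustrative choices, not the paper's exact interface.

import numpy as np

def select_action(actor, critic, state, max_queries=5, satisfaction_threshold=0.0):
    """Inner Actor-Critic loop within one environment step: the Actor constructs a
    policy, samples a hypothetical action, queries the Critic, and repeats until
    satisfied or forced to act. Interfaces and hyperparameters are illustrative."""
    feedback = []                                    # (action, estimated value) from earlier queries
    best_action, best_value = None, -np.inf

    for _ in range(max_queries):
        policy = actor(state, feedback)              # construct a new policy given the Critic's feedback
        action = np.random.choice(len(policy), p=policy)  # sample a hypothetical action
        value = critic(state, action)                # query the Critic for an estimate of Q(state, action)
        feedback.append((action, value))

        if value > best_value:
            best_action, best_value = action, value

        if value >= satisfaction_threshold:          # satisfied: commit to this action
            return action

    return best_action                               # forced to act: best hypothetical action seen so far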

Connection to Bayesian Optimization
This establishes a connection between Bayesian Optimization and Reinforcement Learning. The Critic plays the role of the surrogate model and the Actor plays the role of the acquisition function for the black-box optimization problem of action selection: argmax_a Q(s_t, a).
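Under this reading, a single environment step looks like a short run of Bayesian Optimization over actions. A hedged sketch reusing the same illustrative actor/critic interfaces as above, where the Critic serves as the surrogate and the Actor proposes candidates like an acquisition function:

def bayes_opt_action_selection(actor, critic, state, num_rounds=5):
    """Action selection viewed as black-box optimization of a -> Q(state, a).
    The Critic acts as the surrogate (cheap value estimates); the Actor acts as
    the acquisition function (proposes the next candidate given past queries).
    Interfaces are illustrative assumptions, not the paper's exact API."""
    observations = []                                       # history of (action, estimated value)
    for _ in range(num_rounds):
        policy = actor(state, observations)                 # acquisition: propose a new policy
        candidate = max(range(len(policy)), key=lambda a: policy[a])  # most promising action
        value = critic(state, candidate)                    # surrogate: estimate Q(state, candidate)
        observations.append((candidate, value))
    # Return the incumbent: the queried action with the highest estimated value,
    # an approximation of argmax_a Q(s_t, a).
    return max(observations, key=lambda pair: pair[1])[0]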

Result: Error Detection and Correction
What do you get if you do this? You get a Reinforcement Learning agent that can (sometimes) detect and correct its own mistakes!

The intuition is simple: in the classic Actor-Critic, suppose the Actor tries to drive off a cliff and the Critic signals that this is a bad action. The agent still drives off the cliff, because the Critic's evaluation only shapes future learning updates, not the current choice. Here, we give the Actor a chance to take what the Critic knows into account before acting.
The agent detects its own erroneous actions even if the Actor’s policy samples those actions with high probability.
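One way to picture that detection step, again under the illustrative actor/critic interfaces sketched above (the error_margin threshold and the exhaustive query over actions are assumptions for this example, reasonable only for small discrete action spaces):

def detect_and_correct(actor, critic, state, error_margin=1.0):
    """Flag a sampled action the Critic rates clearly worse than an alternative,
    even if the Actor's policy assigned that action high probability, and act on
    the better alternative instead. Interfaces and threshold are illustrative."""
    policy = actor(state, [])                                    # initial policy, no feedback yet
    sampled = max(range(len(policy)), key=lambda a: policy[a])   # the high-probability action
    sampled_value = critic(state, sampled)

    # Query the Critic about the other hypothetical actions.
    values = {a: critic(state, a) for a in range(len(policy))}
    best = max(values, key=values.get)

    error_detected = values[best] - sampled_value > error_margin
    chosen = best if error_detected else sampled                 # correct using what the Critic knows
    return chosen, error_detected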

See the full research page for more details.
