9 September 2022
Paper Summary - "Imitation with Neural Density Models"
by Rylan Schaeffer
Below are my notes on Kim et al. 2021’s
Imitation with Neural Density Models.
Summary
- Proposes a framework for Imitation Learning by combining:
- density estimation of the expert’s occupancy measure, and
- Maximum Occupancy Entropy RL with the learned density as the reward
- Proposes an imitation learning algorithm, Neural Density Imitation (NDI)
Background
- Imitation Learning (IL) aims to learn optimal behavior by mimicking expert demonstrations
- Many IL approaches minimize a statistical distance between the expert’s and learner’s state-action
distributions (i.e. the “occupancy measures” \(\rho_{\pi_E}\) and \(\rho_{\pi_{\theta}}\)); a small
empirical sketch follows below
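As a concrete illustration (not from the paper), an occupancy measure can be approximated empirically from rollouts. The sketch below assumes a discrete MDP with integer state/action indices and uses the standard discounted visitation definition; the function name and setup are mine, for illustration only.

```python
import numpy as np

def empirical_occupancy(trajectories, n_states, n_actions, gamma=0.99):
    """Discounted empirical occupancy measure from rollouts.

    trajectories: list of episodes, each a list of (s, a) index pairs.
    Illustrative discrete setting, not the paper's continuous benchmarks.
    """
    rho = np.zeros((n_states, n_actions))
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            rho[s, a] += gamma ** t        # discounted visitation count
    return rho / rho.sum()                 # normalize to a distribution over (s, a)

# Usage: compare expert vs. learner occupancies, e.g. via total variation:
# tv = 0.5 * np.abs(empirical_occupancy(expert_trajs, nS, nA)
#                   - empirical_occupancy(policy_trajs, nS, nA)).sum()
```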
Method
- 2-Phase Approach
- First, learn a density estimate \(q_{\phi}\) of the expert’s occupancy measure \(\rho_{\pi_{E}}\)
- Second, run Maximum Occupancy Entropy RL (MaxOccEntRL), i.e. use the density estimate \(q_{\phi}\)
as a fixed reward for RL while maximizing the occupancy entropy \(H[\rho_{\pi_{\theta}}]\)
- The objective is:
\[\max_{\theta} \; \mathbb{E}_{\rho_{\pi_{\theta}}}[\log q_{\phi}(s, a)] + H[\rho_{\pi_{\theta}}]\]
- MaxOccEntRL regularizes the occupancy measure, whereas MaxEntRL regularizes only the policy’s
per-state action distribution; a reward-shaping sketch of the objective follows this list
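A minimal sketch of how the phase-two reward could be wired up, assuming a pretrained density model with a `log_prob` method and some per-sample entropy bonus (both interfaces are my assumptions, not the paper's code):

```python
import torch

def maxoccent_reward(q_phi, states, actions, entropy_bonus):
    """Phase-two reward: log q_phi(s, a) plus an occupancy-entropy bonus.

    `q_phi.log_prob` and `entropy_bonus` are assumed interfaces for this
    sketch; how the entropy term is actually obtained is the subject of the
    Challenges section below.
    """
    sa = torch.cat([states, actions], dim=-1)
    with torch.no_grad():                      # q_phi stays frozen in phase two
        expert_log_density = q_phi.log_prob(sa)
    return expert_log_density + entropy_bonus
```

Any off-the-shelf RL algorithm can then maximize this reward, since \(q_{\phi}\) is held fixed during phase two.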
Challenges
- The expert occupancy measure \(\rho_{\pi_E}\) is unknown and must be estimated from demonstrations
- The entropy \(H[\rho_{\pi_{\theta}}]\) may not exist in closed form, especially if \(\rho_{\pi_{\theta}}\) is an
implicit density; a generic sample-based estimator is sketched after this list
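One standard way to estimate an entropy directly from samples when the density is implicit is a k-nearest-neighbor (Kozachenko–Leonenko) estimator. This is a generic technique shown for illustration, not necessarily the estimator the paper uses:

```python
import numpy as np
from scipy.special import digamma, gammaln
from sklearn.neighbors import NearestNeighbors

def knn_entropy(x, k=5):
    """Kozachenko-Leonenko k-NN entropy estimate for samples x of shape (n, d).

    A generic sample-based estimator; shown only to illustrate that
    H[rho_pi] can be approximated from rollout samples alone.
    """
    n, d = x.shape
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(x)
    dists, _ = nbrs.kneighbors(x)       # column 0 is each point's distance to itself
    eps = dists[:, k]                   # distance to the k-th true neighbor
    log_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log-volume of unit d-ball
    return digamma(n) - digamma(k) + log_ball + d * np.mean(np.log(eps + 1e-12))
```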
Estimating the Expert Occupancy Measure
- Goal: Learn a parameterized density model \(q_{\phi}(s, a)\) of \(\rho_{\pi_{E}}\) from samples
- Approach: try autoregressive models and energy-based models (EBMs); a minimal autoregressive
sketch follows
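As an illustration of the autoregressive option, \(q_{\phi}(s, a)\) can be factored over the dimensions of the concatenated \((s, a)\) vector, with each conditional modeled as a Gaussian. This is a generic sketch of the idea, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ARGaussianDensity(nn.Module):
    """Autoregressive density over x = concat(s, a): each dimension x_i is a
    Gaussian conditioned on x_{<i}. A generic sketch, not the paper's model."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        # one small conditional net per dimension (simple, but O(dim) networks)
        self.nets = nn.ModuleList(
            nn.Sequential(nn.Linear(max(i, 1), hidden), nn.ReLU(),
                          nn.Linear(hidden, 2))   # outputs (mean, log_std)
            for i in range(dim)
        )

    def log_prob(self, x):
        logp = 0.0
        for i, net in enumerate(self.nets):
            # condition on preceding dimensions (a dummy zero input for i = 0)
            ctx = x[:, :i] if i > 0 else torch.zeros(x.shape[0], 1, device=x.device)
            mean, log_std = net(ctx).chunk(2, dim=-1)
            dist = torch.distributions.Normal(mean, log_std.exp())
            logp = logp + dist.log_prob(x[:, i:i + 1]).squeeze(-1)
        return logp

# Training is plain maximum likelihood on expert (s, a) pairs:
# loss = -model.log_prob(expert_sa).mean(); loss.backward(); optimizer.step()
```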
Results
tags: machine-learning - imitation-learning - reinforcement-learning