9 September 2022
Paper Summary - "Imitation with Neural Density Models"
by Rylan Schaeffer
Below are my notes on Kim et al. 2021’s
Imitation with Neural Density Models.
Summary
- Proposes a framework for Imitation Learning by combining:
- density estimation of the expert’s occupancy measure, and
- Maximum Occupancy Entropy RL with the learned density as the reward
- Proposes an imitation learning algorithm, Neural Density Imitation (NDI)
Background
- Imitation Learning (IL) aims to learn optimal behavior by mimicking expert demonstrations
- Many IL approaches minimize a statistical distance between the expert’s and learner’s state-action
distributions (i.e. the “occupancy measures” \(\rho_{\pi_E}\) and \(\rho_{\pi_{\theta}}\)); a small
empirical sketch follows below
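As a concrete illustration (not from the paper), an occupancy measure can be approximated empirically from rollouts. The sketch below assumes a discrete MDP with integer state/action indices and uses the standard discounted visitation definition; the function name and setup are mine, for illustration only.

```python
import numpy as np

def empirical_occupancy(trajectories, n_states, n_actions, gamma=0.99):
    """Discounted empirical occupancy measure from rollouts.

    trajectories: list of episodes, each a list of (s, a) index pairs.
    Illustrative discrete setting, not the paper's continuous benchmarks.
    """
    rho = np.zeros((n_states, n_actions))
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            rho[s, a] += gamma ** t        # discounted visitation count
    return rho / rho.sum()                 # normalize to a distribution over (s, a)

# Usage: compare expert vs. learner occupancies, e.g. via total variation:
# tv = 0.5 * np.abs(empirical_occupancy(expert_trajs, nS, nA)
#                   - empirical_occupancy(policy_trajs, nS, nA)).sum()
```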
Method
- 2-Phase Approach
- First, learn a density estimate \(q_{\phi}\) of the expert’s occupancy measure \(\rho_{\pi_{E}}\)
- Second, run Maximum Occupancy Entropy RL (MaxOccEntRL), i.e. use the density estimate \(q_{\phi}\)
as a fixed reward for RL while maximizing the occupancy entropy \(H[\rho_{\pi_{\theta}}]\)
- The objective is:
\[\max_{\theta} \; \mathbb{E}_{\rho_{\pi_{\theta}}}[\log q_{\phi}(s, a)] + H[\rho_{\pi_{\theta}}]\]
- MaxOccEntRL regularizes the occupancy measure, whereas MaxEntRL regularizes only the policy’s
per-state action distribution; a reward-shaping sketch of the objective follows this list
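A minimal sketch of how the phase-two reward could be wired up, assuming a pretrained density model with a `log_prob` method and some per-sample entropy bonus (both interfaces are my assumptions, not the paper's code):

```python
import torch

def maxoccent_reward(q_phi, states, actions, entropy_bonus):
    """Phase-two reward: log q_phi(s, a) plus an occupancy-entropy bonus.

    `q_phi.log_prob` and `entropy_bonus` are assumed interfaces for this
    sketch; how the entropy term is actually obtained is the subject of the
    Challenges section below.
    """
    sa = torch.cat([states, actions], dim=-1)
    with torch.no_grad():                      # q_phi stays frozen in phase two
        expert_log_density = q_phi.log_prob(sa)
    return expert_log_density + entropy_bonus
```

Any off-the-shelf RL algorithm can then maximize this reward, since \(q_{\phi}\) is held fixed during phase two.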
Challenges
- The expert occupancy measure \(\rho_{\pi_E}\) is unknown and must be estimated from demonstrations
- The entropy \(H[\rho_{\pi_{\theta}}]\) may not exist in closed form, especially if \(\rho_{\pi_{\theta}}\) is an
implicit density; a generic sample-based estimator is sketched after this list
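One standard way to estimate an entropy directly from samples when the density is implicit is a k-nearest-neighbor (Kozachenko–Leonenko) estimator. This is a generic technique shown for illustration, not necessarily the estimator the paper uses:

```python
import numpy as np
from scipy.special import digamma, gammaln
from sklearn.neighbors import NearestNeighbors

def knn_entropy(x, k=5):
    """Kozachenko-Leonenko k-NN entropy estimate for samples x of shape (n, d).

    A generic sample-based estimator; shown only to illustrate that
    H[rho_pi] can be approximated from rollout samples alone.
    """
    n, d = x.shape
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(x)
    dists, _ = nbrs.kneighbors(x)       # column 0 is each point's distance to itself
    eps = dists[:, k]                   # distance to the k-th true neighbor
    log_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log-volume of unit d-ball
    return digamma(n) - digamma(k) + log_ball + d * np.mean(np.log(eps + 1e-12))
```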
Estimating the Expert Occupancy Measure
- Goal: Learn a parameterized density model \(q_{\phi}(s, a)\) of \(\rho_{\pi_{E}}\) from samples
- Approach: try autoregressive models and energy-based models (EBMs); a minimal autoregressive
sketch follows
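As an illustration of the autoregressive option, \(q_{\phi}(s, a)\) can be factored over the dimensions of the concatenated \((s, a)\) vector, with each conditional modeled as a Gaussian. This is a generic sketch of the idea, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ARGaussianDensity(nn.Module):
    """Autoregressive density over x = concat(s, a): each dimension x_i is a
    Gaussian conditioned on x_{<i}. A generic sketch, not the paper's model."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        # one small conditional net per dimension (simple, but O(dim) networks)
        self.nets = nn.ModuleList(
            nn.Sequential(nn.Linear(max(i, 1), hidden), nn.ReLU(),
                          nn.Linear(hidden, 2))   # outputs (mean, log_std)
            for i in range(dim)
        )

    def log_prob(self, x):
        logp = 0.0
        for i, net in enumerate(self.nets):
            # condition on preceding dimensions (a dummy zero input for i = 0)
            ctx = x[:, :i] if i > 0 else torch.zeros(x.shape[0], 1, device=x.device)
            mean, log_std = net(ctx).chunk(2, dim=-1)
            dist = torch.distributions.Normal(mean, log_std.exp())
            logp = logp + dist.log_prob(x[:, i:i + 1]).squeeze(-1)
        return logp

# Training is plain maximum likelihood on expert (s, a) pairs:
# loss = -model.log_prob(expert_sa).mean(); loss.backward(); optimizer.step()
```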
Results
tags: machine-learning - imitation-learning - reinforcement-learning