Rylan Schaeffer

Logo
Resume
Publications
Learning
Teaching
Kernel Papers


C51 RL

Since there’s no guarantee the return distribution will be nice, we need to think of how to allow an agent to flexibly represent distributions. C51’s approach was to use a categorical distribution (hence the C).