Rylan Schaeffer

Logo
Resume
Publications
Learning
Blog
Teaching
Jokes
Kernel Papers


← Back to Reinforcement Learning

C51 RL

Since there’s no guarantee the return distribution will be nice, we need to think of how to allow an agent to flexibly represent distributions. C51’s approach was to use a categorical distribution (hence the C).