Rylan Schaeffer

C51 RL

Since there’s no guarantee the return distribution will be nice (for example, unimodal or Gaussian), an agent needs a flexible way to represent distributions. C51’s approach is to use a categorical distribution (hence the C) over a fixed set of 51 return values, called atoms (hence the 51).
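To make the idea concrete, here is a minimal sketch of a categorical return distribution. The support bounds `V_MIN` and `V_MAX`, and the helper names `softmax` and `expected_return`, are assumptions chosen for illustration; the hand-written logits stand in for what a network would output for a single state-action pair.

```python
import math

# Assumed hyperparameters: 51 atoms evenly spaced on [-10, 10].
V_MIN, V_MAX, N_ATOMS = -10.0, 10.0, 51
DELTA = (V_MAX - V_MIN) / (N_ATOMS - 1)  # spacing between neighboring atoms

# Fixed support: the possible return values the distribution can place mass on.
atoms = [V_MIN + i * DELTA for i in range(N_ATOMS)]


def softmax(logits):
    """Turn unnormalized scores into probabilities, one per atom."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_return(probs):
    """Collapse the distribution to a scalar Q-value: sum_i p_i * z_i."""
    return sum(p * z for p, z in zip(probs, atoms))


# Illustrative logits for a bimodal return distribution: almost all mass
# sits on the atoms near -6 and +6, which a single Gaussian cannot capture.
logits = [-20.0] * N_ATOMS
logits[10] = 0.0  # atom near -6
logits[40] = 0.0  # atom near +6

probs = softmax(logits)
q_value = expected_return(probs)
```

The distribution here has two clear modes, yet its mean is close to zero. This is the kind of structure a scalar Q-value hides and a categorical representation keeps.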