C51 RL

Since there’s no guarantee the return distribution will be nice, we need to think of how to allow an agent to flexibly represent distributions. C51’s approach was to use a categorical distribution (hence the C).