The Cross-Entropy Method: Solving RL Without Gradients
Reinforcement learning has accumulated layers of complexity over the years: value functions, policy gradients, replay buffers, target networks. The Cross-Entropy Method needs none of it. Rubinstein
sesenai.hashnode.dev · 14 min read