Solving CartPole Without Gradients: Simulated Annealing
In the previous post, we solved CartPole using the Cross-Entropy Method: sample 200 candidate policies, keep the best 40, refit a Gaussian, repeat. It worked beautifully, reaching a perfect score of 5
sesenai.hashnode.dev, 17 min read