Schulman et al. suggest a new policy gradient-based reinforcement learning approach that maintains some of the advantages of trust region proximation optimization (TRPO) while also being much simpler to implement. The general concept involves an alte...
deepboltzer.codes9 min read
No responses yet.