Proximal Policy Optimization
Dec 22, 2022 · 9 min read · Schulman et al. propose a new policy-gradient reinforcement learning approach that retains some of the advantages of trust region policy optimization (TRPO) while being much simpler to implement. The general concept involves an alte...




