DRA-GRPO: Fixing Diversity Collapse in Reasoning Models
May 10 · 9 min read

Group Relative Policy Optimization (GRPO) became the dominant approach for training reasoning models after DeepSeek-R1 (arXiv:2501.12948) showed it could reach OpenAI o1-level math performance without a separate value model. But GRPO has a quiet flaw...
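The key idea that lets GRPO drop the value model is worth making concrete: instead of estimating a baseline with a learned critic, GRPO samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation. A minimal sketch of that group-relative advantage, assuming the standard formulation (function name and example rewards are illustrative, not from the original):

```python
# Sketch of GRPO's group-relative advantage: sample G completions per
# prompt, score each with a reward, and normalize within the group
# instead of querying a separate value model.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Advantage of each completion relative to its sampled group."""
    mu = mean(rewards)        # group baseline (replaces the critic)
    sigma = pstdev(rewards)   # group spread, for scale normalization
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 completions for one prompt, binary correctness rewards.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions get positive advantages and incorrect ones negative, with the group mean acting as the baseline a value model would otherwise provide.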