Anni Huang

LLM Training Operation Specialist @ ByteDance | Ex-Master of IT in Business (AI track) at SMU

Aug 13, 2025

DeepSeek GRPO Explanation (Why we need it? How does it work? What are the findings?)

GRPO: Efficient RLHF via Group Relative Policy Optimization (first introduced in DeepSeekMath). Why GRPO? Problem with PPO: slow, memory-intensive, and prone to reward overfitting in large-scale RLHF. GRPO’s advantage: a compute-efficient ...

huanganni.hashnode.dev · 2 min read

#rlhf #grpo #deepseek
