Efficient RL Framework: GRPO eliminates costly critic models in RL. (A critic model in reinforcement learning evaluates the actions taken by an agent (the actor model) by estimating expected rewards, also known as the value function; it guides the agent by providing feedback on how good a particular action or decision is. While effective, critic models are expensive because they often need to be as large and complex as the actor model, roughly doubling the computational cost. They also require constant updates to stay aligned with the actor's learning, adding memory and GPU/CPU overhead. This makes training RL systems with critic models resource-intensive, especially for large-scale models such as language models.)

Floating Point 8 Precision: Reduces memory and compute needs during training and inference.

Mixture-of-Experts Design: Activates only a subset of parameters per query, optimizing the performance-to-cost ratio. (This technique was popularized in open models by Mistral.)

DeepSeek models are trained with custom kernels for efficient GPU-to-GPU communication over NVLink and InfiniBand. Liger-Kernel takes a somewhat similar approach. Ref: https://www.linkedin.com/blog/engineering/open-source/liger-kernel-open-source-ecosystem-for-efficient-llm-training
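The critic-free idea behind GRPO can be sketched in a few lines: instead of a learned value function, the advantage of each sampled response is its reward normalized against the other responses in the same group. This is a minimal illustration (function name and epsilon are my own), not DeepSeek's actual implementation.

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Critic-free advantage estimate: normalize each response's
    reward against the mean/std of its sampled group, instead of
    querying a separate value network."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # eps avoids division by zero when all rewards in the group tie
    return [(r - mean) / (std + eps) for r in rewards]
```

For example, a group of rewards [1.0, 0.0, 0.5, 0.5] yields a positive advantage for the best response, a negative one for the worst, and near-zero for the average ones, with no critic forward pass anywhere.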
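The FP8 memory saving is easy to see with back-of-envelope arithmetic: weights stored in 8-bit floats take half the bytes of 16-bit floats. The parameter count below is a hypothetical example, not a DeepSeek figure.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3

params = 7e9  # hypothetical 7B-parameter model
fp16_gb = weight_memory_gb(params, 2)  # 16-bit floats: 2 bytes each
fp8_gb = weight_memory_gb(params, 1)   # 8-bit floats: 1 byte each
```

The same halving applies to activation and communication bandwidth during training, which is where much of the compute saving comes from.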
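The "subset of parameters per query" point of the MoE design can be sketched with a top-k gate: only the k highest-scoring experts run for a given token, so compute scales with k rather than with the total expert count. This is an illustrative sketch (names are mine), not any model's actual router.

```python
import math

def topk_gate(scores, k=2):
    """Pick the k highest-scoring experts and softmax their scores,
    so only those k experts' parameters are activated for this token."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    z = sum(exps)
    return {i: e / z for i, e in zip(top, exps)}
```

With router scores for four experts and k=2, only two experts receive nonzero weight; the remaining experts contribute no compute for this token.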
