I’m Anni Huang, an AI researcher-in-training currently at ByteDance, specializing in LLM training operations with a coding focus. I bridge the gap between engineering execution and model performance, ensuring the quality, reliability, and timely delivery of large-scale training projects.
I’m open to collaboration on algorithm contests, academic research (RLHF), and AI innovation projects. My long-term goal is to drive impactful AI products, whether through research breakthroughs or vision-driven product management.
Turning a base model into a reasoning model is essentially a post-training + data problem. The model’s architecture can stay the same — what changes is how it’s fine-tuned, what data it sees, and what training objectives you use. Here’s the typical p...

GRPO: Efficient RLHF via Relative Policy Optimization (first introduced by DeepSeekMath). Why GRPO? Problem with PPO: slow, memory-intensive, and prone to reward overfitting in large-scale RLHF. GRPO’s advantage: a compute-efficient ...
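
As a rough illustration of the "relative" part of GRPO, the sketch below scores each sampled response against the other responses drawn for the same prompt, rather than against a learned value model as PPO does. It is a minimal sketch, not the post's or DeepSeekMath's implementation: the function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper: each advantage is (reward - group mean) divided by
    (group standard deviation + eps). eps is an assumed small constant that
    avoids division by zero when every response gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; this choice is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, scored by some reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Answers above the group mean get positive advantages, the rest negative.
```

Because the baseline is the group mean, no separate critic network has to be trained or kept in memory, which is the usual argument for GRPO being cheaper than PPO.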

Pre-training uses massive datasets and computational resources—often thousands of GPUs running for weeks or months—making it a domain dominated by top AI companies. Post-training is much lighter in cost and time (often days instead of months) and foc...

Problem Statement: You are given two arrays, fruits and baskets, of equal length n. Place fruits into baskets following these rules: (1) each fruit type must be placed in the leftmost available basket with sufficient capacity; (2) each basket holds only one typ...
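
The two rules visible in the excerpt can be simulated directly. The sketch below is a straightforward O(n²) scan under stated assumptions: `fruits[i]` is taken to be the quantity of the i-th fruit type, `baskets[j]` the capacity of the j-th basket, "sufficient capacity" means capacity at least the quantity, and a basket becomes unavailable once it is used. The function name `place_fruits` and the choice to return basket indices (with -1 for a fruit that fits nowhere) are hypothetical; the excerpt is cut off before it says what the problem asks you to return.

```python
def place_fruits(fruits, baskets):
    """Return, for each fruit type, the index of the basket it lands in.

    Assumed conventions (not stated in the excerpt): -1 marks a fruit type
    that could not be placed, and each basket can be used at most once.
    """
    used = [False] * len(baskets)
    placement = []
    for quantity in fruits:
        chosen = -1
        # Scan left to right so the leftmost suitable basket wins.
        for i, capacity in enumerate(baskets):
            if not used[i] and capacity >= quantity:
                used[i] = True
                chosen = i
                break
        placement.append(chosen)
    return placement


print(place_fruits([4, 2, 5], [3, 5, 4]))  # [1, 0, -1]
```

In this example the first fruit (4) skips basket 0 (capacity 3) and takes basket 1; the second (2) takes basket 0; the third (5) finds only basket 2 (capacity 4) left and stays unplaced.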
