Sai Chowdary Gullapally · mlrad.io · Feb 10, 2025
How DeepSeek-R1-Zero Model Learns to Reason Like a Human
Introduction 👋 DeepSeek AI's recent paper, DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [1], has quickly become a hot topic in the AI world, and for good reason! In this paper, DeepSeek introduces their new reaso...
#deepseek-r1-zero

William Gaudelier · wgaudelier.hashnode.dev · Feb 6, 2025
Some personal thoughts while implementing GRPO
Disclaimer: I recently started implementing GRPO for a small project consisting of balancing a pendulum. These are fresh thoughts, and I still have a lot to learn on the subject. Hopefully later posts will be more insightful and referenced. Reminder ...
grpo

Mohit Goyal · blog.mohitg.xyz · Feb 5, 2025
Titans: A Deep Dive into Next-Generation AI Memory Architecture
Introduction: Titans represent a significant leap forward in artificial intelligence, particularly in addressing the challenge of long-term memory. Unlike traditional models like Transformers, which struggle with extensive historical contexts, Titans ...
Artificial Intelligence

TJ Gokken · tjgokken.com · Feb 4, 2025
Understanding Reinforcement Learning and Distillation: A Practical C# Example
A lot of heated discussions have been going around regarding DeepSeek-R1 lately. Instead of getting caught up in various discussions, I choose to focus on the underlying technology. I wrote an article about it last week called DeepSeek-R1: A Primer, ...
47 reads · Reinforcement Learning

shilpa gopi · aipaperflashcard.hashnode.dev · Jan 31, 2025
DeepSeek-R1 Explained: A Guide to the Open-Source Reasoning AI Training Process
Introduction: AI reasoning models have made incredible progress, with leading advancements in both proprietary and open-source research. One of the most exciting recent developments is DeepSeek-R1, an open-source model that brings advanced reasoning c...
43 reads · llm

Ayushman Garg · ayushmangarg.hashnode.dev · Jan 28, 2025
DeepSeek R1
Hey guys, if you are active on Twitter or LinkedIn these days, then you must have seen this word a lot, DeepSeek. When I was scrolling Twitter the other day, I saw a post praising it, then another, and then a lot more. So I decided to deep dive into ...
51 reads · Deepseek

Samruddhi Sangale · samrudhi0909.hashnode.dev · Jan 26, 2025
What is DeepSeek-R1?: Simple guide in 5 minutes
In just 3-4 days, DeepSeek took over our Twitter feeds. I decided to dive deep into it, and while doing so, I wrote this blog based on my notes. So here is a simplified version of what DeepSeek is doing under the hood (not really under the hood becau...
45 reads · LLM's

Mohit Goyal · blog.mohitg.xyz · Jan 11, 2025
A Roadmap for Scaling Search and Learning in Reinforcement Learning
Introduction: Reinforcement learning (RL) has emerged as a powerful paradigm for training agents to make decisions in complex environments, achieving remarkable success in game-playing and robotics. However, the journey towards more general and capabl...
1 like · Machine Learning

Paras · tech-essentials.hashnode.dev · Jan 10, 2025
Why Reading Books is Still Vital for Machine Learning Enthusiasts in the Age of AI
In an age where online courses, tutorials, and AI-powered learning tools dominate, books remain essential for mastering machine learning (ML). While digital platforms provide quick and accessible learning opportunities, books offer unmatched depth, s...
Machine Learning

Sreedeep cv · sreedeep.hashnode.dev · Jan 6, 2025
Unlocking the Future: How Reinforcement Learning is Shaping the Next Era of AI
Artificial Intelligence (AI) has come a long way in recent decades, from rule-based systems to advanced machine learning models and LLMs capable of tackling complex tasks. Among the various branches of AI, reinforcement learning (RL) stands out as a tran...
40 likes · AI