Tag feed

#reinforcement-learning

227 posts · 54 followers

Mikuz · mikuz.hashnode.dev · 1d ago · 8 min read

How Reinforcement Learning Trains Large Language Models for Better AI Responses

Reinforcement learning represents a machine learning approach where AI systems improve through experience, executing actions based on environmental conditions and adjusting their algorithms according…


Mahdi Eghbali · aijob.hashnode.dev · 6d ago · 7 min read

The Hidden Bottleneck in Robot Learning Isn’t Algorithms — It’s Environments

Why simulation-first robotics needs shared scenes, shared rules, and a shared arena, not another one-off Isaac Lab repo. Every robotics lab eventually hits the same quiet failure mode. You pick a st…


Rishikant · rishiii2.hashnode.dev · Jun 28 · 7 min read

The Final Frontier: Reinforcement Learning, Bellman Equations, and Deep Q-Networks

In Supervised Learning, we gave our models the exact answers. In Unsupervised Learning, we asked our models to find hidden structures in static datasets. But what if you want to teach a robot to walk,…


Daksh Jain · dash10107.hashnode.dev · Jun 21 · 8 min read

Beyond A/B Testing: How AI Handles Ad Fatigue and Revenue Optimization

If you read any standard tutorial on Multi-Armed Bandits, you will hear the exact same story: A/B testing is inefficient because it wastes 50% of your traffic on a losing variation. Instead, use a Ban…


Berkan Sesen · sesenai.hashnode.dev · Jun 17 · 21 min read

Trust Region Methods: From REINFORCE to TRPO to PPO

In the REINFORCE post, we built a policy gradient agent from scratch in NumPy and watched it learn CartPole. It worked, eventually. But the reward curve looked like a seismograph. One batch of unluck…


Milo · blog.miloarchive.com · Jun 2 · 2 min read

Maze Navigation with Reinforcement Learning

Related Post: Implementing Autonomous Target Navigation in MuJoCo via the Right-Hand Rule. This post explores how to solve a maze using Proximal Policy Optimization (PPO) within a custom MuJoCo envir…


Stanislav Kirichok · swiftdev.hashnode.dev · May 26 · 16 min read

How I Taught Creatures to Survive: The WaterWorld Story

WaterWorld is a simulation of an underwater world where a hundred organisms learn to survive through cycles of day and night. The surface provides energy during the day and becomes dangerous at night…


Berkan Sesen · sesenai.hashnode.dev · May 11 · 16 min read

Q-Learning for Games: Teaching an Agent Tic-Tac-Toe Through Self-Play

Tic-tac-toe is a solved game. Any competent adult can force a draw every time. But can an agent figure that out with zero human knowledge? Give two agents a blank board, a few simple rules about wins…


Jangwook Kim · effloow.hashnode.dev · May 10 · 9 min read

DRA-GRPO: Fixing Diversity Collapse in Reasoning Models

Group Relative Policy Optimization (GRPO) became the dominant approach for training reasoning models after DeepSeek-R1 (arXiv:2501.12948) showed it could reach OpenAI o1-level math performance without a separate value model. But GRPO has a quiet flaw...


Stephane Roy · flexai.hashnode.dev · May 8 · 15 min read

How to Use EasyR1 for Reinforcement Learning on FlexAI

EasyR1 is a reinforcement learning fine-tuning framework that supports GRPO, DAPO, and REINFORCE for reasoning-focused post-training. Use it when SFT starts plateauing on tasks like math, code, or log…

