Tag feed

#optimisation

22 posts · 1 follower

Value Iteration vs Q-Learning: Dynamic Programming Meets RL

May 4 · 14 min read · You have a map of the frozen lake. Every crack in the ice, every slippery patch, every hole is marked. You can sit at your desk and plan the perfect route before stepping foot on the ice. That is valu…

Berkan Sesen · sesenai.hashnode.dev
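
The "plan at your desk" idea from the excerpt above can be sketched in a few lines. This is a minimal illustration, not the post's frozen-lake code: it assumes a hypothetical four-state corridor with a fully known, deterministic model (reward 1 for reaching the rightmost state, discount 0.9), applies the Bellman optimality backup until the values stop changing, and then reads off the greedy policy.

```python
# Minimal value-iteration sketch on a hypothetical 4-state corridor (not the post's frozen lake).
# States 0..3, state 3 is terminal; reaching it pays reward 1. Actions: 0 = left, 1 = right.

GAMMA = 0.9
N_STATES = 4
TERMINAL = 3
ACTIONS = (0, 1)


def step(state, action):
    """Known, deterministic model: returns (next_state, reward)."""
    if action == 0:
        nxt = max(state - 1, 0)
    else:
        nxt = min(state + 1, N_STATES - 1)
    reward = 1.0 if nxt == TERMINAL else 0.0
    return nxt, reward


def lookahead(V, state, action):
    """One-step lookahead value: r + gamma * V[s']."""
    nxt, reward = step(state, action)
    return reward + GAMMA * V[nxt]


def value_iteration(theta=1e-8):
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue  # terminal state keeps value 0
            # Bellman optimality backup over all actions (values updated in place)
            best = max(lookahead(V, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy with respect to the converged values (None for the terminal state)
    policy = [
        None if s == TERMINAL else max(ACTIONS, key=lambda a: lookahead(V, s, a))
        for s in range(N_STATES)
    ]
    return V, policy


V, policy = value_iteration()
print([round(v, 3) for v in V])  # [0.81, 0.9, 1.0, 0.0]
print(policy)                    # [1, 1, 1, None]
```

Because `step` is the exact model, no interaction with the environment is needed; Q-learning, by contrast, has to estimate the same quantities from sampled transitions.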

Solving CartPole Without Gradients: Simulated Annealing

Apr 23 · 17 min read · In the previous post, we solved CartPole using the Cross-Entropy Method: sample 200 candidate policies, keep the best 40, refit a Gaussian, repeat. It worked beautifully, reaching a perfect score of 5…

Berkan Sesen · sesenai.hashnode.dev
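
The loop described in the excerpt above (sample 200 candidates, keep the best 40, refit a Gaussian, repeat) fits in a few lines of standard-library Python. This is a minimal sketch, not the post's CartPole code: `score` is a hypothetical quadratic standing in for an episode return, and the diagonal Gaussian, initial spread, iteration count and variance floor are all assumptions.

```python
import random
import statistics

# Hypothetical stand-in for an episode return: a smooth score that peaks at TARGET.
# The optimiser only ever sees score(), never TARGET itself.
TARGET = [0.5, -1.0, 2.0, 0.3]


def score(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


def cross_entropy_method(n_samples=200, n_elite=40, n_iters=30, init_std=3.0, seed=0):
    rng = random.Random(seed)
    dim = len(TARGET)
    mean = [0.0] * dim
    std = [init_std] * dim
    for _ in range(n_iters):
        # 1. Sample candidate parameter vectors from a diagonal Gaussian
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(n_samples)
        ]
        # 2. Keep the n_elite highest-scoring candidates
        elite = sorted(candidates, key=score, reverse=True)[:n_elite]
        # 3. Refit the Gaussian to the elite set, one dimension at a time
        mean = [statistics.fmean(col) for col in zip(*elite)]
        # Small additive floor so the distribution cannot collapse to zero width
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elite)]
    return mean


best = cross_entropy_method()
print([round(b, 2) for b in best])
```

No gradient of `score` is ever computed; the only signal is the ranking of candidates. The additive floor on the standard deviation is one common guard against the distribution shrinking before it reaches a good region; its value here is arbitrary.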

The Cross-Entropy Method: Solving RL Without Gradients

Apr 21 · 14 min read · Reinforcement learning has accumulated layers of complexity over the years: value functions, policy gradients, replay buffers, target networks. The Cross-Entropy Method predates all of it. Rubinstein…

Berkan Sesen · sesenai.hashnode.dev

AI Experts Are Dead. Long Live the AI Experts.

Apr 15 · 16 min read · Last month, my eight-year-old built a Flappy Bird clone from scratch. He can't really type yet. He certainly can't write Python. What he can do is talk to Claude while I whisper in his ear what to say…

Berkan Sesen · sesenai.hashnode.dev

Hyperparameter Optimization: Grid vs Random vs Bayesian

Apr 10 · 20 min read · You've trained a Random Forest and it works — 85% accuracy out of the box. But you used the default hyperparameters. What if n_estimators=500 with max_features=0.3 and min_samples_leaf=10 pushes that…

Berkan Sesen · sesenai.hashnode.dev
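
A small sketch of how grid search and random search spend the same budget differently, using the three hyperparameters named in the excerpt above. The value ranges are assumptions, and `evaluate` is a hypothetical scoring function standing in for cross-validated accuracy; a real run would train and score a Random Forest at that point.

```python
import itertools
import random

# Hypothetical search space for the three Random Forest hyperparameters named above.
GRID = {
    "n_estimators": [100, 300, 500],
    "max_features": [0.1, 0.3, 0.5],
    "min_samples_leaf": [1, 5, 10],
}


def evaluate(config):
    """Hypothetical stand-in for cross-validated accuracy (no model is trained here)."""
    return (
        0.85
        + 0.02 * (config["n_estimators"] / 500)
        - 0.05 * abs(config["max_features"] - 0.3)
        - 0.001 * abs(config["min_samples_leaf"] - 10)
    )


def grid_search(grid):
    # Every combination of the listed values: 3 x 3 x 3 = 27 evaluations
    keys = list(grid)
    configs = [dict(zip(keys, values)) for values in itertools.product(*grid.values())]
    return max(configs, key=evaluate), len(configs)


def random_search(n_trials, seed=0):
    # Same budget, but each trial draws every hyperparameter independently from a range
    rng = random.Random(seed)
    configs = [
        {
            "n_estimators": rng.randint(100, 500),
            "max_features": rng.uniform(0.1, 0.5),
            "min_samples_leaf": rng.randint(1, 10),
        }
        for _ in range(n_trials)
    ]
    return max(configs, key=evaluate), len(configs)


best_grid, n_grid = grid_search(GRID)
best_random, n_random = random_search(n_trials=n_grid)
print(n_grid, best_grid)
print(n_random, best_random)
```

With 27 trials the grid only ever tries three values of each hyperparameter, while random search can try up to 27 distinct values of each, which is the usual argument for random search when only a few hyperparameters really matter. Note that this toy `evaluate` peaks exactly on a grid point, so the grid wins here by construction.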

Policy Gradients: REINFORCE from Scratch with NumPy

Apr 8 · 20 min read · In the DQN post, we trained a neural network to estimate Q-values and then picked the best action with argmax. That works when the action space is discrete — push left or push right. But what if you n…

Berkan Sesen · sesenai.hashnode.dev
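
Instead of taking an argmax over estimated Q-values, a policy-gradient method parameterises the policy directly and nudges it along ∇θ log πθ(a) · R. A minimal REINFORCE sketch, assuming a hypothetical two-armed bandit (one-step episodes, so the return is just the reward) and a softmax policy over two logits; the payoff probabilities, learning rate and running-mean baseline are illustrative choices, not the post's setup.

```python
import math
import random

# Hypothetical two-armed bandit: arm 1 pays off more often than arm 0.
PAYOFF_PROB = [0.2, 0.8]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(n_episodes=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # one logit per action
    baseline = 0.0      # running mean of returns, reduces gradient variance
    for _ in range(n_episodes):
        probs = softmax(theta)
        # Sample an action from the stochastic policy (no argmax anywhere)
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < PAYOFF_PROB[action] else 0.0
        advantage = reward - baseline
        # Gradient of log softmax w.r.t. logit k: indicator(k == action) - probs[k]
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log_pi
        baseline += 0.01 * (reward - baseline)
    return softmax(theta)


final_probs = reinforce()
print([round(p, 3) for p in final_probs])
```

The sketch stays discrete for brevity; the same log-probability gradient applies unchanged to, for example, a Gaussian policy over continuous actions, which is where argmax over Q-values stops being an option.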

Deep Q-Networks: Experience Replay and Target Networks

Apr 6 · 22 min read · In the Q-learning post, we trained an agent to navigate a 4×4 frozen lake using a simple lookup table — 16 states × 4 actions = 64 numbers. But what happens when the state space isn't a grid? CartPole…

Berkan Sesen · sesenai.hashnode.dev
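
The 64-number lookup table mentioned in the excerpt above is simply a 16 × 4 array indexed by state and action. A minimal sketch of that table and the standard tabular Q-learning update, assuming α = 0.1 and γ = 0.99 (illustrative values, not necessarily the post's), an action order of left, down, right, up (assumed to follow Gymnasium's FrozenLake convention), and a single hand-picked transition instead of a real environment:

```python
N_STATES, N_ACTIONS = 16, 4  # 4x4 grid; actions assumed to be: 0 left, 1 down, 2 right, 3 up
ALPHA, GAMMA = 0.1, 0.99     # illustrative learning rate and discount factor

# The agent's entire knowledge: one number per (state, action) pair.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def q_update(state, action, reward, next_state, done):
    """Tabular Q-learning: move Q[s][a] toward r + gamma * max_a' Q[s'][a']."""
    target = reward if done else reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])


# Hypothetical transition: from state 14, moving right reaches the goal (state 15) with reward 1.
q_update(state=14, action=2, reward=1.0, next_state=15, done=True)

print(sum(len(row) for row in Q))  # 64
print(Q[14][2])                    # 0.1
```

This table has one row per discrete state, which is exactly what stops working once the state is a vector of continuous values, as in CartPole.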

Q-Learning from Scratch: Navigating the Frozen Lake

Apr 4 · 13 min read · Imagine you're standing on a frozen lake. Your goal is on the far side, but there are holes in the ice — fall in and it's game over. Worse, the ice is slippery: when you try to go right, you might sli…

Berkan Sesen · sesenai.hashnode.dev

Genetic Algorithms: From Line Fitting to the Travelling Salesman

Apr 3 · 14 min read · Imagine you're planning a road trip through 25 cities. The number of possible routes is 25!/2 — roughly 7.8 × 10²⁴, more than the number of stars in the observable universe. You can't try them all. An…

Berkan Sesen · sesenai.hashnode.dev
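
The route count quoted in the excerpt above is easy to check: n cities can be visited in n! orders, and treating a route and its reverse as the same trip halves that. A quick verification with the standard library, including a brute-force count on a small case to confirm the n!/2 formula:

```python
import itertools
import math


def count_routes_brute_force(n):
    """Count orderings of n cities, treating a route and its reverse as the same trip."""
    seen = set()
    for perm in itertools.permutations(range(n)):
        seen.add(min(perm, perm[::-1]))  # canonical representative of {route, reversed route}
    return len(seen)


# Sanity check on a small case: 5!/2 = 60
print(count_routes_brute_force(5), math.factorial(5) // 2)  # 60 60

# The 25-city figure from the excerpt
n_routes = math.factorial(25) // 2
print(n_routes)           # 7755605021665492992000000
print(f"{n_routes:.1e}")  # 7.8e+24
```

The brute-force count is only feasible for tiny n; already at 12 cities there are over 239 million distinct routes, which is why heuristic search such as a genetic algorithm is used instead.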

Backpropagation Demystified: Neural Nets from First Principles

Apr 2 · 15 min read · Every modern deep learning framework — PyTorch, TensorFlow, JAX — does one thing brilliantly: it computes gradients for you. Call loss.backward() and millions of parameters update simultaneously. But…
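
What a call like loss.backward() automates is the chain rule, applied node by node from the loss back to each parameter. A minimal sketch, assuming a single sigmoid neuron with a squared-error loss (a hypothetical toy, not the post's network): compute the gradients by hand and check them against central finite differences.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x, y):
    z = w * x + b            # pre-activation
    a = sigmoid(z)           # activation
    loss = 0.5 * (a - y) ** 2
    return z, a, loss


def backward(w, b, x, y):
    """Chain rule: dL/dw = dL/da * da/dz * dz/dw, and dz/db = 1."""
    _, a, _ = forward(w, b, x, y)
    dL_da = a - y
    da_dz = a * (1.0 - a)    # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz  # (dL/dw, dL/db)


def numerical_grad(w, b, x, y, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    dw = (forward(w + eps, b, x, y)[2] - forward(w - eps, b, x, y)[2]) / (2 * eps)
    db = (forward(w, b + eps, x, y)[2] - forward(w, b - eps, x, y)[2]) / (2 * eps)
    return dw, db


w, b, x, y = 0.5, -0.2, 1.5, 1.0
analytic = backward(w, b, x, y)
numeric = numerical_grad(w, b, x, y)
print(analytic)
print(numeric)
```

A finite-difference check like this is a common way to validate a hand-written backward pass before scaling it up to full layers.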
