Berkan Sesen · sesenai.hashnode.dev
Value Iteration vs Q-Learning: Dynamic Programming Meets RL · May 4 · 14 min read · You have a map of the frozen lake. Every crack in the ice, every slippery patch, every hole is marked. You can sit at your desk and plan the perfect route before setting foot on the ice. That is valu…
Solving CartPole Without Gradients: Simulated Annealing · Apr 23 · 17 min read · In the previous post, we solved CartPole using the Cross-Entropy Method: sample 200 candidate policies, keep the best 40, refit a Gaussian, repeat. It worked beautifully, reaching a perfect score of 5…
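The recap above compresses the Cross-Entropy Method into one sentence: sample 200 candidates, keep the best 40, refit a Gaussian, repeat. Below is a minimal sketch of that loop in plain Python (standard library only). It assumes a diagonal Gaussian over parameter vectors and uses a hypothetical toy objective, `toy_score`, as a stand-in for a CartPole episode return. The small `noise` floor added to each standard deviation is also an assumption, included to keep sampling from collapsing too early.

```python
import random
import statistics


def cem(score, dim, n_samples=200, n_elite=40, n_iters=50, init_std=1.0, noise=1e-3, seed=0):
    """Cross-Entropy Method: sample from a diagonal Gaussian, keep the elites, refit, repeat."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    for _ in range(n_iters):
        # Sample n_samples candidate parameter vectors from the current Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(n_samples)]
        # Keep the n_elite highest-scoring candidates.
        elites = sorted(samples, key=score, reverse=True)[:n_elite]
        # Refit the per-dimension mean and standard deviation to the elites.
        columns = list(zip(*elites))
        mean = [statistics.fmean(col) for col in columns]
        # Hypothetical noise floor: stops the spread from shrinking to exactly zero.
        std = [statistics.pstdev(col) + noise for col in columns]
    return mean


# Hypothetical stand-in for an episode return: highest (zero) at `target`.
target = [0.5, -0.5, 0.3, 0.0]


def toy_score(w):
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))


best = cem(toy_score, dim=len(target))
print([round(x, 3) for x in best])
```

Replacing `toy_score` with a function that runs one episode using weights `w` and returns the total reward would give the CartPole setting the posts describe; that environment wiring is not shown here.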
The Cross-Entropy Method: Solving RL Without Gradients · Apr 21 · 14 min read · Reinforcement learning has accumulated layers of complexity over the years: value functions, policy gradients, replay buffers, target networks. The Cross-Entropy Method needs none of it. Rubinstein…
AI Experts Are Dead. Long Live the AI Experts. · Apr 15 · 16 min read · Last month, my eight-year-old built a Flappy Bird clone from scratch. He can't really type yet. He certainly can't write Python. What he can do is talk to Claude while I whisper in his ear what to say…
Hyperparameter Optimization: Grid vs Random vs Bayesian · Apr 10 · 20 min read · You've trained a Random Forest and it works: 85% accuracy out of the box. But you used the default hyperparameters. What if n_estimators=500 with max_features=0.3 and min_samples_leaf=10 pushes that…
Policy Gradients: REINFORCE from Scratch with NumPy · Apr 8 · 20 min read · In the DQN post, we trained a neural network to estimate Q-values and then picked the best action with argmax. That works when the action space is discrete, such as push left or push right. But what if you n…
Deep Q-Networks: Experience Replay and Target Networks · Apr 6 · 22 min read · In the Q-learning post, we trained an agent to navigate a 4×4 frozen lake using a simple lookup table: 16 states × 4 actions = 64 numbers. But what happens when the state space isn't a grid? CartPole…
Q-Learning from Scratch: Navigating the Frozen Lake · Apr 4 · 13 min read · Imagine you're standing on a frozen lake. Your goal is on the far side, but there are holes in the ice; fall in and it's game over. Worse, the ice is slippery: when you try to go right, you might sli…
Genetic Algorithms: From Line Fitting to the Travelling Salesman · Apr 3 · 14 min read · Imagine you're planning a road trip through 25 cities. The number of possible routes is 25!/2, roughly 7.8 × 10²⁴, more than the number of stars in the observable universe. You can't try them all. An…
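A quick check of the route count quoted in the Genetic Algorithms excerpt, assuming (as the 25!/2 figure implies) that a route and its reverse are counted as the same route. Python integers are arbitrary-precision, so the factorial below is exact rather than a floating-point estimate.

```python
import math

# 25 cities can be ordered in 25! ways; halving merges each route with its reverse.
routes = math.factorial(25) // 2
print(routes)           # exact integer
print(f"{routes:.2e}")  # → 7.76e+24
```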
Backpropagation Demystified: Neural Nets from First Principles · Apr 2 · 15 min read · Every modern deep learning framework (PyTorch, TensorFlow, JAX) does one thing brilliantly: it computes gradients for you. Call loss.backward() and the gradients for millions of parameters are computed in a single pass. But…