Berkan Sesen in sesenai.hashnode.dev · 6d ago · 21 min read
Trust Region Methods: From REINFORCE to TRPO to PPO
In the REINFORCE post, we built a policy gradient agent from scratch in NumPy and watched it learn CartPole. It worked, eventually. But the reward curve looked like a seismograph. One batch of unluck…

Berkan Sesen in sesenai.hashnode.dev · Jun 12 · 13 min read
LDA vs PCA: Supervised Meets Unsupervised Dimensionality Reduction
You have a high-dimensional dataset and you need to squeeze it down to two or three dimensions for visualisation or downstream modelling. The go-to move is PCA, and most of the time it works. But cons…

Berkan Sesen in sesenai.hashnode.dev · Jun 9 · 14 min read
Changepoint Detection: Finding Regime Shifts in Financial Data
Markets do not stay in one regime. The S&P 500 can cruise at 10% annualised volatility for months, then a crisis hits and volatility doubles overnight. Any model trained on the calm period is useless…

Berkan Sesen in sesenai.hashnode.dev · May 16 · 17 min read
MCMC for Mixture Models: Inferring Earthquake Regimes
Between 1900 and 2006, the number of major earthquakes per year ranged from 6 to 41. In some decades the planet averaged fewer than 15; in others, closer to 30. That is far too much variation for a si…

Berkan Sesen in sesenai.hashnode.dev · May 11 · 16 min read
Q-Learning for Games: Teaching an Agent Tic-Tac-Toe Through Self-Play
Tic-tac-toe is a solved game. Any competent adult can force a draw every time. But can an agent figure that out with zero human knowledge? Give two agents a blank board, a few simple rules about wins…