Stephane Bersier in aiconversations.hashnode.dev · Oct 29, 2025 · 3 min read

Complex Log-Mean-Exp Networks

1. Core definition

A complex log-mean-exp unit (LME unit) is defined as

$$y = \frac{1}{\beta}\,\log\Big(\widetilde{\sum_{i=1}^n w_i \exp (\beta \, x_i)} \Big)$$

where \(x_i \in \mathbb{C}\) are the complex inputs. \(y\) is the unit’s output. \(w_i ...
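Below is a minimal Python sketch of the unit as written. It rests on assumptions that the excerpt does not state. The tilde over the sum is treated as a plain sum, and the principal branch of the complex logarithm (`cmath.log`) is used. \(\beta\) is assumed to be a nonzero real number, and the name `lme_unit` is hypothetical.

Before exponentiating, the code factors out the real maximum of \(\operatorname{Re}(\beta x_i)\), which is the usual log-sum-exp stabilization. Because that shift is real, it only rescales the modulus of the sum and can be added back exactly after the logarithm.

```python
import cmath
from typing import Sequence


def lme_unit(x: Sequence[complex], w: Sequence[complex], beta: float) -> complex:
    """Hypothetical sketch of y = (1/beta) * log(sum_i w_i * exp(beta * x_i)).

    Assumptions (not from the original post): the tilde over the sum is
    ignored, cmath.log's principal branch is used, and beta is real, nonzero.
    """
    if len(x) == 0 or len(x) != len(w):
        raise ValueError("x and w must be non-empty and of equal length")
    if beta == 0:
        raise ValueError("beta must be nonzero")
    # Real shift m = max Re(beta * x_i) keeps the exponentials from overflowing.
    m = max((beta * xi).real for xi in x)
    s = sum(wi * cmath.exp(beta * xi - m) for wi, xi in zip(w, x))
    # log(e^m * s) = m + log(s) for real m, so the shift is undone exactly.
    return (m + cmath.log(s)) / beta


# Real inputs, equal weights, large beta: the output approaches max(x) = 2.
print(lme_unit([1.0, 2.0], [0.5, 0.5], beta=200.0))
```

With uniform weights \(w_i = 1/n\) and identical inputs \(c\), this returns \(c\), provided \(\operatorname{Im}(\beta c)\) lies in the principal range. For real inputs and positive weights, increasing \(\beta\) moves the output toward the largest input. This is the familiar "soft maximum" behavior of log-sum-exp.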
Stephane Bersier in aiconversations.hashnode.dev · Sep 8, 2025 · 5 min read

Less Overfitting via Stochastic Exposure

The following is an edited version of a synthesis written by Grok.

Addressing Overfitting in Gradient Descent Training: A Stochastic Exposure Approach

In neural network training via gradient descent, a common issue is overconfidence, where models out...
Stephane Bersier in aiconversations.hashnode.dev · Aug 31, 2025 · 2 min read

Why Transformers Are Powerful

The following is an edited version of a synthesis written by ChatGPT.

1. Attention as Dynamic Selectivity

The attention mechanism enables transformers to dynamically route information. Rather than collapsing inputs into a fixed summary, each token se...
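To make "dynamic routing" concrete, here is a rough sketch of standard scaled dot-product attention in plain Python. It follows the common transformer formulation, which may not match the exact framing used in the post. The function names and toy vectors are illustrative only.

Each query computes its own softmax weights over the keys. What a query reads from the values therefore depends on the input itself, rather than on one fixed summary shared by every token.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query mixes the value vectors
    using softmax(q . k / sqrt(d)) weights over all keys."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors, one output component at a time.
        out = [sum(wt * v[j] for wt, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Two queries "select" different items from the same keys and values.
keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
print(attention([[10.0, 0.0], [0.0, 10.0]], keys, values))
```

A query aligned with one key reads almost exclusively from that key's value. A query equally similar to both keys averages them.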
Stephane Bersier in aiconversations.hashnode.dev · Aug 31, 2025 · 3 min read

On the Power of Attention

The following is a synthesis written by ChatGPT.

Attention as Selective Focus

The central problem is that a processing system (whether a brain’s conscious workspace or a machine learning model) has finite capacity. It cannot load all available inform...
Stephane Bersier in aiconversations.hashnode.dev · Aug 18, 2025 · 4 min read

NN Architectures as Generalized Algorithms

The following is an edited version of a synthesis written by ChatGPT.

1) Core thesis

We can treat neural architectures as a generalization of algorithms. There is a spectrum of algorithmic content: On one end, simple feedforward MLPs carry almost ...