Alper GÖÇEN in insideaimodels.hashnode.dev · 2h ago · 2 min read
Free from-scratch deep learning notes: tensors, attention, and a tiny GPT
I'm an AI PhD student, and I have started writing a free public notebook on how AI models work under the hood: https://insideaimodels.com/ The goal is to make the mechanics easier to reason about, wit

Sakshi Tyagi in sakshityagi.hashnode.dev · 16h ago · 3 min read
The Memory Wall: Where GPU Memory Actually Goes in LLM Training
Part 1 of 4: Scaling LLM Training. As large language models scale toward trillions of parameters and context windows stretch into millions of tokens, distributed-training engineers hit a physical lim

luka in pebira.hashnode.dev · 17h ago · 5 min read
Attention Is All You Need: The Transformer Paper That Built Modern AI
In 2017, a research paper was published that quietly changed the future of software engineering. The title: “Attention Is All You 

Rishikant in rishiii2.hashnode.dev · 18h ago · 5 min read
Bonus Chapter: Conquering Edge AI with the NVIDIA Jetson Nano
Throughout our 18-part Masterclass, we have operated with the luxury of infinite compute. We trained massive neural networks using high-end GPUs in the cloud, unconstrained by power limits or hardware

Rishikant in rishiii2.hashnode.dev · 1d ago · 7 min read
The Architecture That Changed the World: Transformers, BERT, and the LLM Revolution
In our previous post, we discovered that Recurrent Neural Networks (RNNs) and LSTMs suffer from a massive flaw: they must process data sequentially, one word at a time. This makes them agonizingly slo

Rishikant in rishiii2.hashnode.dev · 1d ago · 6 min read
Breaking the Bottleneck: Seq2Seq Models and the Invention of Attention
In our previous post, we used LSTMs to solve the Vanishing Gradient problem. By utilizing mathematical memory gates, LSTMs could successfully read a 200-word movie review and predict whether it was po

Rishikant in rishiii2.hashnode.dev · 1d ago · 6 min read
Mastering Time and Text: RNNs, LSTMs, and the Magic of Word Embeddings
Up until this point, we have operated in a static world. Whether we were predicting housing prices or running convolutions over an image of a dog, the inputs were fixed in time and space. But what if 

Rishikant in rishiii2.hashnode.dev · 1d ago · 7 min read
Beyond Classification: YOLO Object Detection, Face Recognition, and Neural Style Transfer
In our previous post, we built a Convolutional Neural Network (CNN) that could look at an image and tell us, "This is a car." But in the real world, telling an autonomous vehicle that a car exists in 

Rishikant in rishiii2.hashnode.dev · 1d ago · 7 min read
Giving AI the Gift of Sight: The Math of Convolutional Neural Networks (CNNs)
In our previous posts, we built Deep Neural Networks and learned how to stabilize them using Batch Normalization. To process an image, we took a 28x28 grid of pixels and "flattened" it into a massive 

Rishikant in rishiii2.hashnode.dev · 1d ago · 6 min read
The Great Stabilizer: Batch Normalization and the Magic of Transfer Learning
By using He Initialization and the Adam Optimizer, we have built neural networks that are mathematically stable and incredibly fast. But there is still a massive underlying problem with deep networks.