After Hours Research in afterhoursresearch.com · Nov 1, 2023 · 7 min read
Llama2 From Scratch with Pytorch Lightning
In our previous blog post, we built the Llama LLM with PyTorch Lightning, with Weights & Biases for experiment tracking and Hydra for configuration management. Now, we turn our attention to Llama 2, the successor to Llama. Let's look at the differenc...
After Hours Research in afterhoursresearch.com · Nov 1, 2023 · 21 min read
Llama From Scratch with Pytorch Lightning
Welcome to this deep dive into building Llama from scratch. This project is inspired by Llama from scratch, but it diverges in several ways. For instance, we make various architectural adjustments, such as modifications to the placement of residuals ...
After Hours Research in afterhoursresearch.com · Oct 10, 2023 · 6 min read
Batch Normalization, Layer Normalization and Root Mean Square Layer Normalization: A Comprehensive Guide with Python Implementations
Introduction
Stabilizing and accelerating the training of neural networks often hinge on the normalization techniques employed. While the theory behind normalization appears straightforward, its practical applications come in various flavours, each w...
After Hours Research in afterhoursresearch.com · Oct 4, 2023 · 5 min read
RoPE - Rotary Positional Embedding
Introduction
The transformer architecture has seen a meteoric rise in its applications across various domains of machine learning. However, the architecture lacks an inherent understanding of the order or sequence of tokens. This necessitates some fo...