I Pretrained a 360M LLaMA-Style Language Model from Scratch on 6B FineWeb Tokens (Single H100)
Feb 8 · 5 min read

Pretraining an LLM from scratch usually sounds like "big-lab-only" territory. I wanted to test how far a smaller, practical setup can go while keeping the process transparent and reproducible. This post documents an end-to-end run of training a ~360M...