Pretraining an LLM from scratch usually sounds like “big-lab-only” territory. I wanted to test how far a smaller, practical setup can go while keeping the process transparent and reproducible. This post documents an end-to-end run of training a ~360M...