Building LLMs From Scratch
DeepSeekV3
You can find the full code here: https://github.com/prashantpandeygit/solvingpapers/tree/main/deepseekv3
An 8x2 MoE DeepSeekV3 model built from scratch in PyTorch. It is a decoder-only transformer.
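To make the MoE part concrete, here is a minimal sketch of a sparse mixture-of-experts feed-forward layer. It assumes that "8x2" means eight experts with two active per token; that reading is an assumption, not something stated above. The class name `TopKMoE`, the layer sizes, and the plain softmax top-k router are all hypothetical choices for illustration and are not taken from the linked repository, whose routing may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Illustrative sparse MoE feed-forward layer (hypothetical, not the repo's code)."""

    def __init__(self, d_model=64, d_hidden=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: produces one logit per expert for every token.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is a small independent feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.SiLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):
        batch, seq_len, d_model = x.shape
        tokens = x.reshape(-1, d_model)            # (num_tokens, d_model)
        logits = self.router(tokens)               # (num_tokens, num_experts)
        top_vals, top_idx = logits.topk(self.top_k, dim=-1)
        # Normalise only over the experts that were selected for each token.
        weights = F.softmax(top_vals, dim=-1)      # (num_tokens, top_k)

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Rows are the tokens routed to expert e; slot is their top-k position.
            token_ids, slot = (top_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue  # no token chose this expert in this batch
            gate = weights[token_ids, slot].unsqueeze(-1)
            out.index_add_(0, token_ids, gate * expert(tokens[token_ids]))
        return out.reshape(batch, seq_len, d_model)
```

A quick shape check under the same assumptions: `TopKMoE()(torch.randn(2, 5, 64))` returns a tensor of shape `(2, 5, 64)`. In a decoder-only transformer, a layer like this would typically take the place of the dense feed-forward block inside each decoder layer, so only two of the eight expert networks run for any given token.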
prashantpandey.hashnode.dev · 1 min read