Attention Is All You Need: What the Paper's Heads Are Actually Doing at Each Layer
1d ago · 12 min read

Every production LLM you interact with today (LLaMA 3, Mistral, Gemma, Claude) runs on multi-head attention as its core computation. The paper that introduced it, "Attention Is All You Need" (Vaswani et al., 2017), remains the clearest starting point for understanding what those attention heads are actually doing at each layer.
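Before digging into individual heads, it helps to have the computation itself on the page. Below is a minimal NumPy sketch of multi-head self-attention, assuming random matrices stand in for the learned projections W_q, W_k, W_v, W_o; the function name and shapes are illustrative, not taken from any particular model's code:

```python
import numpy as np

def multi_head_attention(x, num_heads, rng):
    """Minimal multi-head self-attention over x of shape (seq_len, d_model).

    Random weights stand in for learned parameters; a real model
    would train W_q, W_k, W_v, W_o.
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads

    # Placeholder projection matrices (learned in a real transformer).
    W_q, W_k, W_v, W_o = (
        rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        for _ in range(4)
    )

    q, k, v = x @ W_q, x @ W_k, x @ W_v  # each (seq_len, d_model)

    # Split d_model into num_heads subspaces: (num_heads, seq_len, d_head).
    def split(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(q), split(k), split(v)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    heads = weights @ v                                   # (heads, seq, d_head)

    # Concatenate heads back to (seq_len, d_model) and mix with W_o.
    out = heads.transpose(1, 0, 2).reshape(seq_len, d_model) @ W_o
    return out, weights  # weights are what head-level analysis inspects

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 64))  # 5 tokens, d_model = 64
out, attn = multi_head_attention(x, num_heads=8, rng=rng)
print(out.shape, attn.shape)      # (5, 64) (8, 5, 5)
```

The per-head attention matrices returned here, one (seq_len, seq_len) grid of weights per head, are the objects the rest of this post examines layer by layer.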



