Jongseok Han in dont-like-ai.hashnode.dev
How Variance Breaks Deep Learning · Mar 18 · 7 min read
If you've ever trained a neural network from scratch, you know the dread of a sudden NaN loss. A smooth training loop suddenly explodes to infinity, and the entire learning process collapses. What tri…
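A minimal sketch of the variance problem the post is about: with naively scaled weights, activation variance compounds layer by layer until the forward pass overflows, while Xavier-style scaling (std 1/sqrt(fan_in), an assumption here, not necessarily the post's exact remedy) keeps it stable.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 10, 256
x = rng.normal(size=(1, width))

# Naive init: weight std 1.0, so each layer multiplies the activation
# variance by roughly `width`, and the forward pass explodes.
h = x
for _ in range(depth):
    h = h @ rng.normal(0.0, 1.0, size=(width, width))
naive_std = h.std()

# Xavier-style init: std 1/sqrt(width) keeps the variance roughly constant.
h = x
for _ in range(depth):
    h = h @ rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
scaled_std = h.std()

print(f"naive init:  std after {depth} layers = {naive_std:.3e}")
print(f"scaled init: std after {depth} layers = {scaled_std:.3e}")
```

A ten-layer linear stack already shows the gap of many orders of magnitude; with gradients flowing back through the same weights, that is exactly where NaNs come from.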
Jongseok Han in dont-like-ai.hashnode.dev
V3: Fine-Grained Mixture of Experts (MoE) · Mar 13 · 8 min read
Today, we are shifting our focus to the engine room. How does DeepSeek scale up to hundreds of billions of parameters without requiring an unthinkable amount of compute to run? The answer is its highl…
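A toy sketch of the MoE routing idea the post covers, not DeepSeek's actual architecture: a learned router scores all experts per token, but only the top-k experts actually run, so total parameters grow while per-token compute stays small. The sizes and the softmax-over-top-k gating here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2          # hidden size, expert count, experts per token

# Each "expert" is just a small linear map in this sketch.
experts = [rng.normal(0, 0.1, size=(d, d)) for _ in range(n_experts)]
router = rng.normal(0, 0.1, size=(d, n_experts))

def moe_layer(x):
    logits = x @ router                            # (tokens, n_experts) scores
    topk = np.argsort(logits, axis=-1)[:, -k:]     # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, topk[t]]
        w = np.exp(sel - sel.max())                # softmax over selected experts only
        w /= w.sum()
        for wi, e in zip(w, topk[t]):
            out[t] += wi * (x[t] @ experts[e])     # only k of n_experts run per token
    return out

x = rng.normal(size=(4, d))
out = moe_layer(x)
print(out.shape)
```

With k = 2 of 8 experts active, each token touches only a quarter of the layer's parameters, which is the core of how MoE decouples parameter count from inference cost.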
Jongseok Han in dont-like-ai.hashnode.dev
Rotary Positional Embeddings (RoPE) · Mar 12 · 5 min read
Today, we are tackling a different problem: How does an LLM know where words are? Transformers, by design, are permutation invariant. Without explicit help, they view a beautifully structured sentence…
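A compact sketch of the RoPE mechanism the post explains: each pair of dimensions of a query or key is rotated by an angle proportional to its position, so the dot product between a rotated query and key depends only on their relative offset. The dimension and base frequency below are the commonly used defaults, assumed here.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply a rotary embedding to vector x (even dim) at position pos."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    # 2D rotation applied independently to each (x1_i, x2_i) pair.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
q, k = rng.normal(size=64), rng.normal(size=64)

# The key property: <rope(q, m), rope(k, n)> depends only on m - n.
a = rope(q, 5) @ rope(k, 3)     # relative offset 2
b = rope(q, 12) @ rope(k, 10)   # same offset 2
print(np.isclose(a, b))  # True
```

Because rotations are norm-preserving and their composition depends only on the angle difference, attention scores become a function of relative position without any learned position table.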
Jongseok Han in dont-like-ai.hashnode.dev
V2: Multi-Head Latent Attention (MLA) · Mar 11 · 5 min read
While standard attention mechanisms have served us well, if we want to tackle the major bottlenecks in scaling large language models, we have to look closely at the KV cache. The conceptual explanatio…
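A back-of-the-envelope sketch of the KV-cache bottleneck the post addresses: standard multi-head attention caches full per-head keys and values for every token, while a latent-compressed cache in the spirit of MLA stores one narrow latent per token and up-projects it at use time. The numbers below are illustrative assumptions, not DeepSeek-V2's actual configuration.

```python
# KV-cache size per token, per layer (in elements), standard MHA vs. a
# compressed latent cache. All dimensions are assumed for illustration.
n_heads, head_dim = 32, 128
d_latent = 512                       # assumed compressed latent width

mha_cache = 2 * n_heads * head_dim   # K and V for every head
mla_cache = d_latent                 # one shared latent, up-projected when used

print(f"MHA cache/token: {mha_cache} elements")
print(f"MLA cache/token: {mla_cache} elements")
print(f"reduction: {mha_cache / mla_cache:.0f}x")
```

At long context lengths this per-token cache, not the weights, dominates GPU memory, which is why shrinking it directly raises the feasible batch size and context window.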