Tag feed

#deep-learning

2,747 posts · 1,518 followers

Aman Jain · curious-pm.hashnode.dev · 3m ago · 12 min read

How Much Context Does a Small GPT Model Really Need?

TL;DR: I ran three context-length experiments using the same compact GPT model (GitHub). Experiment 1, fixed training-token budget: when every model processed approximately 200,000 tokens, the 64-tok…

Chinmay Bansal · chinmaybansal.hashnode.dev · 11h ago · 10 min read

How Vision-Language Models Actually 'See'

Type "a dog wearing sunglasses at the beach" into an image search bar, and somehow, out of billions of unlabeled photos, the right ones show up. Nobody manually tagged those photos with that exact phr…

Sanjeev Kumar · sanjeevkumardotin.hashnode.dev · 1d ago · 6 min read

Understanding Retrieval-Augmented Generation (RAG)

Artificial intelligence has evolved rapidly over the past few years. Models like ChatGPT, Claude, Gemini, and Llama can write code, answer questions, summarize documents, and even help build applicati…

Sakshi Tyagi · sakshityagi.hashnode.dev · 22h ago · 3 min read

Beyond 1D Data Parallelism: ZeRO, TP, PP, and CP

Part 4 of 4, Scaling LLM Training. Code for the series: github.com/rocks-saka/Scaling-llm-training The previous three posts got us a long way on a single axis: map the memory, recompute activations, …

Uche Ozoemena · incodethismeans.com · 3d ago · 9 min read

The Sigmoid Function: A Layman's Intuition

The question: What does it mean when someone says the sigmoid function converts outputs of a predictive model into a probability value? The task: To address that question, let's start with a task that's…

Mohammed Fahd Abrah · freecodecamp.org · 4d ago · 30 min read

AI Paper Review: Deep Unsupervised Learning using Nonequilibrium Thermodynamics

Today, diffusion models power some of the most impressive AI systems ever built. They generate photorealistic images, create videos, synthesize speech, design proteins, and increasingly influence fiel…

Aman Jain · curious-pm.hashnode.dev · 6d ago · 8 min read

LLM architecture (part 2): Inside a Transformer Block

In the previous article, we looked at tokenization and embeddings, which convert a finite vocabulary into continuous vectors that we can work with mathematically. We also explored attention, which hel…

Mohsen Davarynejad · thegradient.io · 6d ago · 6 min read

Physics-informed machine learning: when it beats a black box

Every practitioner eventually hits a wall where the answer is not "more data" or "a bigger network." The system you are modelling obeys physical laws you already know, and the black box keeps producin…

Roland Sankara · freecodecamp.org · 6d ago · 16 min read

CNNs, RNNs, and Transformers Explained: A Mental Model for Key Deep Learning Concepts

Okay, pop quiz: What is a neural network? What is deep learning? Does anything come to mind? I know that feeling – yes, that thing you’re feeling now. It’s either confidence that you know what I’m ask…

Aman Jain · curious-pm.hashnode.dev · 6d ago · 15 min read

LLM architecture (part 1): Tokenization, Embeddings, and Attention

In the first article, I shared why I decided to build a GPT-style model from scratch and what I ultimately built. In this article, I want to open that box and look at the first set of building blocks…
