IndraKumar Reddy Guvva in indrareddy.hashnode.dev · May 11 · 12 min read
Three People Who Never Agree Just Said the Same Thing About Robotics
Something unusual is happening in physical AI right now. Not unusual in the sense of a single dramatic breakthrough. Unusual in the sense that people who almost never agree, a researcher who co-autho...

IndraKumar Reddy Guvva in indrareddy.hashnode.dev · May 11 · 19 min read
Stop Picking Sides: VLAs, JEPA, World Foundational Models, and WAMs Are All Solving Different Problems
Five terms keep appearing in every robotics paper and every conference talk right now: VLA, JEPA, World Foundation Model, World Action Model, Steerable VLA. They get used interchangeably, or worse, po...

telos in telos-robotics.hashnode.dev · Apr 16 · 9 min read
π0: A General-Purpose Robot Policy via VLM + Flow Matching - Physical Intelligence's First Answer
TL;DR: π0 (pi-zero) is a general-purpose robot policy model released by Physical Intelligence in October 2024. The core idea: combine a pre-trained VLM (PaliGemma 3B) with a Flow Matching-based continuous action output, inheriting Internet-scale sema...

telos in telos-robotics.hashnode.dev · Apr 14 · 7 min read
OpenVLA: How a 7B Open-Source Model Beat a 55B Closed-Source One
TL;DR: OpenVLA is an open-source Vision-Language-Action model developed jointly by Stanford and UC Berkeley. Built on Prismatic VLM (Llama 2 7B + DINOv2 + SigLIP), trained on 970k robot demonstrations curated from Open X-Embodiment. Zero-shot success ...

telos in telos-robotics.hashnode.dev · Apr 13 · 6 min read
Octo: Open-Source Generalist Robot Policy
TL;DR: Octo is an open-source generalist robot policy developed by UC Berkeley RAIL Lab. It's a Transformer model pretrained on 800k trajectories from the Open X-Embodiment dataset, conditioned on natural language commands or goal images, and can adap...

telos in telos-robotics.hashnode.dev · Apr 12 · 6 min read
RT-1: Robotics Transformer for Real-World Control at Scale
TL;DR: RT-1 is a 35M-parameter transformer trained on 130,000 real robot demonstrations across 700+ tasks. It takes natural language instructions and camera images as input, and outputs discretized robot actions at 3 Hz in real time. It achieves 97% s...