Discussion on "NVIDIA Vera Rubin NVL72: What 10x Lower Inference Cost Per Token Actually Means"

zecheng li · 2026-03-15T23:07:52.419Z

Originally published at lizecheng.net.

NVIDIA's Vera Rubin platform achieves 10x lower inference cost per token versus Blackwell through four simultaneous architectural shifts: NVFP4 native compute, HBM4's 2.75x memory bandwidth leap, NVLink 6 doubli...