NVIDIA Vera Rubin NVL72: What 10x Lower Inference Cost Per Token Actually Means
Originally published at lizecheng.net

NVIDIA's Vera Rubin platform achieves 10x lower inference cost per token versus Blackwell through four simultaneous architectural shifts: NVFP4 native compute, HBM4's 2.75x memory bandwidth leap, NVLink 6 doubli...
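Cost per token is the hourly cost of running the hardware divided by the number of tokens it produces in that hour. A 10x reduction can therefore come from higher throughput, a lower hourly cost, or both. The minimal Python sketch below shows only that arithmetic. The dollar and throughput figures are hypothetical placeholders, not NVIDIA or article data, and the helper name `cost_per_million_tokens` is illustrative.

```python
# Hypothetical illustration only: none of these figures come from NVIDIA or the article.

def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Return serving cost in USD per one million generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical baseline system: $100/hour at 100,000 tokens/s aggregate throughput.
baseline = cost_per_million_tokens(hourly_cost_usd=100.0, tokens_per_second=100_000)

# At the same hourly cost, a 10x lower cost per token requires 10x the throughput.
# If the newer system cost more per hour, it would need proportionally more throughput.
newer = cost_per_million_tokens(hourly_cost_usd=100.0, tokens_per_second=1_000_000)

print(f"baseline: ${baseline:.4f} per 1M tokens")
print(f"newer:    ${newer:.4f} per 1M tokens")
print(f"ratio:    {baseline / newer:.1f}x")
```

The same function makes the trade-off explicit. For example, a system that costs twice as much per hour must deliver 20x the throughput to reach a 10x lower cost per token.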
