1) Use a single enterprise GPU (H100‑80GB) with MXFP4 for the best $/throughput

GPT‑OSS‑120B ships with 4‑bit MXFP4 weights and a Mixture‑of‑Experts (MoE) design that activates only ~5.1B parameters per token, enabling efficient inference on a single 80 GB data‑center GPU.
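A quick back-of-envelope check shows why the weights fit: MXFP4 stores 4-bit elements plus one shared 8-bit scale per 32-element block, i.e. about 4.25 bits per parameter. A rough sketch (the ~120B total-parameter figure is the headline count, not an exact value):

```python
# Rough weight-memory estimate for GPT-OSS-120B in MXFP4.
# Assumptions: ~120e9 total parameters (approximate headline figure),
# and the MXFP4 layout of 4-bit elements + one 8-bit scale per 32-block.

PARAMS = 120e9          # total parameters (approximate)
GPU_GB = 80             # H100-80GB capacity

# MXFP4: 4-bit FP4 element + shared 8-bit scale per 32-element block
# -> 4 + 8/32 = 4.25 bits per parameter on average.
BITS_PER_PARAM = 4 + 8 / 32

weights_gb = PARAMS * BITS_PER_PARAM / 8 / 1e9
headroom_gb = GPU_GB - weights_gb

print(f"Weights: ~{weights_gb:.1f} GB of {GPU_GB} GB")
print(f"Headroom for KV cache / activations: ~{headroom_gb:.1f} GB")
```

Roughly 64 GB of weights, leaving on the order of 16 GB for KV cache and activations, which is why a single 80 GB card is enough where an unquantized 120B model would need multiple GPUs.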