Gemma-4 31B + vLLM on RTX 6000 PRO: 1.17k tokens/sec and still asking for more
Model Overview
Gemma-4-31B-it-FP8 is a 30.7B-parameter dense Transformer built by Google DeepMind, designed for frontier-level reasoning, coding, multimodal understanding, and agentic workflows. It su
blog.hexgrid.cloud, 5 min read