Fixing CUDA PTX Error When Running Qwen3-VL with vLLM on H200
Running vision-language models like Qwen3-VL with vLLM on high-end GPUs should be straightforward. Except when it's not.
The Problem
I was setting up Qwen3-VL-8B-Instruct on our H200 cluster (8x H200, 143GB VRAM each) when I hit this error:
```
vllm ser...
```
shaunliew.hashnode.dev · 4 min read
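The command above is cut off in this excerpt, so the author's exact invocation is unknown. For context, a typical vLLM launch for this model on a multi-GPU node might look like the following sketch; the flags shown are standard vLLM CLI options, but the specific values are assumptions, not the author's configuration:

```shell
# Hypothetical vLLM server launch for Qwen3-VL-8B-Instruct (not the
# author's exact command, which is truncated in the source).
# --tensor-parallel-size shards the model across GPUs on one node.
vllm serve Qwen/Qwen3-VL-8B-Instruct \
    --tensor-parallel-size 8 \
    --host 0.0.0.0 \
    --port 8000
```

On success, vLLM exposes an OpenAI-compatible API at the chosen host and port; the error discussed in this post would surface during model/kernel initialization, before the server starts accepting requests.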