Agent + MCP + eBPF: 10,869 CUDA Kernel Events, Now Queryable
A vLLM inference server handles hundreds of requests per second. Then one request with n_completions=8 and logprobs=20 arrives, and every other request blocks for 9-11 seconds. GPU utilization monitor
ingero.hashnode.dev · 6 min read