Same model, same GPU, 4× the context: a weekend of inference-stack dogfooding
Apr 29 · 26 min read

I have an RTX 3090 sitting in a Xeon Silver 4314 box at home. I wanted to:

- Stand up a local inference stack (vLLM nightly with all the bells: speculative decoding, FlashInfer, prefix caching).
- Use t