Hands-On vLLM Thinking Token Budget
vLLM is a workhorse for running inference on any LLM under the sun. One of the recent developments in the project is the ability to set thinking_token_budget, a request-level argument that ca
yankee.dev · 3 min read