SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills
Balancing Prefill and Decode: Reviewing SARATHI's Chunked-Prefill Approach

Problem framing and motivation

At first glance, the inefficiency described in the work is straightforward yet stubborn: the prefill phase and the decode phase of transformer inference use the hardware very differently. Prefill processes all prompt tokens in parallel and saturates GPU compute, while decode generates a single token per iteration and leaves most of that compute idle. SARATHI's answer is to split a long prompt's prefill into fixed-size chunks and piggyback the lightweight decode tokens of other requests onto each chunk, so every iteration carries a roughly uniform, compute-saturating amount of work.
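To make the chunked-prefill idea concrete, here is a minimal sketch (not the paper's implementation; the function name, token budget, and chunk-selection policy are illustrative assumptions): each iteration's batch takes one token from every in-flight decode and fills the rest of a fixed token budget with the next chunk of a pending prefill.

```python
# Illustrative sketch of SARATHI-style hybrid batching: piggyback one-token
# decodes onto a chunk of a long prompt's prefill so each iteration does a
# roughly constant amount of work. Names and the budget are assumptions.

def build_hybrid_batch(prefill_remaining, decode_seqs, token_budget=512):
    """Return (prefill_chunk_len, decode_ids, new_prefill_remaining)."""
    # Each in-flight decode contributes exactly one token to the batch.
    decode_tokens = len(decode_seqs)
    # Fill the rest of the fixed token budget with a chunk of the prefill,
    # keeping per-iteration compute roughly uniform.
    chunk = min(prefill_remaining, max(token_budget - decode_tokens, 0))
    return chunk, list(decode_seqs), prefill_remaining - chunk

# Example: a 1000-token prompt processed alongside 8 active decode streams.
remaining = 1000
steps = 0
while remaining > 0:
    chunk, decodes, remaining = build_hybrid_batch(remaining, range(8))
    steps += 1
# With 8 decode tokens per batch, 504 prefill tokens fit per step,
# so the 1000-token prefill completes in 2 steps instead of one huge batch.
```

The point of the fixed budget is that decode tokens, which would otherwise run in tiny underutilized batches, ride along with prefill chunks at almost no extra cost.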