Taming Llama 3.1 on a T4: My Week of Modal Debugging and Service Refactors
Hook
I set out to ship a pricing microservice that runs a fine-tuned Llama 3.1 8B model on a single NVIDIA T4. By Thursday the service was either choking on CUDA OOM or taking...
dealhunter.hashnode.dev · 5 min read