LLaMA 2: How Three Borrowed Techniques Fit a 70B Model on Two GPUs
The Memory Problem
Serving 10 concurrent users with a 70B-scale model at 4K context, using the vanilla transformer architecture from 2017, requires roughly 240GB of GPU memory: about 140GB for the weights (70B parameters in fp16) and roughly 100GB for the KV cache.
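The arithmetic behind those numbers can be sketched in a few lines. This is a rough back-of-envelope estimate, assuming the published LLaMA-2-70B shape (80 layers, hidden size 8192) and fp16 storage (2 bytes per value); with vanilla multi-head attention, every layer caches one K and one V vector per token.

```python
# Back-of-envelope GPU memory for serving a 70B model with vanilla
# multi-head attention (pre-GQA). Assumes LLaMA-2-70B dimensions
# (80 layers, hidden size 8192) and fp16 (2 bytes per value).
N_PARAMS = 70e9
N_LAYERS = 80
HIDDEN = 8192          # d_model
BYTES = 2              # fp16
USERS, CONTEXT = 10, 4096

weights_gb = N_PARAMS * BYTES / 1e9

# Per token, each layer caches one K and one V vector of size HIDDEN.
kv_per_token = 2 * N_LAYERS * HIDDEN * BYTES
kv_gb = kv_per_token * USERS * CONTEXT / 1e9

print(f"weights:  {weights_gb:.0f} GB")              # 140 GB
print(f"kv cache: {kv_gb:.0f} GB")                   # 107 GB
print(f"total:    {weights_gb + kv_gb:.0f} GB")      # 247 GB
```

The KV cache alone is close to the size of the weights, which is why techniques that shrink it matter so much for serving.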