Choosing an Inference Engine on DGX Spark
TL;DR
The DGX Spark has enough unified memory to load large LLMs, but dense models run slowly on it. Before I realised the real bottleneck (MoE vs dense, covered in Part 2), I went deep into inference engines. Here’s how they compare on DGX S...
sparktastic.hashnode.dev · 7 min read