Evaluation: Measuring our Agents capabilities with Gaia
When working with the non-deterministic nature of LLMs (the same prompt will return different responses each time, due to the probabilistic nature of LLMs) traditional means of software testing are no
ai-notebook.uk5 min read