Tool-calling eval is four problems, not one
I want to start with a trace that still bothers me.
An agent fails to book a flight. The model called search_flights with departure_date="next Friday". The endpoint expected an ISO date, returned a 40
nikhil-p-blogs.hashnode.dev · 6 min read