
Theresa Fruhwuerth

Breaking problems down to first principle - then building back up with caffeine.

Jul 24, 2025

LLM Evaluation: Using DSPy to decompose an LLM Judge

Introduction: I have been tinkering with LLMs at work and outside of it for quite a while now, and one of the most pressing issues compared to traditional machine learning is the unsolved problem of how to evaluate them. Evaluating LLM outputs is exponential...

llmshowto.com · 14 min read

#llm #llm-as-judge #evaluation-metrics #ai #ai-agents #mcp #openai #dspy

Responses (1)

Hi, I liked your approach and it inspired me to do something similar for the evaluation of my chatbot. It's a slightly different approach but uses the same principle of decomposing the evaluation approach.

Thanks for writing this piece!

If you want to see how I used your approach, check it out here: sebastianpdw.medium[.]com/evaluating-ai-chatbots-ai-engineering-in-action-b8cfd0351635

Sebastian

Jul 29, 2025

Theresa Fruhwuerth

Breaking problems down to first principle - then building back up with caffeine.

Oct 24, 2025

Hey, that is great to hear, and of course I will check it out. It seems like very thorough and interesting work! Also, thanks for referencing it :)