#llmtesting articles

Nishant Singh · nishant-singh.hashnode.dev · Apr 29 · 59 min read

Testing AI Hallucinations in LLM-Backed APIs: A Framework Nobody Has Defined Yet

How do you write a test for a response that is confidently wrong? This is the most urgent open question in software quality right now — and most teams have no answer. Target Audience: AI Engineers ·

Aashish Chapain · blog.chapainaashish.com.np · Apr 3 · 4 min read

Validation & Testing LLM Outputs

The moment we interact with LLMs, we get probabilistic output. They'll return "price": "$45.99" one time and "price": 45.99 the next. Sometimes, they even forget required fields. This might not look l

Nishant Singh · nishant-singh.hashnode.dev · Jan 1 · 5 min read

The Complete LLM Testing Taxonomy: 8 Pillars of Quality & Automation

The days of asking "Does it compile?" are over. In the world of Large Language Models (LLMs), the question is now "Does it lie?" If you are building with LLMs, you have likely realized that standard software testing—unit tests, integration tests—does...

John Vester · johnjvester.hashnode.dev · Dec 10, 2025 · 6 min read

Demystifying Agentic Test Automation for QA Teams

Agentic test automation is a fundamental shift in how we test. Instead of depending on static, hand-written scripts that must be continually updated, agentic systems analyze apps, plan testing strategies, execute tests, and adapt to changing code—lar...

George Perdikas · qualitynestllm.hashnode.dev · Aug 4, 2025 · 2 min read

ChatGPT’s Few-Shot Superpower: Can It Learn From Just a Few Examples?

Have you ever wondered how smart ChatGPT really is? Like, if you give it just a few examples of something, can it figure out the pattern and nail the rest? That’s exactly what we tested with a “few-shot generalization” challenge and the results are p...

George Perdikas · qualitynestllm.hashnode.dev · Jul 27, 2025 · 3 min read

“May I speak to your manager?” ChatGPT is tested on tone adaptation in customer support scenarios

Objective The purpose of this test was to evaluate how well ChatGPT adapts its tone and language when used as a customer support chatbot, especially when dealing with customers of varying attitudes, from polite to hostile. Methodology We asked ChatGP...

George Perdikas · qualitynestllm.hashnode.dev · Jul 24, 2025 · 3 min read

Do all CEOs wear suites? Let ChatGPT decide (?)...

When using AI like ChatGPT to aid creative writing or research, we expect outputs that reflect real-world data, especially when realism is explicitly requested. However, sometimes these models reveal subtle biases that are worth examining. In this po...

George Perdikas · qualitynestllm.hashnode.dev · Jul 24, 2025 · 3 min read

Playing Guess the Country with ChatGPT . Spoiler alert!!! : It’s Paris.

When interacting with AI models like ChatGPT, it's important to test their ability to handle indirect, contextual, or colloquial questions. This helps us understand how well the model can interpret human language when phrased creatively or less direc...

George Perdikas · qualitynestllm.hashnode.dev · Jul 23, 2025 · 2 min read

Testing ChatGPT’s multilingual understanding through a chicken recipe!

When we think of testing a large language model like ChatGPT for multilingual skills, most of us wouldn’t immediately think of... a chicken recipe. But that’s exactly the route we took to see if ChatGPT could handle abrupt language switches while kee...

George Perdikas · qualitynestllm.hashnode.dev · Jul 22, 2025 · 3 min read

ChatGPT vs. Math: Can It Teach While Solving?

One of the key uses of LLMs like ChatGPT is as an educational aid. But to be effective, the AI must do more than just provide correct answers, it must also demonstrate step-by-step reasoning, especially when guiding students or learners. In this test...
