shaun vd (@shaunvdd)

shaun vd · promptfork.hashnode.dev · May 11 · 4 min read

Prompt regression testing in CI: a 5-minute setup

Your code has tests. Your code has a CI pipeline. A bad change can't merge without going green. Your prompts? Vibes. A teammate edits the system prompt to fix one customer complaint, output quality drops 8% on the other 99% of cases, nobody notices f...

shaun vd · promptfork.hashnode.dev · May 11 · 3 min read

Claude Sonnet 4.6 vs GPT-4.1 vs Gemini 2.5 Flash: which wins JSON extraction?

We had a question: for structured-output tasks where you just need clean JSON back, which frontier model wins on a cost/quality basis? The answer matters because most production LLM features aren't writing poetry — they're extracting fields from emai...

shaun vd · promptfork.hashnode.dev · May 11 · 3 min read

How a model upgrade silently broke our extraction prompt (and how we caught it)

A friend's product summarizes customer support tickets using a fine-tuned LLM prompt. It worked perfectly on GPT-4o for six months. Then OpenAI deprecated 4o, the team migrated to GPT-4.1, ran a smoke test in the playground, said "looks fine," and sh...

shaun vd

About

Available for

shaun vd's blogs

Recently published

Prompt regression testing in CI: a 5-minute setup

Claude Sonnet 4.6 vs GPT-4.1 vs Gemini 2.5 Flash: which wins JSON extraction?

How a model upgrade silently broke our extraction prompt (and how we caught it)
