shaun vd in promptfork.hashnode.dev · May 11 · 4 min read
Prompt regression testing in CI: a 5-minute setup
Your code has tests. Your code has a CI pipeline. A bad change can't merge without going green. Your prompts? Vibes. A teammate edits the system prompt to fix one customer complaint, output quality drops 8% on the other 99% of cases, nobody notices f...
shaun vd in promptfork.hashnode.dev · May 11 · 3 min read
Claude Sonnet 4.6 vs GPT-4.1 vs Gemini 2.5 Flash: which wins JSON extraction?
We had a question: for structured-output tasks where you just need clean JSON back, which frontier model wins on a cost/quality basis? The answer matters because most production LLM features aren't writing poetry; they're extracting fields from emai...
shaun vd in promptfork.hashnode.dev · May 11 · 3 min read
How a model upgrade silently broke our extraction prompt (and how we caught it)
A friend's product summarizes customer support tickets using a fine-tuned LLM prompt. It worked perfectly on GPT-4o for six months. Then OpenAI deprecated 4o, the team migrated to GPT-4.1, ran a smoke test in the playground, said "looks fine," and sh...