Jan 1 · 8 min read · TL;DR: In 2024, we worried about what LLMs said. Now, we worry about what AI Agents do. By testing GPT-4.1-mini vs. GPT-4.1-nano in a "Research & Update" workflow, I discovered a 40% hijack success rate for Indirect Prompt Injection on smaller models...
Dec 26, 2025 · 7 min read · What 100 automated battles taught us about why prompt guardrails aren't enough. I built an AI attacker. I gave it one job: break an HR chatbot's rules and get it to approve unauthorized leave. Then I let them fight, 100 times, completely unsupervise...
Sep 16, 2025 · 6 min read · Everyone I’ve spoken with about agents asks the same thing: “What about security?” The concern isn’t just technical, it’s governance. If an agent makes a mistake, who’s accountable? If it accesses data, which policies apply? In this article, I share ...