I Taught an AI to Attack Another AI. It Won 44% of the Time, With No Backdoor.
Dec 26, 2025 · 7 min read

What 100 automated battles taught us about why prompt guardrails aren't enough

I built an AI attacker. I gave it one job: break an HR chatbot's rules and get it to approve unauthorized leave. Then I let them fight 100 times, completely unsupervise...
