I Taught an AI to Attack Another AI. It Won 44% of the Time, With No Backdoor.
What 100 automated battles taught us about why prompt guardrails aren't enough
I built an AI attacker. I gave it one job: break an HR chatbot's rules and get it to approve unauthorized leave. Then I let them fight — 100 times, completely unsupervise...
agent-fight-club.hashnode.dev · 7 min read