How TriAgent Leaked a Secret Code It Was Never Supposed to Find
TriAgent is a trio of agents that systematically tests LLMs for prompt-injection vulnerabilities, using custom reward functions in a standard reinforcement learning (RL) loop. Last week I pointed it at its first real target.
perfec.hashnode.dev · 2 min read