The Benchmark Is the Vulnerability: How AI Agents Are Being Tested to Attack the Real Web
Last spring, a research team gave a large language model agent a list of real, unpatched web application vulnerabilities and a sandboxed environment in which to work. The model did not merely identify the flaws. It exploited them — autonomously, end-...
fetchlogic.hashnode.dev · 8 min read