Policy Puppetry: The Hidden Threat Inside AI Models
May 16, 2025 · 3 min read · AI tools like ChatGPT, Claude, and Gemini are built with safety features designed to block harmful content. But a new technique called Policy Puppetry, discovered by researchers at HiddenLayer, shows that these guardrails can be bypassed — easily and...
Join discussion